HDFGroup / h5pyd

h5py distributed - Python client library for HDF Rest API

An error related to hsrm #150

Closed: AboPlus closed this issue 11 months ago

AboPlus commented 12 months ago

Hello,

When I tried to delete one of my domains using hsrm, I encountered the following problem:

(hsds3.9) [root@VM-3-133-tencentos ~/h5file]# hsrm /home/test_user1/10003.h5
Traceback (most recent call last):
  File "/root/miniconda3/envs/hsds3.9/bin/hsrm", line 8, in <module>
    sys.exit(main())
  File "/root/miniconda3/envs/hsds3.9/lib/python3.9/site-packages/h5pyd/_apps/hsdel.py", line 158, in main
    deleteDomain(domain)
  File "/root/miniconda3/envs/hsds3.9/lib/python3.9/site-packages/h5pyd/_apps/hsdel.py", line 67, in deleteDomain
    if base_name not in hparent:
  File "/root/miniconda3/envs/hsds3.9/lib/python3.9/site-packages/h5pyd/_hl/folders.py", line 413, in __contains__
    self._getSubdomains()
  File "/root/miniconda3/envs/hsds3.9/lib/python3.9/site-packages/h5pyd/_hl/folders.py", line 317, in _getSubdomains
    raise IOError(rsp.status_code, rsp.reason)
OSError: [Errno 404] Not Found
(hsds3.9) [root@VM-3-133-tencentos ~/h5file]# hsls /home/test_user1/10003.h5
/ Group
47b895a035a34042891f22d18a047bb3 Group
48e80e91f71c44fcb857ba11ca5463a3 Group
bb81d4187bb745c79bc20bf02a7eee19 Group
c7885b08517e45c1a815979c08217e68 Group
a398a7af2bde4985a807d8e6a7a86fd3 Group
ee23fbfc5bbb4b9382d8b38f3f56274b Group
55be73d27649461186221f81f2bb0cbb Group
54d2153cd0a948ebaff44b10d8596aa2 Group

Why am I getting a Not Found error when hsls can list the domain?

AboPlus commented 12 months ago

In addition, here is the Docker log from the hsrm run:

REQ> GET: / [/home/test_user1]
DEBUG> num tasks: 4 active tasks: 4
DEBUG> validateUserPassword username: admin
DEBUG> looking up username: admin
DEBUG> user password validated
DEBUG> GET_Domain domain: hsds-test-1310752370/home/test_user1 bucket: hsds-test-1310752370
INFO> got domain: hsds-test-1310752370/home/test_user1
INFO> getDomainJson(hsds-test-1310752370/home/test_user1, reload=True)
DEBUG> LRU DomainCache node hsds-test-1310752370/home/test_user1 removed DomainCache
DEBUG> getNodeCount for dn_urls: ['http://172.28.0.5:6101', 'http://172.28.0.6:6101', 'http://172.28.0.7:6101', 'http://172.28.0.8:6101']
DEBUG> got dn_url: http://172.28.0.5:6101 for obj_id: hsds-test-1310752370/home/test_user1
DEBUG> sending dn req: http://172.28.0.5:6101/domains params: {'domain': 'hsds-test-1310752370/home/test_user1'}
INFO> http_get('http://172.28.0.5:6101/domains')
DEBUG> get_http_client, url: http://172.28.0.5:6101/domains
INFO> http_get status: 200 for req: http://172.28.0.5:6101/domains
DEBUG> setitem, key: hsds-test-1310752370/home/test_user1
DEBUG> LRU DomainCache adding 1024 to cache, mem_size is now: 207872
DEBUG> LRU DomainCache added new node: hsds-test-1310752370/home/test_user1 [1024 bytes]
DEBUG> got domain_json: {'owner': 'test_user1', 'acls': {'test_user1': {'create': False, 'read': True, 'update': False, 'delete': False, 'readACL': False, 'updateACL': False, 'domain': 'hsds-test-1310752370/home/test_user1'}, 'default': {'create': False, 'read': True, 'update': False, 'delete': False, 'readACL': False, 'updateACL': False}, 'admin': {'create': True, 'read': True, 'update': True, 'delete': True, 'readACL': True, 'updateACL': True, 'domain': 'hsds-test-1310752370/home/test_user1'}}, 'created': 1687769559.9881833, 'lastModified': 1688980331.5911171}
INFO> aclCheck: read for user: admin
DEBUG> href parent domain: hsds-test-1310752370/home
 RSP> <200> (OK): /
REQ> GET: /domains [/home/test_user1/]
DEBUG> num tasks: 4 active tasks: 4
DEBUG> validateUserPassword username: admin
DEBUG> looking up username: admin
DEBUG> user password validated
DEBUG> getNodeCount for dn_urls: ['http://172.28.0.5:6101', 'http://172.28.0.6:6101', 'http://172.28.0.7:6101', 'http://172.28.0.8:6101']
INFO> get_domains for: /home/test_user1/ verbose: False
DEBUG> get_domains - using Limit: 1000
INFO> get_domains - prefix: /home/test_user1/ bucket: hsds-test-1310752370
DEBUG> get_domains - listing S3 keys for home/test_user1/
INFO> getStorKeys('home/test_user1/','/','', include_stats=False
INFO> list_keys('home/test_user1/','/','', include_stats=False, callback not set
WARN> bucket: hsds-test-1310752370 does not exist, exception: An error occurred (NoSuchKey) when calling the ListObjects operation: The specified key does not exist.

jreadey commented 11 months ago

What does hsls /home/ give?

AboPlus commented 11 months ago

@jreadey Running hsls /home/ gives the following:

(hsds3.9) [root@VM-3-133-tencentos ~/h5file]# hsls /home/
admin                                               folder   2023-07-11 17:15:43 /home/
1 items

jreadey commented 11 months ago

It looks like there is no /home/test_user1/ folder. That's likely why hsrm is reporting "not found". HSDS expects to find the parent folder /home/test_user1/ and isn't finding it.

Try running: hstouch -u admin -p admin -o test_user1 /home/test_user1/. Change the second admin to the actual admin password.
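
For context, the traceback above shows hsrm evaluating `base_name not in hparent`, which lists the subdomains of the parent folder, and that listing is the request that returned 404. The following is a minimal sketch of how a domain path divides into those two parts. The helper `split_domain` is hypothetical and is not the code in hsdel.py.

```python
import posixpath


def split_domain(domain):
    """Return (parent_folder, base_name) for an HSDS-style domain path.

    Hypothetical helper for illustration only; not taken from h5pyd.
    """
    parent, base_name = posixpath.split(domain.rstrip("/"))
    if not parent.endswith("/"):
        parent += "/"  # folder paths in this thread end with a slash
    return parent, base_name


parent, base_name = split_domain("/home/test_user1/10003.h5")
print(parent)     # the folder whose contents hsrm needs to list
print(base_name)  # the name it looks for inside that folder
```

If the listing of the parent folder fails, the membership test cannot succeed even though the domain itself can still be opened directly, which matches what hsls shows above.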

AboPlus commented 11 months ago

@jreadey In fact, there is a /home/test_user1/ folder, but it does not show up in the output of hsls /home/. The following commands demonstrate this:

(hsds3.9) [root@VM-3-133-tencentos ~/h5file]# hstouch -u admin -p admin -o test_user1 /home/test_user1/
Can not update timestamp of folder object
(hsds3.9) [root@VM-3-133-tencentos ~/h5file]# hstouch -u admin -p admin -o admin /home/admin_folder/
(hsds3.9) [root@VM-3-133-tencentos ~/h5file]# hsls /home/
admin                                               folder   2023-07-11 17:15:43 /home/
1 items

After running hsls /home/, the /home/admin_folder/ I had just created was not listed either. But when I run hsls /home/admin_folder/, I get this:

(hsds3.9) [root@VM-3-133-tencentos ~/h5file]# hsls /home/admin_folder/
admin                                               folder   2023-07-17 09:43:33 /home/admin_folder/
1 items

jreadey commented 11 months ago

Do you see any relevant warnings/errors in the DN logs?

AboPlus commented 11 months ago

@jreadey There are no warnings or errors in the DN log, but the SN log shows a warning when I run hsrm:

REQ> GET: / [/home/test_user1]
REQ> GET: /domains [/home/test_user1/]
WARN> bucket: resource does not exist, exception: An error occurred (NoSuchKey) when calling the ListObjects operation: The specified key does not exist.

And the corresponding message in the DN log:

REQ> GET: /domains [resource/home/test_user1]

jreadey commented 11 months ago

Very odd! It looks like HSDS is expecting to see a bucket named "resource" and not finding it. Did you set the BUCKET_NAME environment variable to the bucket you created (as in the sample .bashrc in the setup guide)?

AboPlus commented 11 months ago

@jreadey Yes, my BUCKET_NAME and AWS_S3_GATEWAY environment variables are as follows:

export BUCKET_NAME=resource
export AWS_S3_GATEWAY=https://hsds-test-1310752370.cos.ap-beijing.myqcloud.com

Should resource here be a separate bucket or a folder? I have only one bucket, named hsds-test-1310752370, which is the same name that appears in AWS_S3_GATEWAY. Currently, resource ends up as a root folder inside my bucket hsds-test-1310752370. I previously set BUCKET_NAME to hsds-test-1310752370 as well, but that also just became a root folder inside the bucket.

The structure in my hsds-test-1310752370 bucket looks something like this:

.
└── resource/
    ├── db/
    ├── ...
    ├── home/
    │   ├── admin/
    │   ├── test_user1/
    │   └── test_user2/
    └── .domain.json
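
One possible explanation for this layout, sketched under the assumption that the S3 client composes path-style URLs of the form `<endpoint>/<bucket>/<key>` (not confirmed in this thread): when the endpoint host already names the bucket, the BUCKET_NAME value becomes just the first path segment. The helper `path_style_url` and the example key are illustrative only.

```python
from urllib.parse import urlsplit


def path_style_url(endpoint, bucket, key):
    """Compose a path-style URL: <endpoint>/<bucket>/<key> (illustrative helper)."""
    return f"{endpoint.rstrip('/')}/{bucket}/{key.lstrip('/')}"


# Values from the thread: the endpoint host already contains the real bucket name.
endpoint = "https://hsds-test-1310752370.cos.ap-beijing.myqcloud.com"
url = path_style_url(endpoint, "resource", "home/test_user1/.domain.json")

parts = urlsplit(url)
print(parts.netloc)  # the host selects the real bucket
print(parts.path)    # "resource" is only a key prefix inside that bucket
```

Under that assumption, every object lands below a `resource/` prefix in hsds-test-1310752370, which is consistent with the tree above.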

jreadey commented 11 months ago

Thanks for the info.

I think using the AWS_S3_GATEWAY endpoint to point to a specific bucket could be the problem. Look at the endpoint used in the Tencent documentation:
https://www.tencentcloud.com/document/product/436/32537#python. It just references the bucket region, not the bucket within the region.

Could you try setting up your AWS_S3_GATEWAY like that and then BUCKET_NAME to "hsds-test-1310752370"?

You'll need to re-create the /home/ folders so that they show up in the top-level of the bucket.
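
As a rough sketch of the suggested change, the bucket-specific endpoint can be separated into a regional endpoint and a bucket name. This assumes the host has the form `<bucket>.<regional host>`. The helper `split_bucket_endpoint` is hypothetical, and the thread does not show the exact values that were finally used.

```python
from urllib.parse import urlsplit


def split_bucket_endpoint(url):
    """Split 'https://<bucket>.<regional host>' into (regional_endpoint, bucket).

    Hypothetical helper; assumes the bucket name is the first host label.
    """
    parts = urlsplit(url)
    bucket, _, regional_host = parts.netloc.partition(".")
    return f"{parts.scheme}://{regional_host}", bucket


gateway, bucket = split_bucket_endpoint(
    "https://hsds-test-1310752370.cos.ap-beijing.myqcloud.com"
)
print(f"export AWS_S3_GATEWAY={gateway}")
print(f"export BUCKET_NAME={bucket}")
```

The printed lines are in the same form as the export statements shown earlier in the thread, with the bucket name moved out of the endpoint and into BUCKET_NAME.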

AboPlus commented 11 months ago

@jreadey Thank you very much!

I solved the problem by following your guidance. My AWS_S3_GATEWAY and BUCKET_NAME environment variables were indeed misconfigured.

Thanks again!

jreadey commented 11 months ago

Awesome! Glad to see you are up and running on Tencent Cloud.