HDFGroup / h5pyd

h5py distributed - Python client library for HDF Rest API

Issue in creating /home directory #122

Closed. nagarajmmu closed this issue 1 year ago.

nagarajmmu commented 1 year ago

Hi

I have installed HSDS on an Azure cloud VM running Ubuntu. The installation was successful, but during the post-install configuration I hit an error while creating the "/home" folder. I used the command "hstouch -u admin -p admin /home/", which failed with:

ERROR: Unexpected error: HTTPConnectionPool(host='', port=5101): Max retries exceeded with url: /?getdnids=1&getobjs=T&include_attrs=T&domain=%2Fhome (Caused by ResponseError('too many 500 error responses',))

Please let me know if any configuration is missing. I am planning to create HDF5 files in Azure Blob Storage. Do I need the "/home" folder in order to do that?

Thank you

jreadey commented 1 year ago

Hi - I suspect it's some configuration issue. Does hsinfo run ok? In general, if you are seeing 500 errors, you can do a "docker logs hsds_sn_1" and scan the output for lines with ERROR. If the cause of the ERROR line is not obvious, please update this issue with the text.

BTW, you might be interested in the Azure Marketplace offer for HSDS: https://azuremarketplace.microsoft.com/en-us/marketplace/apps/thehdfgroup1616725197741.hsdsazurevm. This makes setup a little easier as it includes a VM image with the necessary packages and walks you through configuration steps.

nagarajmmu commented 1 year ago

Hi Jreadey

Thanks for the update. Yes, "hsinfo" is working fine; the hsinfo response is below:

server name: Highly Scalable Data Service (HSDS)
server state: READY
endpoint: http://:5101
username: admin (admin)
password: ***
home: NO ACCESS
server version: 0.7.0beta
node count: 4
up: 38 sec
h5pyd version: 0.10.1

I have also checked the Docker logs; the error entry is attached below.

{"log":"ERROR\u003e request to http://:6101/domains failed with code: 500\n","stream":"stdout","time":"2022-07-29T17:45:33.172949201Z"}

The above error occurred 5 times.

Please let me know if you need any configuration files for analysis.

Thanks Nagaraja M M

nagarajmmu commented 1 year ago

Hi Jreadey

I have installed a VM from the Azure Marketplace and set up the HSDS service. The same issue occurs here as well: I am not able to create the "/home" folder. The response from "hsinfo" is attached below:

server name: Highly Scalable Data Service (HSDS)
server state: READY
endpoint: http://:5101
username: admin (admin)
password: *****

Error: HTTPConnectionPool(host='', port=5101): Max retries exceeded with url: /?domain=%2Fhome (Caused by ResponseError('too many 500 error responses',))

Please let me know if anything is missing in my setup. I am trying to create HDF5 files in Azure Blob Storage.

Thanks Nagaraja M M

jreadey commented 1 year ago

So the request got to the SN container, was authenticated correctly, and a request was sent to one of the DN containers, but that request failed with a 500. It's likely something to do with the Azure Blob configuration.

Check for errors with the DN containers and let me know what you find.

nagarajmmu commented 1 year ago

Hi John

Thanks for the analysis. Yes, the issue is an authentication error, but I am not able to read the error message in Azure Log Insights; the "AuthorizationError" count keeps increasing.

I have set the environment variables in my .bashrc file as follows:

export BUCKET_NAME=
export HSDS_ENDPOINT="http://:5101"
export AZURE_CONNECTION_STRING="DefaultEndpointsProtocol=https;AccountName=;AccountKey=;EndpointSuffix=core.windows.net"

I am not able to proceed. Are the above environment variables correct? Please let me know if there is anything I need to do to fix this issue.
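For reference, AZURE_CONNECTION_STRING follows Azure's standard semicolon-separated key=value format. As a quick sanity check before exporting it, the string can be parsed and scanned for empty required fields. This is a minimal sketch, not part of HSDS or h5pyd; the helper names are made up:

```python
def parse_connection_string(conn_str):
    """Split an Azure storage connection string into a dict of its fields."""
    fields = {}
    for part in conn_str.strip().split(";"):
        if not part:
            continue
        # Split on the first '=' only: AccountKey values are base64 and may
        # themselves end with '=' padding characters.
        key, _, value = part.partition("=")
        fields[key] = value
    return fields

def missing_fields(conn_str,
                   required=("DefaultEndpointsProtocol", "AccountName", "AccountKey")):
    """Return the required fields that are absent or empty."""
    fields = parse_connection_string(conn_str)
    return [k for k in required if not fields.get(k)]
```

A connection string copied from the Azure Portal should yield an empty list from missing_fields; the template above (with blank AccountName and AccountKey) would report both as missing.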

Thanks Nagaraja M M

jreadey commented 1 year ago

Are you seeing DN log authorization errors when trying to read/write to the blob storage? You'll see errors in the logs with "azureBlobClient" that may provide more insight. If you can copy and paste the relevant log error here that would be helpful.

nagarajmmu commented 1 year ago

Hi John

I am trying to read the error logs in Azure Blob Storage. Due to some issue I am not able to read the exact error log text in Azure, but I do have a report generated for the blob (image attached).

From that image I can see that the "ListBlobs" and "GetBlobProperties" blob API calls are failing.

The docker log errors are attached below (one JSON entry per line):

{"log":"REQ\u003e GET: / [/home]\n","stream":"stdout","time":"2022-08-01T18:23:55.651829661Z"}
{"log":"INFO\u003e got domain: hsdsblob/home\n","stream":"stdout","time":"2022-08-01T18:23:55.651863261Z"}
{"log":"INFO\u003e getDomainJson(hsdsblob/home, reload=True)\n","stream":"stdout","time":"2022-08-01T18:23:55.651868861Z"}
{"log":"INFO\u003e http_get('http://:6101/domains')\n","stream":"stdout","time":"2022-08-01T18:23:55.651873362Z"}
{"log":"INFO\u003e http_get status: 500 for req: http://:6101/domains\n","stream":"stdout","time":"2022-08-01T18:23:55.654286015Z"}
{"log":"ERROR\u003e request to http://:6101/domains failed with code: 500\n","stream":"stdout","time":"2022-08-01T18:23:55.654298415Z"}
{"log":"REQ\u003e GET: / [/home]\n","stream":"stdout","time":"2022-08-01T18:23:55.655834149Z"}
{"log":"INFO\u003e got domain: hsdsblob/home\n","stream":"stdout","time":"2022-08-01T18:23:55.655987053Z"}
{"log":"INFO\u003e getDomainJson(hsdsblob/home, reload=True)\n","stream":"stdout","time":"2022-08-01T18:23:55.655998153Z"}
{"log":"INFO\u003e http_get('http://:6101/domains')\n","stream":"stdout","time":"2022-08-01T18:23:55.656116856Z"}
{"log":"INFO\u003e http_get status: 500 for req: http://:6101/domains\n","stream":"stdout","time":"2022-08-01T18:23:55.658267003Z"}
{"log":"ERROR\u003e request to http://:6101/domains failed with code: 500\n","stream":"stdout","time":"2022-08-01T18:23:55.658279604Z"}
{"log":"REQ\u003e GET: / [/home]\n","stream":"stdout","time":"2022-08-01T18:23:57.662263968Z"}
{"log":"INFO\u003e got domain: hsdsblob/home\n","stream":"stdout","time":"2022-08-01T18:23:57.66236437Z"}
{"log":"INFO\u003e getDomainJson(hsdsblob/home, reload=True)\n","stream":"stdout","time":"2022-08-01T18:23:57.66237387Z"}
{"log":"INFO\u003e http_get('http://:6101/domains')\n","stream":"stdout","time":"2022-08-01T18:23:57.662421471Z"}
{"log":"INFO\u003e http_get status: 500 for req: http://:6101/domains\n","stream":"stdout","time":"2022-08-01T18:23:57.66552394Z"}
{"log":"ERROR\u003e request to http://:6101/domains failed with code: 500\n","stream":"stdout","time":"2022-08-01T18:23:57.665553241Z"}
{"log":"REQ\u003e GET: / [/home]\n","stream":"stdout","time":"2022-08-01T18:24:01.671964628Z"}
{"log":"INFO\u003e got domain: hsdsblob/home\n","stream":"stdout","time":"2022-08-01T18:24:01.672156533Z"}
{"log":"INFO\u003e getDomainJson(hsdsblob/home, reload=True)\n","stream":"stdout","time":"2022-08-01T18:24:01.672219134Z"}
{"log":"INFO\u003e http_get('http://:6101/domains')\n","stream":"stdout","time":"2022-08-01T18:24:01.672252635Z"}
{"log":"INFO\u003e http_get status: 500 for req: http://:6101/domains\n","stream":"stdout","time":"2022-08-01T18:24:01.675458806Z"}
{"log":"ERROR\u003e request to http://:6101/domains failed with code: 500\n","stream":"stdout","time":"2022-08-01T18:24:01.675502007Z"}
{"log":"INFO\u003e healthCheck - node_state: READY\n","stream":"stdout","time":"2022-08-01T18:24:04.119913135Z"}
{"log":"INFO\u003e register: http://head:5100/register\n","stream":"stdout","time":"2022-08-01T18:24:04.119945836Z"}
{"log":"INFO\u003e register req: http://head:5100/register body: {'id': 'sn-cb6c319d9372-dabe8', 'port': 5101, 'node_type': 'sn'}\n","stream":"stdout","time":"2022-08-01T18:24:04.119951336Z"}
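Since Docker's json-file logging driver emits one JSON object per line (with "log", "stream", and "time" fields, as in the excerpt above), the ERROR entries can be filtered out programmatically rather than scanned by eye. A minimal sketch, not part of HSDS:

```python
import json

def error_lines(log_text):
    """Return the message text of json-file log entries starting with 'ERROR>'."""
    errors = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)           # one JSON object per line
        msg = entry.get("log", "")         # '\u003e' decodes to '>' here
        if msg.startswith("ERROR>"):
            errors.append(msg.rstrip("\n"))
    return errors
```

Feeding it the raw contents of a container's json-file log would yield just the ERROR> messages, which in this case all point at the DN container on port 6101.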

Please let me know if you are able to figure out the issue from the above logs and image.

Thanks Nagaraja M M

jreadey commented 1 year ago

This looks strange - not sure why you are seeing the logs as json. Did you do: docker logs hsds_sn_1?
Anyway, it looks like it is the dn_1 container (i.e., the one running on port 6101) that is having the problems. So try: docker logs hsds_dn_1. If you see an error there, it will likely include information that tells us what the problem is.

nagarajmmu commented 1 year ago

Hi John

I was previously looking only at the Docker json-file logs directly; now I have pulled the logs from each container. I found the error in the hsds_dn_3 container ("docker logs hsds_dn_3"); it is attached below.

-------- ERROR ------

REQ> GET: /domains [hsdsblob/home]
INFO> get_metadata_obj: hsdsblob/home bucket: None
ERROR> Unable to import AzureBlobClient
INFO> getStorJSONObj(hsdsblob)/home/.domain.json
Error handling request
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/aiohttp/web_protocol.py", line 435, in _handle_request
    resp = await request_handler(request)
  File "/usr/local/lib/python3.9/site-packages/aiohttp/web_app.py", line 504, in _handle
    resp = await handler(request)
  File "/usr/local/lib/python3.9/site-packages/hsds/domain_dn.py", line 69, in GET_Domain
    domain_json = await get_metadata_obj(app, domain)
  File "/usr/local/lib/python3.9/site-packages/hsds/datanode_lib.py", line 384, in get_metadata_obj
    obj_json = await getStorJSONObj(app, s3_key, bucket=bucket)
  File "/usr/local/lib/python3.9/site-packages/hsds/util/storUtil.py", line 236, in getStorJSONObj
    data = await client.get_object(key, bucket=bucket)
AttributeError: 'NoneType' object has no attribute 'get_object'

(the same GET /domains request and traceback repeat once more, followed by:)

INFO> s3sync nothing to update
INFO> s3syncCheck no objects to write, sleeping for 1.00

Thanks Nagaraja M M

jreadey commented 1 year ago

Ah, the problem is here: "Unable to import AzureBlobClient".

Looks like there was a recent change that made the "azure-storage-blob" package optional in the setup script. Fine for AWS, but not so good for Azure!

Let me confirm this and put out a new build for you to try. Should have it sometime tomorrow.
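For anyone hitting the same symptom: the AttributeError in the traceback above is the classic signature of an optional import failing silently, leaving the storage client set to None. Here is a simplified illustration of the pattern; this is not the actual HSDS code, and the helper name is hypothetical:

```python
# If the optional backend fails to import, the client class is silently left
# as None, and every later call raises:
#   AttributeError: 'NoneType' object has no attribute '...'
try:
    from azure.storage.blob.aio import BlobServiceClient  # optional dependency
except ImportError:
    BlobServiceClient = None

def get_storage_client_class():
    """Hypothetical helper: fail fast with a clear message instead of an
    opaque NoneType error deep inside a request handler."""
    if BlobServiceClient is None:
        raise RuntimeError(
            "Azure backend selected but 'azure-storage-blob' is not installed; "
            "try: pip install azure-storage-blob"
        )
    return BlobServiceClient
```

Failing fast at startup with an explicit message makes a missing optional dependency much easier to diagnose than the delayed AttributeError seen in the DN logs.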

nagarajmmu commented 1 year ago

Hi John

Thank you for the update. Let me know once the new build is available and I will apply it.

Thanks Nagaraja M M

jreadey commented 1 year ago

Hey,

I have a fix checked into master now. You can either build the image locally or use this DockerHub tag: hdfgroup/hsds:sha-164ea71.

I tested it out on Azure and it looked like it worked ok - let me know if it works for you as well.

nagarajmmu commented 1 year ago

Hi John

Thanks for the new build; it worked like a charm. I am now able to create HDF5 files in Azure Blob Storage.

Thanks a lot.

jreadey commented 1 year ago

Awesome - I'll close this issue now.