HDFGroup / hsds

Cloud-native, service based access to HDF data
https://www.hdfgroup.org/solutions/hdf-kita/
Apache License 2.0

There is no ETag in the response header #240

Open AboPlus opened 11 months ago

AboPlus commented 11 months ago

Hello, I found that when I get the values of a dataset, there is no ETag in the response headers. What might be causing this?

jreadey commented 11 months ago

The Tencent docs say the ETag is supported (see https://main.qcloudimg.com/raw/document/intl/product/pdf/436_14102_en.pdf), so I'd have expected it to work.

If you look through the DN logs, do you see a line like: "s3Client.get_key_stats, expected to find key: ETag"?

AboPlus commented 11 months ago

I checked the DN log, but I didn't see anything related to ETag.

jreadey commented 11 months ago

Do you see any DN lines like: INFO> head: ...? The "..." should be a JSON including the ETag.

AboPlus commented 11 months ago

@jreadey No, I don't see any INFO> head: ... lines, which is weird.

jreadey commented 10 months ago

I may not have understood the issue correctly... HSDS gets the ETag for each object as part of the "scanRoot" operation, but these are not returned in the GET /datasets response. If you run hsstat <domain>, do you see an MD5 line?

AboPlus commented 9 months ago

@jreadey Yes, I do get an MD5 line when I run hsstat <domain>. So the GET /datasets response doesn't return an ETag, right? But the example in this document shows an ETag in the response headers: https://github.com/HDFGroup/hdf-rest-api/blob/master/DatasetOps/GET_Value.md

jreadey commented 9 months ago

Ok, I see that. Not sure why the Etag key disappeared from the response header. Possibly this was the default in an earlier version of the aiohttp package, but not with the version we are using now.

It shouldn't be hard to restore the etag response, but let me ask about the semantics you are expecting...

With GET /datasets/datasetid/value, do you want the etag to represent the state of all the dataset values, or just the selection specified in the request?

For GET /datasets/datasetid, it seems clear that the etag should represent the dataset values combined with any metadata (e.g. attributes).

Are you thinking of using a HEAD request first to determine if anything has changed, and only doing a GET when the values have changed?

AboPlus commented 9 months ago

@jreadey Thank you very much for your reply.

Yes, what you said is exactly what I want: I am thinking of using a HEAD request first to determine whether anything has changed, and only doing a GET when the values have changed.

With GET /datasets/datasetid/value, I want the etag to represent the state of the selection specified in the request.

Thank you again!
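
For reference, the HEAD-then-GET pattern discussed above could look roughly like the sketch below. It is only an illustration, not HSDS code: `head` and `get` are hypothetical callables supplied by the caller (for example, thin wrappers around an HTTP client), and it assumes the server returns an ETag header for the request URL, which HSDS did not do at the time of this thread. When no ETag is present, the function falls back to always fetching.

```python
from typing import Callable, Mapping, Optional, Tuple


def fetch_if_etag_changed(
    head: Callable[[str], Mapping[str, str]],  # hypothetical: returns response headers
    get: Callable[[str], bytes],               # hypothetical: returns the response body
    url: str,
    cached_etag: Optional[str],
) -> Tuple[Optional[bytes], Optional[str]]:
    """Return (body, etag); body is None when the ETag matches the cached one."""
    headers = head(url)
    # HTTP header names are case-insensitive, so compare in lower case.
    etag = next((v for k, v in headers.items() if k.lower() == "etag"), None)
    if etag is not None and etag == cached_etag:
        # Unchanged since the last fetch: skip the (possibly large) GET.
        return None, etag
    # Either the ETag changed or the server sent none: fetch the data.
    return get(url), etag
```

The caller keeps the returned etag and passes it back as `cached_etag` on the next call; including the selection in `url` matches the per-selection semantics requested above, assuming the server computes the ETag that way.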

jreadey commented 8 months ago

Yes, that sounds useful. I created a feature issue for this here: https://github.com/HDFGroup/hsds/issues/268. In the meantime, you might find it useful to do a GET on the dataset and check the modification time. If that hasn't changed, you know the data will not have been updated. If it has, the data in your selection may or may not have been changed, so you'd need to do a GET value to verify.
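
A minimal sketch of that workaround, under some assumptions: `get_json` is a hypothetical helper supplied by the caller that performs an authenticated GET against the HSDS endpoint and returns the parsed JSON body, the dataset JSON is assumed to carry its modification time in a `lastModified` field, and the value response is assumed to carry the data under a `value` key.

```python
from typing import Any, Callable, Optional, Tuple


def get_value_if_modified(
    get_json: Callable[[str], dict],  # hypothetical: GET a path, return parsed JSON
    dataset_id: str,
    last_seen: Optional[Any],
) -> Tuple[Optional[Any], Any]:
    """Return (value, last_modified); value is None when the dataset is unchanged."""
    # Cheap metadata request first (assumes a "lastModified" field).
    meta = get_json(f"/datasets/{dataset_id}")
    last_modified = meta["lastModified"]
    if last_seen is not None and last_modified == last_seen:
        # Modification time unchanged, so the data cannot have been updated.
        return None, last_modified
    # Something in the dataset changed; the selection of interest may or may
    # not be affected, so the values have to be fetched to find out.
    body = get_json(f"/datasets/{dataset_id}/value")
    return body["value"], last_modified
```

The caller stores the returned modification time and passes it back as `last_seen` on the next poll. A changed modification time only says that something in the dataset changed, so the second request is still needed to check a particular selection.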

AboPlus commented 8 months ago

That's a good idea! I will try doing a GET on the dataset and checking the modification time.