HDFGroup / hsds

Cloud-native, service based access to HDF data
https://www.hdfgroup.org/solutions/hdf-kita/
Apache License 2.0
125 stars, 52 forks

Fix timestamp inaccuracy during domain scan #349

Closed mattjala closed 2 months ago

mattjala commented 2 months ago

getNow() introduces inter-node timestamp inaccuracies, which lead to issue #348 on POSIX systems. Where time.time() is available with sufficient precision it is the better choice, so this PR uses it instead.
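A minimal sketch of the "use time.time() when it is precise enough" idea, not the actual change in this PR. The helper names, the 1 ms threshold, and the generic `fallback` callable (standing in for whatever clock would otherwise be used, e.g. getNow()) are all assumptions made for illustration; only `time.time()` and `time.get_clock_info()` are real standard-library calls.

```python
import time

# Hypothetical threshold: the coarsest clock resolution (in seconds) we accept.
MAX_RESOLUTION = 1e-3


def has_precise_wall_clock(max_resolution: float = MAX_RESOLUTION) -> bool:
    """Return True if the clock behind time.time() reports a resolution
    at or below max_resolution."""
    info = time.get_clock_info("time")
    return info.resolution <= max_resolution


def scan_timestamp(fallback=None, max_resolution: float = MAX_RESOLUTION) -> float:
    """Return time.time() when it is precise enough (or when no fallback is
    given); otherwise return the value produced by the fallback callable."""
    if fallback is None or has_precise_wall_clock(max_resolution):
        return time.time()
    return fallback()
```

The reported resolution varies by platform, so the threshold would need tuning against the systems the service actually runs on.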

Resolves #348.

jreadey commented 2 months ago

Is it still possible to get stuck in a loop on a system without a precise time.time()?

mattjala commented 2 months ago

> Is it still possible to get stuck in a loop on a system without a precise time.time()?

I don't think so - it hasn't cropped up on a Windows test runner yet.

mattjala commented 2 months ago

This would be a slightly separate issue, but depending on inter-node timestamp accuracy for scans is fragile in general. Would it be wise to modify the DN to respond to scan requests with 202 (Accepted) and 102 (Processing) until the scan is complete? This would require users doing scans to poll a provided endpoint, so it'd be a breaking API change, but it would make scans more robust and might eliminate similar issues in the future.
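A rough sketch of the polling pattern this would imply for clients, under stated assumptions: `FakeScanEndpoint` is an in-memory stand-in, not an existing HSDS endpoint, and `wait_for_scan` is a hypothetical client helper. For simplicity it only models 202 while the scan is running and 200 once it is done.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FakeScanEndpoint:
    """Hypothetical in-memory stand-in for a DN scan-status endpoint."""
    polls_until_done: int = 3
    _polls: int = field(default=0, init=False)

    def get_status(self):
        """Return (http_status, body): 202 while scanning, 200 when complete."""
        self._polls += 1
        if self._polls < self.polls_until_done:
            return 202, None
        return 200, {"scan_complete": True}


def wait_for_scan(endpoint, interval=0.01, max_polls=100):
    """Poll the endpoint until it reports 200, sleeping between attempts."""
    for _ in range(max_polls):
        status, body = endpoint.get_status()
        if status == 200:
            return body
        if status != 202:
            raise RuntimeError(f"unexpected status {status}")
        time.sleep(interval)
    raise TimeoutError("scan did not complete within max_polls")
```

With this shape the client never compares timestamps across nodes; it only waits for the DN to say the scan is finished.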

jreadey commented 2 months ago

If it's not causing a problem in the test runners, I'd leave the scan request stuff as is for now. I'm planning to create a general workflow for async tasks (see https://github.com/HDFGroup/hsds/blob/master/docs/design/async_tasks/async_tasks.md). Once that's in place we can move the scan logic there and that should be a better solution.