HDFGroup / hdf5

Official HDF5® Library Repository
https://www.hdfgroup.org/
Other
623 stars 254 forks source link

ROS3 driver seems to make an S3 request for already cached file's region #4700

Open ajelenak opened 3 months ago

ajelenak commented 3 months ago

Describe the bug

I collected ROS3 logs about S3 (HTTP range GET) requests and noticed that it makes a request for the same file region that should already be in its cache. This behavior only happens when page buffer cache is enabled.

From the log when page buffer is disabled (size = 0):

cohdf5-gedi:DEBUG:start:<Timer "open" at 0x102cfd100 not yet started>
HEAD: Bytes 0 - 18446744073709551615, Request Size: 0
 -- size: 1526726656
GET: Bytes 0 - 16777215, Request Size: 16777216
cohdf5-gedi:DEBUG:stop:<Timer "open" at 0x102cfd100: Elapsed 4.0371 seconds>
cohdf5-gedi:DEBUG:start:<Timer "metadata" at 0x103b4a390 not yet started>
cohdf5-gedi:DEBUG:stop:<Timer "metadata" at 0x103b4a390: Elapsed 0.0008 seconds>

There is only one GET request during file open for the first 16 MiB of the file. This is as expected.

From the log when page buffer is enabled (size = 64 MiB):

cohdf5-gedi:DEBUG:start:<Timer "open" at 0x103bb1d00 not yet started>
HEAD: Bytes 0 - 18446744073709551615, Request Size: 0
 -- size: 1526726656
GET: Bytes 0 - 16777215, Request Size: 16777216
cohdf5-gedi:DEBUG:stop:<Timer "open" at 0x103bb1d00: Elapsed 2.9537 seconds>
cohdf5-gedi:DEBUG:start:<Timer "metadata" at 0x10325fdd0 not yet started>
GET: Bytes 0 - 16777215, Request Size: 16777216
cohdf5-gedi:DEBUG:stop:<Timer "metadata" at 0x10325fdd0: Elapsed 2.6084 seconds>

There is now another GET request for exactly the same region of the file. This is not expected because the driver should have cached that file's region from the first GET request.

Platform (please complete the following information)

mattjala commented 2 months ago

Could you provide more information about the situation that caused this? I haven't been able to replicate this issue with the 16MiB paged file PAGE16MiB_GEDI01_B_2023034194553_O23479_02_T00431_02_005_02_V002.h5 from the hdf5.sample bucket. After opening the file, subsequent read requests within the first 16MiB don't become s3 requests. I tried decreasing the ROS3 cache size in case it requires the PB cache to be larger than the ROS3 cache, but it still behaves as expected.

ajelenak commented 2 months ago

What is labelled as "metadata" task in the logs is given below in h5py Python:

list(h5file.keys())
list(h5file["METADATA"])
for _ in h5file["METADATA"]["DatasetIdentification"].attrs:
    continue
beamNames = [g for g in h5file.keys() if g.startswith("BEAM")]
for _ in h5file["BEAM0000"].attrs:
    continue

It happens right after opening the file you mentioned.