ROS3 driver seems to make an S3 request for already cached file's region

ajelenak commented 3 months ago

Describe the bug

I collected ROS3 logs about S3 (HTTP range GET) requests and noticed that it makes a request for the same file region that should already be in its cache. This behavior only happens when page buffer cache is enabled.

From the log when page buffer is disabled (size = 0):

cohdf5-gedi:DEBUG:start:<Timer "open" at 0x102cfd100 not yet started>
HEAD: Bytes 0 - 18446744073709551615, Request Size: 0
 -- size: 1526726656
GET: Bytes 0 - 16777215, Request Size: 16777216
cohdf5-gedi:DEBUG:stop:<Timer "open" at 0x102cfd100: Elapsed 4.0371 seconds>
cohdf5-gedi:DEBUG:start:<Timer "metadata" at 0x103b4a390 not yet started>
cohdf5-gedi:DEBUG:stop:<Timer "metadata" at 0x103b4a390: Elapsed 0.0008 seconds>

There is only one GET request during file open for the first 16 MiB of the file. This is as expected.

From the log when page buffer is enabled (size = 64 MiB):

cohdf5-gedi:DEBUG:start:<Timer "open" at 0x103bb1d00 not yet started>
HEAD: Bytes 0 - 18446744073709551615, Request Size: 0
 -- size: 1526726656
GET: Bytes 0 - 16777215, Request Size: 16777216
cohdf5-gedi:DEBUG:stop:<Timer "open" at 0x103bb1d00: Elapsed 2.9537 seconds>
cohdf5-gedi:DEBUG:start:<Timer "metadata" at 0x10325fdd0 not yet started>
GET: Bytes 0 - 16777215, Request Size: 16777216
cohdf5-gedi:DEBUG:stop:<Timer "metadata" at 0x10325fdd0: Elapsed 2.6084 seconds>

There is now another GET request for exactly the same region of the file. This is not expected because the driver should have cached that file's region from the first GET request.

Platform (please complete the following information)

HDF5 version: develop branch at commit 1fafdc9da3b3ca17a019cdf237958da29788160a
OS and version: macOS Sonoma 14.6
Compiler and version: clang 15
Build system (e.g. CMake, Autotools): cmake 3.30
Any configure options you specified: build ROS3 driver
MPI library and version (parallel HDF5): n/a

mattjala commented 2 months ago

Could you provide more information about the situation that caused this? I haven't been able to replicate this issue with the 16MiB paged file PAGE16MiB_GEDI01_B_2023034194553_O23479_02_T00431_02_005_02_V002.h5 from the hdf5.sample bucket. After opening the file, subsequent read requests within the first 16MiB don't become s3 requests. I tried decreasing the ROS3 cache size in case it requires the PB cache to be larger than the ROS3 cache, but it still behaves as expected.

ajelenak commented 2 months ago

What is labelled as "metadata" task in the logs is given below in h5py Python:

list(h5file.keys())
list(h5file["METADATA"])
for _ in h5file["METADATA"]["DatasetIdentification"].attrs:
    continue
beamNames = [g for g in h5file.keys() if g.startswith("BEAM")]
for _ in h5file["BEAM0000"].attrs:
    continue

It happens right after opening the file you mentioned.

HDFGroup / hdf5

ROS3 driver seems to make an S3 request for already cached file's region #4700