ajelenak opened this issue 3 months ago (status: Open)
Could you provide more information about the situation that caused this? I haven't been able to replicate the issue with the 16 MiB paged file PAGE16MiB_GEDI01_B_2023034194553_O23479_02_T00431_02_005_02_V002.h5 from the hdf5.sample bucket. After opening the file, subsequent read requests within the first 16 MiB do not turn into S3 requests. I also tried decreasing the ROS3 cache size, in case the page buffer cache needs to be larger than the ROS3 cache, but the driver still behaves as expected.
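For reference, here is a minimal sketch of how such a replication attempt could look in h5py; the object URL, anonymous S3 access, and the 64 MiB page buffer size are assumptions, since only the bucket and file names are stated above:

import h5py

# Assumed object URL built from the bucket and file names mentioned above.
url = ("https://s3.amazonaws.com/hdf5.sample/"
       "PAGE16MiB_GEDI01_B_2023034194553_O23479_02_T00431_02_005_02_V002.h5")

# Open read-only through the ROS3 driver with a 64 MiB page buffer
# (h5py >= 3.8 exposes the page_buf_size argument).
with h5py.File(url, "r", driver="ros3", page_buf_size=64 * 1024 * 1024) as f:
    # Reads that fall inside the first 16 MiB fetched at file open should be
    # served from cache rather than triggering additional S3 range GET requests.
    list(f.keys())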
What is labelled as the "metadata" task in the logs corresponds to the h5py code below; h5file is the already opened h5py.File object:

# List the root group and the METADATA group members.
list(h5file.keys())
list(h5file["METADATA"])

# Iterate over the attributes of the DatasetIdentification group.
for _ in h5file["METADATA"]["DatasetIdentification"].attrs:
    continue

# Collect the beam group names, then iterate over one beam group's attributes.
beamNames = [g for g in h5file.keys() if g.startswith("BEAM")]
for _ in h5file["BEAM0000"].attrs:
    continue
The duplicate GET request happens right after opening the file you mentioned.
Describe the bug
I collected ROS3 logs of the S3 (HTTP range GET) requests and noticed that the driver makes a request for a file region that should already be in its cache. This behavior only happens when the page buffer cache is enabled.
From the log when the page buffer is disabled (size = 0):
There is only one GET request during file open, for the first 16 MiB of the file. This is as expected.
From the log when the page buffer is enabled (size = 64 MiB):
There is now another GET request for exactly the same region of the file. This is not expected, because the driver should have cached that file region after the first GET request.
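For clarity, a minimal sketch of the two configurations being compared; the object URL is an assumption, and the only difference between the two runs is the page_buf_size argument:

import h5py

url = "https://s3.amazonaws.com/hdf5.sample/PAGE16MiB_GEDI01_B_2023034194553_O23479_02_T00431_02_005_02_V002.h5"  # assumed URL

# Page buffer disabled (size = 0): one 16 MiB range GET at file open,
# and later reads within that region are served from the ROS3 cache.
f_no_pb = h5py.File(url, "r", driver="ros3")

# Page buffer enabled (64 MiB): the log shows a second GET for the same
# region, even though the ROS3 cache already holds those bytes.
f_pb = h5py.File(url, "r", driver="ros3", page_buf_size=64 * 1024 * 1024)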
Platform (please complete the following information)