I'm running the following code to measure the latency of isdir:

    import time

    from gcsfs import GCSFileSystem
    from tqdm import tqdm

    fs = GCSFileSystem(token="google_default", project="...")
    datadir = "<path to GCS directory with thousands of files>"

    # Take the first 500 entries of the directory listing.
    filenames = ["gs://" + x["name"] for x in fs.listdir(datadir)][:500]

    # Time each isdir() call individually.
    times = []
    for f in tqdm(filenames):
        begin = time.time()
        fs.isdir(f)
        end = time.time()
        times.append(end - begin)

    print("Average time: ", sum(times) / len(times))

In version 2023.09.01, the average time per fs.isdir() call is 0.05 seconds. In version 2023.09.00, it is 0.0001 seconds, so each call is roughly 500 times slower in the newer version. Multiplied across the thousands of files in our GCS directory, this is a significant slowdown: the total time goes from 2 seconds to several minutes.
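
For anyone trying to reproduce the comparison, a helper along these lines might give steadier numbers than a single average. This is only a sketch: `time_calls` and `summarize` are hypothetical names, not part of gcsfs, and it swaps in `time.perf_counter` and a median alongside the mean used in my measurement.

```python
import statistics
import time


def time_calls(fn, items):
    """Call fn(item) for each item and return per-call durations in seconds."""
    durations = []
    for item in items:
        begin = time.perf_counter()
        fn(item)
        durations.append(time.perf_counter() - begin)
    return durations


def summarize(durations):
    """Return mean, median, and total of a non-empty list of durations."""
    return {
        "mean": statistics.mean(durations),
        "median": statistics.median(durations),
        "total": sum(durations),
    }
```

With the `fs` and `filenames` from my snippet, `summarize(time_calls(fs.isdir, filenames))` could be run once under each installed version and the two results compared.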
Thank you for your help, Devin