Closed by ckunki 6 months ago
It appears to be slower than using `exasol_bucketfs_utils_python.bucketfs_location.BucketFSLocation.upload_fileobj_to_bucketfs`.
The code that seems slow in comparison is this:

```python
import pickle
import exasol.bucketfs as bfs  # type: ignore

# Look up the bucket on the service, then upload the pickled object
bucketfs = bfs.Service(buckfs_url, buckfs_credentials)
bucket = bucketfs[bucket_name]
bucket.upload(bfs_file_name, pickle.dumps(object))
```
@ahsimb I had a look at the code of both implementations. The actual upload is handled identically. However, the new implementation fetches all buckets on the service before returning the bucket. @Nicoretti do we really need to read the buckets before returning a bucket object? Or do we want to add an option that lets you disable that?
So the question is: is it only significantly slower for small files, or also for larger files?
@tkilias what do you mean by "fetches all buckets" (do you mean the listing of which buckets are available)? @tkilias AFAIK @ahsimb said it is slow for large files (GBs) or multiple (MBs).
@Nicoretti yes, I mean the listing. If it is slow for large files, that is mysterious.
@tkilias we should establish a "performance" regression test for the 2-3 types of access, ensuring that any variation falls within a predefined epsilon range. Any suggestions for what a reasonable epsilon would be here?
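To make the regression-test idea concrete, here is a minimal sketch of what such a check could look like. It is not taken from the `bucketfs-python` test suite: the helper names (`best_time`, `within_epsilon`, `check_no_regression`) are hypothetical, the two upload callables are assumed to be supplied by the test, and the default epsilon of 0.2 is only a placeholder, since the thread has not settled on a value.

```python
import time
from typing import Callable


def best_time(action: Callable[[], None], repeats: int = 5) -> float:
    """Run `action` `repeats` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        timings.append(time.perf_counter() - start)
    # The minimum is less sensitive to one-off noise than the mean.
    return min(timings)


def within_epsilon(candidate: float, baseline: float, epsilon: float) -> bool:
    """True if `candidate` is at most a relative `epsilon` slower than `baseline`."""
    return candidate <= baseline * (1.0 + epsilon)


def check_no_regression(
    new_upload: Callable[[], None],
    old_upload: Callable[[], None],
    epsilon: float = 0.2,  # placeholder value, not an agreed threshold
    repeats: int = 5,
) -> None:
    """Fail if the new upload path is more than `epsilon` slower than the old one."""
    new_time = best_time(new_upload, repeats)
    old_time = best_time(old_upload, repeats)
    assert within_epsilon(new_time, old_time, epsilon), (
        f"new API took {new_time:.3f}s, old API took {old_time:.3f}s "
        f"(allowed relative slowdown: {epsilon:.0%})"
    )
```

In a real test, `new_upload` and `old_upload` would each wrap one upload of the same payload to the same bucket, and the check would be repeated for each of the 2-3 access types and for a small and a large file.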
I did some timing, uploading files of about 1/4 GB in size to the Docker-DB, and couldn't see a difference between the old and the new interface. The new interface has the overhead of fetching a list of buckets from the server, which is a separate HTTP(S) request. But for large files this overhead is relatively small.
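The reasoning about the fixed listing overhead can be sketched as a simple model: one extra request with a constant cost, added to a transfer time that grows with file size. The function name and all numbers below are illustrative assumptions, not measurements from this issue.

```python
def listing_overhead_fraction(
    file_size_bytes: float,
    throughput_bytes_per_s: float,
    listing_seconds: float,
) -> float:
    """Share of the total upload time spent on the one-off bucket listing request."""
    transfer_seconds = file_size_bytes / throughput_bytes_per_s
    return listing_seconds / (listing_seconds + transfer_seconds)


if __name__ == "__main__":
    # Hypothetical values: 100 MB/s throughput and 50 ms for the listing request.
    throughput = 100e6
    listing = 0.05
    for label, size in [("10 kB", 10e3), ("1 MB", 1e6), ("250 MB", 250e6)]:
        share = listing_overhead_fraction(size, throughput, listing)
        print(f"{label}: listing is {share:.1%} of the total time")
```

Under these assumptions the listing request dominates for very small files and becomes negligible for files in the hundreds of MBs, which matches the observation above. It would not explain a slowdown of several times on large files.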
@ahsimb reports `bucketfs-python` to be multiple times slower than `curl`.

Summary
The new `bucketfs-python` API is significantly slower when transferring large files (multiple MBs/GBs) compared to using `curl` and the previous API version.

Reproducing the Issue
Reproducibility: always
Steps to reproduce the behavior:
1. Use the new `bucketfs-python` API to upload a large file (several MBs or GBs).
2. Compare the upload time with `curl` and the older `bucketfs-python` API method. Old API:

Expected Behaviour
The new `bucketfs-python` API should offer comparable performance to the old API and ideally also to methods like `curl`.

Actual Behaviour
The upload process with the new API is significantly slower than using `curl` and the previous API version, affecting efficiency and throughput for large file transfers.