audeering / audb

Manage audio and video databases
https://audeering.github.io/audb/

Store dependency table as parquet on backend #398

Closed hagenw closed 2 months ago

hagenw commented 2 months ago

Closes #397

In #372 we switched the format of the dependency table from CSV to PARQUET, which already uses the fast SNAPPY compression algorithm to reduce its size slightly. We still compressed the file further into a ZIP archive before uploading it to the server. The advantage was that the file was smaller, and that we could download the same ZIP file from the server, regardless of whether the dependency table was stored as PARQUET or CSV.

In #397 we show that reading and writing the file is much faster when the PARQUET file is not zipped for storage on the backend. Hence, this pull request removes the zipping and puts the PARQUET file directly on the server. To keep a single source of truth in the implementation, it introduces the download_dependencies() and upload_dependencies() functions (not part of the public API), which are then used internally by audb.dependencies(), audb.publish(), and audb.remove_media().
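
A rough sketch of the idea, not the actual code from this pull request: the backend is modelled as a local folder, the file names db.parquet, db.zip and db.csv are assumptions, and the function names carry a _sketch suffix because the signatures of the real helpers are not shown here. The fallback to a zipped CSV for older databases is likewise an assumption.

```python
import shutil
import zipfile
from pathlib import Path

# Hypothetical file names, chosen for this sketch only
PARQUET_NAME = "db.parquet"
LEGACY_ARCHIVE = "db.zip"
LEGACY_CSV = "db.csv"


def upload_dependencies_sketch(local_file: Path, backend_dir: Path) -> Path:
    """Copy the PARQUET file to the backend as-is, without zipping it."""
    backend_dir.mkdir(parents=True, exist_ok=True)
    target = backend_dir / PARQUET_NAME
    shutil.copyfile(local_file, target)
    return target


def download_dependencies_sketch(backend_dir: Path, cache_dir: Path) -> Path:
    """Fetch the dependency table, preferring the unzipped PARQUET file."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    remote_parquet = backend_dir / PARQUET_NAME
    if remote_parquet.exists():
        # New layout: the PARQUET file is stored directly on the backend
        local_file = cache_dir / PARQUET_NAME
        shutil.copyfile(remote_parquet, local_file)
        return local_file
    # Assumed fallback: older versions stored the CSV table inside a ZIP
    with zipfile.ZipFile(backend_dir / LEGACY_ARCHIVE) as archive:
        archive.extract(LEGACY_CSV, path=cache_dir)
    return cache_dir / LEGACY_CSV
```

With both helpers in one place, every caller that needs the dependency table goes through the same code path, so the storage layout is decided only once.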