audeering / audb

Manage audio and video databases
https://audeering.github.io/audb/

Use different name to store dependencies of a database? #321

Closed hagenw closed 4 weeks ago

hagenw commented 11 months ago

Currently, we store the dependencies of a database in the file db.csv inside the database folder. As long as we use audb.load(), this does not matter, as users are not supposed to interact directly with files stored in the cache.

But when using audb.load_to(), users might start to modify files, e.g. renaming media files and updating the database with db.save(). Afterwards, they may wonder why the file names inside db.csv have not changed.
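
To illustrate the mismatch, here is a minimal sketch that does not use audb at all. It assumes a simplified stand-in for db.csv with a `file` column (the real dependency table has more columns and may be laid out differently), and `stale_entries` is a hypothetical helper, not part of audb.

```python
import csv
import tempfile
from pathlib import Path


def stale_entries(root: Path, table_name: str = "db.csv") -> list[str]:
    """Return files listed in the table that no longer exist under root.

    Hypothetical helper; assumes the table has a column named "file".
    """
    with open(root / table_name, newline="") as fp:
        listed = [row["file"] for row in csv.DictReader(fp)]
    return [file for file in listed if not (root / file).exists()]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Simulate a database folder with one media file
    (root / "audio").mkdir()
    (root / "audio" / "001.wav").write_bytes(b"")

    # Simplified stand-in for the dependency table
    with open(root / "db.csv", "w", newline="") as fp:
        writer = csv.writer(fp)
        writer.writerow(["file", "version"])
        writer.writerow(["audio/001.wav", "1.0.0"])

    before = stale_entries(root)

    # The user renames a media file; nothing updates db.csv
    (root / "audio" / "001.wav").rename(root / "audio" / "speaker1.wav")

    after = stale_entries(root)
```

After the rename, the table still lists the old file name, which is the situation that might confuse users.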

One solution would be to hide the file by storing it as .db.csv, or to be more explicit about its content by naming it e.g. dependencies.csv or audb-dependencies.csv.

hagenw commented 5 months ago

~~As a side note, pyarrow will become a dependency of pandas anyway: https://github.com/pandas-dev/pandas/blob/main/web/pandas/pdeps/0010-required-pyarrow-dependency.md So, it should be fine if we start integrating pyarrow based approaches here as well, e.g., storing dependencies as parquet files.~~

Wrong place, moved to https://github.com/audeering/audb/issues/300

hagenw commented 4 weeks ago

As this would require extra backward compatibility handling, supporting two possible dependency file names for several years (or forever), I don't think it is a great idea to change the name of the dependency table.
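
For reference, a rough sketch of what such a fallback could look like. This is not audb code: `find_dependency_file` is a hypothetical helper, and audb-dependencies.csv is just one of the candidate names proposed above.

```python
import tempfile
from pathlib import Path

# Lookup order: the (hypothetical) new name first, then the legacy name
CANDIDATE_NAMES = ("audb-dependencies.csv", "db.csv")


def find_dependency_file(root: Path) -> Path:
    """Return the first existing dependency file under root.

    Hypothetical helper; raises FileNotFoundError if none is found.
    """
    for name in CANDIDATE_NAMES:
        path = root / name
        if path.is_file():
            return path
    raise FileNotFoundError(
        f"No dependency file found in {root}, tried: {', '.join(CANDIDATE_NAMES)}"
    )


with tempfile.TemporaryDirectory() as tmp:
    # A database published before the rename only has the legacy file
    old_db = Path(tmp) / "old"
    old_db.mkdir()
    (old_db / "db.csv").write_text("")

    # A database published after the rename has the new file
    new_db = Path(tmp) / "new"
    new_db.mkdir()
    (new_db / "audb-dependencies.csv").write_text("")

    old_name = find_dependency_file(old_db).name
    new_name = find_dependency_file(new_db).name
```

Every place that reads or writes the dependency table would need to go through a lookup like this for as long as old databases are around.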