audeering / audb

Manage audio and video databases
https://audeering.github.io/audb/

Comparing CSV and PARQUET dependency tables might fail #436

Closed hagenw closed 3 weeks ago

hagenw commented 3 weeks ago

We use pandas.DataFrame.equals() to check whether two dataframes are identical, but it returns False when the dataframes use different string dtypes, even if their values match; see also https://github.com/pandas-dev/pandas/issues/52791:

>>> import pandas as pd
>>> df1 = pd.DataFrame(["a", "b"], dtype="string")
>>> df2 = pd.DataFrame(["a", "b"], dtype="string[pyarrow]")
>>> df1.equals(df2)
False
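
One possible way around this, sketched here only as an illustration and not as the fix adopted in audb, is to cast all string-like columns to a common dtype before calling equals(). The helper name equals_ignoring_string_dtype is hypothetical, and the sketch normalizes columns only, so an index with a differing string dtype would need separate handling:

```python
import pandas as pd
from pandas.api.types import is_string_dtype


def equals_ignoring_string_dtype(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Compare two dataframes, treating all string-like column dtypes as equal.

    Hypothetical helper for illustration; not part of audb or pandas.
    """

    def normalize(df: pd.DataFrame) -> pd.DataFrame:
        # Cast every string-like column (object, "string", "string[pyarrow]")
        # to plain object dtype; other dtypes are left untouched.
        string_columns = {
            column: "object"
            for column in df.columns
            if is_string_dtype(df[column].dtype)
        }
        return df.astype(string_columns)

    return normalize(left).equals(normalize(right))
```

Applied to df1 and df2 from the example above, this comparison should succeed, because both columns end up as object dtype holding the same values. Numeric columns are deliberately not cast, so a real dtype mismatch such as int32 versus int64 is still reported as a difference.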

As a result, when a user updates a database whose previous version stores its dependency table as a CSV file, they get the following error:

RuntimeError: You want to depend on '1.0.0' of <name>, but the dependency file 'db.parquet' in ../build does not match the dependency file for the requested version in the repository. Did you forgot to call 'audb.load_to(../build, <name>, version='1.0.0') or modified the file manually?