audeering / audb

Manage audio and video databases
https://audeering.github.io/audb/
Other
23 stars 1 forks source link

Test backward compatibility #413

Closed hagenw closed 1 month ago

hagenw commented 2 months ago

...

hagenw commented 2 months ago

@ChristianGeng this is funny. I try here to test for all the errors we have seen when loading older dependency tables from cache (ModuleNotFoundError under Windows, KeyError under Python 3.8). I created PKL files from the emodb dependency table with different Python and different audb versions and stored them under tests/assests/dependency-table/. But funnily, I'm not able to raise any of the observed errors.

ChristianGeng commented 2 months ago

@ChristianGeng this is funny. I try here to test for all the errors we have seen when loading older dependency tables from cache (ModuleNotFoundError under Windows, KeyError under Python 3.8). I created PKL files from the emodb dependency table with different Python and different audb versions and stored them under tests/assests/dependency-table/. But funnily, I'm not able to raise any of the observed errors.

I tried to investigate the error that you are getting on windows via google.

I find this thread interesting (despite both come trhough duckdb):

https://github.com/duckdb/duckdb/issues/8735 https://github.com/chroma-core/chroma/issues/1069

It basically says that

None of these however relate this to the windows platform. Still, I could imagine that this is simply some weird incompatibility thing that in the windows case that we have strikes us. I also cannot find any code locations where iaudb would import dtype directly. Also when type hinting I cannot find suspicious code locations.

So I am also very startled what is going on here.

hagenw commented 2 months ago

Regarding the code location, I'm very confident that this is the affected code:

https://github.com/audeering/audb/blob/44df511ff8709b505e4cc3cad44ac74349bb8567/audb/core/api.py#L271-L279

So if we simply would catch Exception we would avoid the error. Maybe we can change the code to not load the dependency table if a user Interrupts:

        try:
            deps = Dependencies()
            deps.load(cached_deps_file)
        except KeyboardInterrupt:
            raise
        except Exception:
            # If loading cached file fails, load again from backend.
            # As loading of not compatible pickle files
            # results in a variety of possible errors,
            # we except all besides keyboard interruption
            backend_interface = utils.lookup_backend(name, version)
            deps = download_dependencies(backend_interface, name, version, verbose)
            # Store as pickle in cache
            deps.save(cached_deps_file)
ChristianGeng commented 2 months ago

So if we simply would catch Exception we would avoid the error. Maybe we can change the code to not load the dependency table if a user Interrupts:

Probably then it will be hard to avoid a "catch-all". It is always a bit worrying ...

hagenw commented 2 months ago

At least if there is really another depedency file related error, it should be raised by the other code lines as well.

I think, I'm more worried that audb fails loading something, and the user has no clue what to do.

ChristianGeng commented 2 months ago

At least if there is really another depedency file related error, it should be raised by the other code lines as well.

I think, I'm more worried that audb fails loading something, and the user has no clue what to do.

Not sure whether I get it. So you are talking about the situtation when the exception is raised? But then the other code lines themselves would raise as you mention, saying tht this should be raised by the other code lines as well.

Or is your worry about many problems after a release?

hagenw commented 2 months ago

But then the other code lines themselves would raise as you mention

No, they would not if the exception is raised due to a broken/non-compatible cache file, as the cache file is overwritten. The error would only be raised then, if it is related to the content of the dependency table, but not due to a broken/non-compatible cache.

hagenw commented 2 months ago

As stated in https://github.com/audeering/audinterface/pull/172#issuecomment-2117517305, the actual problem regarding the failed Windows test might not be a backward compatibility issue, but an across platform compatibility issue when using the newest version of audb on all platforms. We should try to extend the test here with a cross-platform test using the newest version of audb to create the cache file.

audeerington commented 1 month ago

I ran into both the ModuleNotFoundError and KeyError when trying to load a database that already had an existing version stored, and then found this PR.

I think it isn't a cross-platform issue, but a compatibility issue with different pandas versions. I tried creating different versions of the db.pkl from the db.csv similar to what was added in https://github.com/audeering/audb/pull/413/commits/d5402d6bf814702d7d55a2fc3f416da18f7e007f , using always audb==1.7.2 but alternating the pandas version. Then I loaded the db.pkl with pd.read_pickle().

This replicated the error:

Maybe the problem just manifested for Windows because it happened to install a different pandas version between tests, but for the other platforms the same pandas version was used.

hagenw commented 1 month ago

Thanks for reporting this.

Would be nice if it is not a cross-platform issue. As then we can solve it by reloading the database and storing again when we experience an error.

I will try to update the tests here to replicate your findings.

hagenw commented 1 month ago

Closing in favor of #417 and #418