audeering / audbcards

Data cards for audio datasets
https://audeering.github.io/audbcards/

Speedup caching of audbcards.Dataset #83

Closed hagenw closed 5 months ago

hagenw commented 5 months ago

When caching audbcards.Dataset we currently store objects that are not needed to create a datacard, e.g. the dependency table and the header of a dataset. This increases the size of the cache and makes loading slower than necessary. This pull request speeds up caching of audbcards.Dataset by pickling only the cached properties, as listed by audbcards.Dataset._cached_properties() (formerly audbcards.Dataset.properties()).
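To illustrate the idea, here is a minimal sketch of pickling only cached property values. It is not the actual audbcards code: the toy `Dataset` class, its `deps`, `files` and `short_name` attributes, and the helpers `dump_cache()` / `load_cache()` are hypothetical, and I assume the properties are implemented with `functools.cached_property`.

```python
import functools
import pickle


class Dataset:
    """Toy stand-in, NOT the real audbcards.Dataset."""

    def __init__(self, name):
        self.name = name
        # Hypothetical large helper object,
        # only needed to compute the properties below
        self.deps = list(range(100_000))

    @functools.cached_property
    def files(self):
        return len(self.deps)

    @functools.cached_property
    def short_name(self):
        return self.name.upper()

    def _cached_properties(self):
        """Return a dict with the values of all cached properties."""
        return {
            attr: getattr(self, attr)
            for attr, value in vars(type(self)).items()
            if isinstance(value, functools.cached_property)
        }


def dump_cache(dataset):
    """Pickle only the cached property values, not the whole object."""
    return pickle.dumps(dataset._cached_properties())


def load_cache(payload):
    """Rebuild an object from cached values without calling __init__."""
    dataset = Dataset.__new__(Dataset)
    # cached_property looks up values in the instance __dict__ first,
    # so the restored values are returned without recomputation
    dataset.__dict__.update(pickle.loads(payload))
    return dataset
```

A restored object answers `files` and `short_name` from the cache but has no `deps` attribute, which is what keeps the pickled payload small in this sketch.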

The execution time for building our database overview page on compute5 is as follows:

| branch | fresh build | build from cache |
| --- | --- | --- |
| main | 15 minutes | 3 minutes |
| this branch | 15 minutes | 2 minutes |

The size of the cache is reduced from 2.6G to 133M.

We can further improve execution time by also caching the images / audio examples from audbcards.Datacard, but I will handle this in a follow-up pull request.


Further changes:


Newly added API entries:

[six images]

hagenw commented 5 months ago

> The only concern I have is about dependencies: does the code already depend on a newer version of audbackend? I see no changes in pyproject.toml.

No, this does not yet depend on a newer audbackend version (version 2.0.0 has also not been released yet and exists only in the dev branch of audbackend). I will prepare a pull request for testing audbackend 2.0.0 once the caching speed is handled, to avoid merge conflicts.