IPS-LMU / emuR

The main R package for the EMU Speech Database Management System (EMU-SDMS)
http://ips-lmu.github.io/EMU.html

Metadata functionality implementation #259

Open samgregory opened 2 years ago

samgregory commented 2 years ago

With grateful thanks to @FredrikKarlssonSpeech for his permission to use code in the https://github.com/humlab-speech/reindeer re-implementation of emuR, here is a working and tested implementation of metadata for emuR.

These functions meet the requirements outlined in multiple issues, notably #130.

There are some major (breaking) changes compared to the reindeer implementation:

Complete documentation exists for all new functionality, and new .Rmd files have been supplied in this pull request. Unit testing of core features (get_metadata, add_digest, import / export) has been implemented with dummy metadata for the ae test database.

I note the addition of the memoise dependency. During testing, calls to get_metadata were found to be very slow for any database of non-trivial size. emuR resolves the same problem of data distributed across _annot.json files by creating and accessing an SQL cache. That approach seemed overly complicated for metadata, so memoise was used to cache the results of a call to export_metadata. emuR will only read _meta.json files once per loaded database handle, or again once changes are written to _meta.json files from within emuR. Any changes made to these files outside of emuR require a get_metadata(clearCache = TRUE) call.
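For readers unfamiliar with the pattern, here is a minimal base-R sketch of the caching behaviour described above. It is not the code in this pull request, which relies on memoise (whose memoise() and forget() functions provide the same thing out of the box). The names make_cached_reader, slow_read and get_meta are hypothetical, and the returned data frame is made-up dummy data.

```r
# Hypothetical sketch: cache the result of an expensive metadata read,
# keyed by database path, with an explicit way to discard the cached copy.
make_cached_reader <- function(read_fun) {
  cache <- new.env(parent = emptyenv())
  function(db_path, clearCache = FALSE) {
    # Drop the cached entry when the caller asks for a fresh read.
    if (clearCache && exists(db_path, envir = cache, inherits = FALSE)) {
      rm(list = db_path, envir = cache)
    }
    # Only call the expensive reader when nothing is cached for this path.
    if (!exists(db_path, envir = cache, inherits = FALSE)) {
      assign(db_path, read_fun(db_path), envir = cache)
    }
    get(db_path, envir = cache, inherits = FALSE)
  }
}

# Stand-in for the slow step of reading every _meta.json file.
# It counts how often it actually runs; the values are dummy data.
n_reads <- 0
slow_read <- function(db_path) {
  n_reads <<- n_reads + 1
  data.frame(session = "0000", bundle = "example_bundle", Age = 35)
}

get_meta <- make_cached_reader(slow_read)
first  <- get_meta("ae_emuDB")                     # runs slow_read
second <- get_meta("ae_emuDB")                     # served from the cache
third  <- get_meta("ae_emuDB", clearCache = TRUE)  # forces a re-read
```

The third call mirrors what get_metadata(clearCache = TRUE) is meant to do after _meta.json files have been edited outside of emuR: the stale entry is discarded and the files are read again.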

openxlsx becomes a suggested package, without which import and export of Excel spreadsheets will fail (gracefully, with a message to load the openxlsx package).
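The graceful failure can be sketched with the usual requireNamespace() idiom for suggested packages. This is an illustration under assumptions, not the code in this pull request: has_suggested and export_xlsx_sketch are hypothetical names, and the message wording is invented.

```r
# Hypothetical helper: report whether a suggested package is available,
# emitting a message instead of an error when it is not.
has_suggested <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(TRUE)
  }
  message("Package '", pkg, "' is required for this feature. ",
          "Install it with install.packages(\"", pkg, "\").")
  FALSE
}

# Hypothetical export wrapper (not the function from this pull request).
# It returns FALSE invisibly rather than stopping when openxlsx is missing.
export_xlsx_sketch <- function(metadata, file) {
  if (!has_suggested("openxlsx")) {
    return(invisible(FALSE))
  }
  openxlsx::write.xlsx(metadata, file)
  invisible(TRUE)
}
```

Calling openxlsx through the :: operator only after the check means the package never has to be attached, so it can stay under Suggests rather than Imports.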