LAAC-LSCP / ChildProject

Python package for the management of day-long recordings of children.
https://childproject.readthedocs.io
MIT License

Better handling of multi-tier metadata #185

Closed: lucasgautheron closed this issue 3 years ago

lucasgautheron commented 3 years ago

Is your feature request related to a problem? Please describe.

Sometimes, access to parts of the metadata can be restricted to certain users. These limitations can be complex, as there can be more than two access tiers.

These access limitations are implemented by splitting the metadata across several files with different access rules. However, this poses a number of problems:

Describe the solution you'd like

This needs more thought, but ideally the solution should allow all of the following:

Zero metadata approach

We could merge all the files automatically, e.g. merge metadata/recordings.csv with everything inside metadata/recordings/* (and likewise for children.csv). I think this is my favorite approach, but we would then need a rule for deciding which file takes priority when columns conflict. We could use alphabetical order. I have started implementing this.
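A minimal sketch of the zero-metadata merge, using only the Python standard library. It assumes that every file of a table shares a key column (child_id for children.csv; recording_filename would play that role for recordings.csv) and picks one of the two possible alphabetical conventions: files that sort later override earlier ones. The function names merge_rows and load_merged are hypothetical and not part of the ChildProject API.

```python
import csv
from pathlib import Path


def merge_rows(tables, key):
    """Merge tables (lists of dict rows) on the `key` column.

    Tables are applied in the order given: when two tables define the
    same column for the same key, the value from the later table wins.
    """
    merged = {}
    for rows in tables:
        for row in rows:
            record = merged.setdefault(row[key], {key: row[key]})
            record.update(row)
    return merged


def load_merged(metadata_dir, table, key):
    """Merge metadata/<table>.csv with every CSV in metadata/<table>/.

    The extra files are read in alphabetical order after the main file,
    so the extra file whose name sorts last wins conflicting columns.
    """
    base = Path(metadata_dir)
    paths = [base / f"{table}.csv"]
    extra_dir = base / table
    if extra_dir.is_dir():
        paths += sorted(extra_dir.glob("*.csv"))

    tables = []
    for path in paths:
        # Skip files that are absent, e.g. not shared with this user.
        if path.is_file():
            with open(path, newline="") as f:
                tables.append(list(csv.DictReader(f)))
    return merge_rows(tables, key)
```

For example, load_merged("metadata", "children", "child_id") would return a dict mapping each child_id to its merged row. Reversing the sort of the extra files would give the opposite convention, where the first extra file alphabetically wins.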

Flexible approach

We could add a file metadata/description.csv with the following structure:

table,field,file,priority,description,values
children,languages,confidential/children.csv,0,children languages,"english,french"

The priority value determines which file is used when several files contain the same field: among the files available to the user, the one with the highest priority is chosen. This is useful, for instance, when most users are given fake dates but we still want to preserve the correct dates somewhere.
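Priority resolution could be sketched as follows. The description.csv content is a hypothetical example built on the fake-dates scenario (the child_dob rows are illustrative), and the sketch assumes that a larger number means a higher priority and that "available" simply means the file is present for the current user. resolve_sources is a hypothetical helper name.

```python
import csv
import io

# Hypothetical description.csv content following the proposed structure.
# Assumption: a larger priority number means a higher priority.
DESCRIPTION = """table,field,file,priority,description,values
children,child_dob,children.csv,0,fake dates of birth,
children,child_dob,confidential/children.csv,1,real dates of birth,
children,languages,confidential/children.csv,0,children languages,"english,french"
"""


def resolve_sources(description_rows, available_files):
    """Map each (table, field) to the available file with the highest priority."""
    best = {}
    for row in description_rows:
        if row["file"] not in available_files:
            continue  # this user cannot access the file
        target = (row["table"], row["field"])
        priority = int(row["priority"])
        if target not in best or priority > best[target][1]:
            best[target] = (row["file"], priority)
    return {target: file for target, (file, _) in best.items()}


rows = list(csv.DictReader(io.StringIO(DESCRIPTION)))
```

With only children.csv available, child_dob resolves to the fake dates in children.csv and languages is unavailable; with access to confidential/children.csv, both fields resolve to the confidential file.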

From this, all available dataframes can easily be merged dynamically (the package would do this by itself for those who use our Python API).
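The dynamic merge could look roughly like this dependency-free sketch, which uses lists of dicts in place of dataframes. It assumes that standard fields come from the main table file, that all files share a key column such as child_id, and that a mapping from (table, field) to the chosen file has already been computed from description.csv. assemble_table and its parameters are hypothetical names.

```python
def assemble_table(base_rows, sources, files, table, key):
    """Overlay the fields documented in description.csv onto a base table.

    base_rows: rows of the standard file (e.g. metadata/children.csv)
    sources: {(table, field): file}, the file chosen for each field
    files: {file: list of dict rows} for the files this user can read
    """
    # Start from copies of the standard rows, indexed by key.
    merged = {row[key]: dict(row) for row in base_rows}
    for (source_table, field), file in sources.items():
        if source_table != table:
            continue
        for row in files[file]:
            if field in row:
                # A documented field overrides the base value if both exist.
                record = merged.setdefault(row[key], {key: row[key]})
                record[field] = row[field]
    return merged
```

With pandas, the same overlay could be expressed with DataFrame.merge on the key column, dropping overridden columns from the base dataframe first.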

Only non-standard fields would need to be documented in metadata/description.csv. Note that this would also solve the more general problem of non-standard fields, which require some documentation anyway.

Simpler approach

Alternatively, metadata/description.csv could have a simpler structure, with one row per file rather than one per field:

table,file,priority,description
children,confidential/children.csv,0,adds children languages

What do you think?