UtrechtUniversity / yoda

A system for reliable, long-term storing and archiving large amounts of research data during all stages of a study.
https://utrechtuniversity.github.io/yoda/
GNU General Public License v3.0

[FEATURE] Automated metadata population / suggestion #471

Open erikvdbergh opened 3 weeks ago

erikvdbergh commented 3 weeks ago

Is your feature request related to a problem? Please describe.

Filling in metadata is a big task for many researchers, with a lot of redundant information to enter (such as affiliation and collaborator details). It is also difficult if you are not familiar with the standards and terms that are good to use as metadata (as many researchers are not). As a result, adding metadata is often neglected, postponed, or forgotten, leading to orphan data in the Research area.

Describe the solution you'd like

We would like the metadata form to be prepopulated, or to offer suggestions, based on automated analysis of the data being described. Preferably, standard fields such as name and affiliation should be autofilled from login information.
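To make the autofill part concrete, here is a minimal sketch of what we have in mind. The profile attributes, the form field names, and the `prefill_metadata` helper are all hypothetical and only for illustration; they are not taken from the Yoda code base or its metadata schemas.

```python
from typing import Dict

# Hypothetical mapping from login/profile attributes to metadata form fields.
# Neither side reflects actual Yoda attribute or schema names.
PROFILE_TO_FORM: Dict[str, str] = {
    "display_name": "creator_name",
    "affiliation": "creator_affiliation",
    "orcid": "creator_identifier",
}


def prefill_metadata(profile: Dict[str, str], existing: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the form data with empty fields filled from the profile.

    Values the researcher has already entered are never overwritten.
    """
    form = dict(existing)
    for attribute, field in PROFILE_TO_FORM.items():
        value = profile.get(attribute)
        if value and not form.get(field):
            form[field] = value
    return form


profile = {"display_name": "J. Doe", "affiliation": "Example University"}
form = prefill_metadata(profile, {"creator_name": "", "title": "My dataset"})
```

The important design point is that autofill only touches empty fields, so the researcher stays in control of anything they typed themselves.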

Metadata related to the content could be suggested based on automated analysis of the data, e.g. by an LLM or another analysis library. Preferably, this process would output suggestions based on standard terms, so that metadata standardisation is maintained.
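As a rough illustration of the "standard terms" requirement, the sketch below maps free-text candidate keywords (from whatever analysis step is used) onto a controlled vocabulary and drops anything that does not match. The vocabulary, the cutoff value, and the `standardise` helper are assumptions for the example; it uses simple fuzzy string matching from the Python standard library rather than an LLM or Tika.

```python
from difflib import get_close_matches
from typing import List

# Hypothetical controlled vocabulary; a real one would come from the metadata schema.
VOCABULARY = ["genomics", "proteomics", "climate modelling", "survey data"]


def standardise(candidates: List[str], vocabulary: List[str], cutoff: float = 0.8) -> List[str]:
    """Map candidate keywords to vocabulary terms, dropping those with no close match."""
    by_lowercase = {term.lower(): term for term in vocabulary}
    suggestions: List[str] = []
    for candidate in candidates:
        matches = get_close_matches(
            candidate.strip().lower(), list(by_lowercase), n=1, cutoff=cutoff
        )
        if matches:
            term = by_lowercase[matches[0]]
            if term not in suggestions:  # avoid suggesting the same term twice
                suggestions.append(term)
    return suggestions


suggested = standardise(["Genomics", "genomic", "xyz"], VOCABULARY)
```

Whatever produces the candidates, filtering them through the vocabulary before showing them in the form would keep the suggestions consistent with the existing metadata standard.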

Describe alternatives you've considered

The alternative is doing it by hand, but as stated in the problem description, this is often neglected because of the time it takes.

Additional context

This request is inspired by ManGO's functionality, which is based on the Apache Tika library: https://tika.apache.org/. However, with LLMs becoming dominant in the past few years, an LLM-based solution might perform better.

stsnel commented 1 day ago

Thank you for the proposal! We plan to gather feedback from other stakeholders on this idea, and expect to be able to give an update within roughly the next two months.