Data set capture

Wondering about adopting Hugging Face's approach to managing data:

Each dataset is a Git repository, equipped with the necessary scripts to download the data and generate splits for training, evaluation, and testing. For information on how a dataset repository is structured, refer to the Structure your repository guide. Following the supported repo structure will ensure that your repository will have a preview on its dataset page on the Hub.
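To make the "scripts to ... generate splits for training, evaluation, and testing" part concrete, here is a minimal Python sketch of how such a script might assign records to splits. It is not Hugging Face's implementation. The hash-based assignment, the 80/10/10 ratios, and the names `assign_split` and `generate_splits` are all hypothetical. Hashing a stable record ID means a record always lands in the same split, even when the data are regenerated.

```python
import hashlib
from typing import Dict, Iterable, List

# Hypothetical split ratios (assumption, not from the Hugging Face docs).
SPLIT_RATIOS: Dict[str, float] = {"train": 0.8, "validation": 0.1, "test": 0.1}


def assign_split(record_id: str, ratios: Dict[str, float] = SPLIT_RATIOS) -> str:
    """Deterministically map a record ID to a split name by hashing it."""
    digest = hashlib.sha256(record_id.encode("utf-8")).hexdigest()
    # Turn the first 8 hex digits (32 bits) into a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x1_0000_0000
    cumulative = 0.0
    for name, ratio in ratios.items():
        cumulative += ratio
        if bucket < cumulative:
            return name
    # Floating-point rounding can leave a tiny gap below 1.0; use the last split.
    return list(ratios)[-1]


def generate_splits(record_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Group record IDs into the splits defined in SPLIT_RATIOS."""
    splits: Dict[str, List[str]] = {name: [] for name in SPLIT_RATIOS}
    for rid in record_ids:
        splits[assign_split(rid)].append(rid)
    return splits


if __name__ == "__main__":
    # Hypothetical record IDs for illustration only.
    ids = [f"record-{i:04d}" for i in range(1000)]
    for name, members in generate_splits(ids).items():
        print(name, len(members))
```

A hash-based split avoids storing a separate index of which record belongs where, at the cost of split sizes that only approximate the requested ratios.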

This would, however, require that:

we have the ability to mirror external repos into the DSH (this is possible with Gitea or GitLab)
we create synthetic/demo data externally, in the Chimera org (working with ATI)
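Gitea and GitLab both offer built-in repository mirroring. Where that is not available, the first requirement could be met by a scheduled job using plain git. Below is a minimal Python sketch, assuming the DSH exposes a Git remote that the job can push to. The URLs and the function name `mirror_commands` are hypothetical placeholders; the git flags (`clone --mirror`, `remote update --prune`, `push --mirror`) are standard git options.

```python
import subprocess
from pathlib import Path
from typing import List


def mirror_commands(source_url: str, target_url: str, mirror_dir: Path) -> List[List[str]]:
    """Build the git commands that mirror source_url into target_url (hypothetical helper)."""
    if mirror_dir.exists():
        # Refresh an existing bare mirror, pruning refs deleted upstream.
        fetch = ["git", "-C", str(mirror_dir), "remote", "update", "--prune"]
    else:
        # First run: bare clone of every ref (branches, tags) from the external repo.
        fetch = ["git", "clone", "--mirror", source_url, str(mirror_dir)]
    # Push all refs to the internal remote so it matches the external repo exactly.
    push = ["git", "-C", str(mirror_dir), "push", "--mirror", target_url]
    return [fetch, push]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them (stopping at the first failure)."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder URLs; substitute the real external repo and DSH remote.
    cmds = mirror_commands(
        "https://example.org/chimera/demo-data.git",
        "https://dsh.example/mirrors/demo-data.git",
        Path("demo-data.git"),
    )
    run(cmds, dry_run=True)
```

Defaulting to a dry run keeps the sketch safe to try before any credentials or network access to the DSH are set up.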

UCL-Chimera / bellerophon, issue #42