SPARC-FAIR-Codeathon / sparc-me

A python tool to explore, enhance, and expand SPARC datasets and their descriptions
Apache License 2.0

Automatically appending to manifest.xlsx files when data is added to SDS folders #80

Closed: PrasadBabarendaGamage closed this issue 1 year ago

PrasadBabarendaGamage commented 2 years ago
  1. User wants to add a file to a dataset.
  2. User calls the add_primary_data or add_derivative_data function in sparc-me, specifying the location of the data file (or derivative file) and the subject and sample it should be added to in the SDS.
  3. The add_primary_data or add_derivative_data function automatically appends rows to subject.xlsx and sample.xlsx.
  4. The dataset manifest.xlsx is updated.
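
To make the steps above concrete, here is a minimal sketch of the file-staging part. It is not the sparc-me implementation: `stage_primary_file`, the `primary/<subject>/<sample>/` layout, and the returned keys are all hypothetical names chosen for illustration. It copies the file into place and returns the values a caller would need to append rows to the subject, sample, and manifest spreadsheets. Writing the spreadsheets themselves is left out.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def stage_primary_file(dataset_root, source_file, subject, sample):
    """Copy source_file into the dataset and describe it for the metadata files.

    Hypothetical helper: the folder layout and the returned keys are
    assumptions, not the sparc-me API.
    """
    root = Path(dataset_root)
    source = Path(source_file)

    # Assumed layout: <root>/primary/<subject>/<sample>/<file name>
    target_dir = root / "primary" / subject / sample
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves the modification time

    modified = datetime.fromtimestamp(target.stat().st_mtime, tz=timezone.utc)
    return {
        # Path relative to the dataset root, as a top-level manifest would store it
        "filename": target.relative_to(root).as_posix(),
        "timestamp": modified.isoformat(),
        "subject_id": subject,
        "sample_id": sample,
    }
```

A caller would then use `subject_id` and `sample_id` to decide whether new rows are needed in the subject and sample files, and append `filename` and `timestamp` to the manifest.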

Hi @tgbugs, for SDS version 2.0.0, should we make sure there is a manifest.xlsx in each folder or search for an existing manifest.xlsx in parent folders?

tgbugs commented 2 years ago

Manifests can exist anywhere in the folder hierarchy and can refer to relative paths, so you need to find all files in the hierarchy that match manifest*.*.
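
As a rough illustration of that search, the sketch below walks a dataset folder with `pathlib` and collects every file matching `manifest*.*`. The helper names `find_manifests` and `resolve_manifest_entry` are hypothetical, and resolving an entry against the folder that contains the manifest is an assumption about how the relative paths are meant to be read.

```python
from pathlib import Path


def find_manifests(dataset_root):
    """Return every manifest file under dataset_root, sorted by path."""
    root = Path(dataset_root)
    # rglob searches the root folder and all subfolders for the pattern
    return sorted(p for p in root.rglob("manifest*.*") if p.is_file())


def resolve_manifest_entry(manifest_path, relative_filename):
    """Resolve a filename listed in a manifest.

    Assumption: entries are relative to the folder holding the manifest.
    """
    return (Path(manifest_path).parent / relative_filename).resolve()
```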

tgbugs commented 2 years ago

If I were writing this functionality from scratch, I would use a single manifest at the very top level of the folder structure (next to the dataset_description file) so that you don't have to deal with the manifest file itself moving, only the files inside the hierarchy moving. Frankly, taking a checksum of the files and tracking how that moves as a primary key is likely to be more robust because it can recover from cases where the file system watcher was offline.
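
One way the checksum idea could look in practice is sketched below, assuming SHA-256 as the checksum and an in-memory dictionary as the index. Both choices, and the function names, are illustrative assumptions and not something specified in this thread. Comparing an index taken before and after a change reveals moved files without needing a file system watcher to have been running.

```python
import hashlib
from pathlib import Path


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_by_checksum(dataset_root):
    """Map checksum -> list of relative paths, skipping manifest files."""
    root = Path(dataset_root)
    index = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and not path.name.startswith("manifest"):
            relative = path.relative_to(root).as_posix()
            index.setdefault(file_checksum(path), []).append(relative)
    return index


def detect_moves(old_index, new_index):
    """Return {old_path: new_path} for files whose content is unchanged."""
    moves = {}
    for checksum, old_paths in old_index.items():
        new_paths = new_index.get(checksum, [])
        # Only unambiguous one-to-one matches are reported; duplicated
        # content is skipped and left for manual review.
        if len(old_paths) == 1 and len(new_paths) == 1 and old_paths[0] != new_paths[0]:
            moves[old_paths[0]] = new_paths[0]
    return moves
```

The resulting mapping could then be used to rewrite the affected rows in a single top-level manifest.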

PrasadBabarendaGamage commented 2 years ago

Thanks very much @tgbugs! That sounds very sensible. We will have a go at implementing it as you suggest.

PrasadBabarendaGamage commented 2 years ago

@savindi-wijenayaka Here is some information on manifest files: