con / nwb2bids

Reorganize NWB files into a BIDS directory layout.

Ability to operate solely on already extracted metadata of DANDI dandisets #7

Open yarikoptic opened 6 months ago

yarikoptic commented 6 months ago

We do have metadata across all dandiset assets already extracted and made available both in

Ideally the tool should be able to operate (perhaps via a mode option of some kind) on just the metadata records, and produce as output e.g. a JSON/TSV list of records with the target filename for each asset.
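For instance (all field names here are hypothetical; the real record shape would follow dandi-schema), one such output record could look roughly like:

```python
# Hypothetical shape of one output record:
# the asset's path within the dandiset plus its target BIDS filename.
{
    "asset_path": "sub-01/sub-01_ses-01.nwb",
    "bids_path": "sub-01/ses-01/ecephys/sub-01_ses-01_ecephys.nwb",
}
```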

If metadata is lacking, we should extend it at the https://github.com/dandi/dandi-schema/ level and in https://github.com/dandi/dandi-cli to support extraction/harmonization where needed.

Before even doing that, the internal code should be aware of such a target use case -- it should get a clear separation of steps (a sketch of these interfaces follows the list):

  1. metadata extraction/harmonization, e.g. get_metadata_from_files(files: list[Path]) -> list[AssetMetadata]
  2. analytics for BIDS filename construction based on metadata, e.g. get_bids_filenames(list[AssetMetadata]) -> list[BIDSFile]
    • the tricky part is that some files would be "generated" and would not correspond to a specific asset, but would rather often be a "summary" over assets, e.g. dataset_description.json, participants.tsv
      • this could be done by creating a ConcreteBIDSFile subclass of BIDSFile which would just store the content of the target file
  3. BIDS dataset instantiation, e.g. populate_bids_files(list[BIDSFile]) -> None, which, given the list of files from above, would instantiate them. Could have options of various kinds or have different implementations (e.g. creating a datalad dataset via https://docs.datalad.org/en/stable/generated/man/datalad-addurls.html if originally operating on a list of URLs; or another one which downloads, etc.)
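To make that separation concrete, here is a minimal sketch of what the three steps could look like. Everything beyond the three signatures quoted above -- the AssetMetadata/BIDSFile fields, the hard-coded ecephys path pattern, the extra root parameter, and the touch()/write_text() placeholders -- is a hypothetical illustration, not the actual nwb2bids API:

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class AssetMetadata:
    """Harmonized metadata for one asset (hypothetical fields)."""
    path: Path
    subject_id: str
    session_id: str


@dataclass
class BIDSFile:
    """A target BIDS file, usually derived from a single asset."""
    relative_path: Path
    source: AssetMetadata | None = None


@dataclass
class ConcreteBIDSFile(BIDSFile):
    """A generated "summary" file (e.g. dataset_description.json) whose
    content is stored directly rather than derived from one asset."""
    content: str = ""


# Step 1: metadata extraction/harmonization. A real implementation would read
# NWB headers (or DANDI metadata records); here we fabricate minimal records.
def get_metadata_from_files(files: list[Path]) -> list[AssetMetadata]:
    return [
        AssetMetadata(path=f, subject_id=f"{i:02d}", session_id="01")
        for i, f in enumerate(files, start=1)
    ]


# Step 2: pure analytics -- map metadata records to target BIDS filenames.
def get_bids_filenames(assets: list[AssetMetadata]) -> list[BIDSFile]:
    out: list[BIDSFile] = [
        BIDSFile(
            relative_path=Path(
                f"sub-{a.subject_id}/ses-{a.session_id}/ecephys/"
                f"sub-{a.subject_id}_ses-{a.session_id}_ecephys.nwb"
            ),
            source=a,
        )
        for a in assets
    ]
    # Summary files do not correspond to any single asset.
    out.append(ConcreteBIDSFile(Path("dataset_description.json"), content="{}"))
    return out


# Step 3: instantiate the layout on disk. A real implementation could instead
# copy/symlink files, download from URLs, or call datalad addurls.
def populate_bids_files(bids_files: list[BIDSFile], root: Path = Path("bids")) -> None:
    for bf in bids_files:
        target = root / bf.relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        if isinstance(bf, ConcreteBIDSFile):
            target.write_text(bf.content)
        else:
            target.touch()  # placeholder for the actual file materialization
```

Keeping step 2 free of any I/O is what would let the same code run either on local NWB files (step 1 reading headers) or purely on already extracted DANDI metadata records, as proposed above.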
TheChymera commented 5 months ago

Do I understand correctly that this would require nwb2bids to depend on DANDI? If so, I think that would be a problem, because neuroconv should depend on this.