ensure all fields required for nwb filename are in the asset metadata

dandi / dandi-cli

DANDI command line client to facilitate common operations

https://dandi.readthedocs.io/

Apache License 2.0


ensure all fields required for nwb filename are in the asset metadata #618

Open satra opened 3 years ago

satra commented 3 years ago

in relation to #69 it would be good to ensure that all nwb fields used to determine filename are in the metadata record.

satra commented 3 years ago

things currently missing i believe are: neural data types and nwb object identifier.

yarikoptic commented 3 years ago

FWIW: those which are marked "mandatory" in https://github.com/dandi/dandi-cli/blob/master/dandi/organize.py#L27 are "required" to be present. So if for #69 we allow users to specify additional ones, they would be required and I expect organize to blow up if they are not present. (note that there is also "mandatory_if_not_empty")
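A rough sketch of what such a check could look like. The table below is hypothetical and only loosely modelled on the idea of per-field requirement levels; the real definitions live in dandi/organize.py. It also assumes that "mandatory_if_not_empty" means "used whenever a value is present", so an empty value is not treated as an error here.

```python
# Hypothetical field table; names and levels are illustrative assumptions,
# not a copy of what dandi/organize.py defines.
LAYOUT_FIELDS = {
    "subject_id": {"type": "mandatory"},
    "session_id": {"type": "optional"},
    "modalities": {"type": "mandatory_if_not_empty"},
}


def missing_mandatory(metadata, fields=LAYOUT_FIELDS):
    """Return the names of "mandatory" fields that have no value."""
    problems = []
    for name, spec in fields.items():
        value = metadata.get(name)
        # Only "mandatory" fields must always carry a value.
        # "mandatory_if_not_empty" is assumed to allow an empty value.
        if spec["type"] == "mandatory" and value in (None, ""):
            problems.append(name)
    return problems


print(missing_mandatory({"session_id": "20200101"}))
```

With a table like this, user-specified additional fields from #69 would simply be extra entries marked "mandatory".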

"modalities" (which is pretty much neural data types if I get it right) - might be absent in .nwb file e.g. containing only general study/subject metadata. Do we mandate to have at least 1 in a file?

Not sure if "organize" though is the right place to "require" anything beyond what we hardcode or user specifies to be "mandatory". For anything else it should be the "validation".

satra commented 3 years ago

this is about being able to recreate the filename from the metadata on the server. this is not about mandatory or whether that field is present in the file or not, but any part that's extracted from an nwb file to evaluate for organize should be stored in the metadata record.
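To illustrate the point, here is a minimal sketch of rebuilding a filename purely from a stored metadata record. The prefixes, key names, and ordering are assumptions made for the example, not the actual organize layout; the takeaway is only that any field missing from the record cannot contribute to the reconstructed name.

```python
# Hypothetical (prefix, key) pairs; the real layout is defined in
# dandi/organize.py and may differ in names, order, and formatting.
FILENAME_PARTS = [
    ("sub-", "subject_id"),
    ("_ses-", "session_id"),
    ("_obj-", "object_id"),
    ("_", "modalities"),
]


def filename_from_metadata(record, extension=".nwb"):
    """Build a filename using only what is stored in the metadata record."""
    parts = []
    for prefix, key in FILENAME_PARTS:
        value = record.get(key)
        if value:  # fields absent from the record are silently lost
            parts.append(f"{prefix}{value}")
    return "".join(parts) + extension


record = {"subject_id": "m1", "session_id": "20200101", "modalities": "ecephys"}
print(filename_from_metadata(record))
```

If organize had used an object identifier for this file but the record does not store it, the server-side name would not match the one organize produced.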

yarikoptic commented 3 years ago

then the schema should mandate having those fields, validate should guarantee it, and organize could "double-ensure" by first running validation on the metadata extracted from each file before considering it. But then the schema should also somehow encode that some fields are required for some neural data types and not for others (e.g. probe_id).
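One way the conditional requirement could be expressed, as a sketch only. The mapping from "ecephys" to probe_id is an assumption made for illustration, and all names here are hypothetical rather than part of the dandi schema.

```python
# Fields every record needs, plus extra fields per neural data type.
# Both tables are illustrative assumptions.
BASE_REQUIRED = {"subject_id"}
REQUIRED_BY_DATA_TYPE = {
    "ecephys": {"probe_id"},
}


def required_fields(data_types):
    """Collect base requirements plus those implied by each data type."""
    required = set(BASE_REQUIRED)
    for data_type in data_types:
        required |= REQUIRED_BY_DATA_TYPE.get(data_type, set())
    return required


def validate_record(record):
    """Return a sorted list of required fields missing from the record."""
    needed = required_fields(record.get("modalities", []))
    return sorted(name for name in needed if not record.get(name))


print(validate_record({"subject_id": "m1", "modalities": ["ecephys"]}))
```

organize could then call something like validate_record on the metadata extracted from each file and skip or fail on any file with a non-empty result.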

satra commented 3 years ago

for mandating in the schema, we should then create NWBAsset after all. many of these fields are not relevant to the generic asset metadata.

yarikoptic commented 3 years ago

and sorry -- I think I took the initial issue a bit incorrectly. I agree that we indeed should have that metadata in the schema, although as mentioned above, "required" conditioning would be tricky, in particular since neural data types are also tricky on their own. Re NWBAsset -- I think obj_id is the only one. But imho it could become file_identifier: str of some kind. For nwb - object_id. For any other file type we could just default to using one of the digests. "modalities" - we would need to harmonize anyway, even for BIDS.

yarikoptic commented 3 years ago

The idea behind "file_identifier" -- asset id is minted upon any change (metadata/whatnot). "file_identifier" might help to track "identity" of the file(/blob?)
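A minimal sketch of how such a "file_identifier" could be derived, assuming the NWB object_id and a dict of digests are already available to the caller. The function name, its parameters, and the rule for picking a digest are all hypothetical.

```python
def file_identifier(path, object_id=None, digests=None):
    """Pick an identifier that follows the file rather than the asset.

    Assumption: NWB files use their object_id; anything else falls back
    to one of the digests (here, the first by algorithm name).
    """
    if path.lower().endswith(".nwb") and object_id:
        return object_id
    if digests:
        algorithm = sorted(digests)[0]  # deterministic but arbitrary choice
        return f"{algorithm}:{digests[algorithm]}"
    raise ValueError(f"no identifier available for {path}")


print(file_identifier("sub-m1.nwb", object_id="abc123"))
print(file_identifier("notes.tsv", digests={"sha256": "ff", "md5": "00"}))
```

Unlike the asset id, this value would stay the same when only the metadata changes, since neither the object_id nor the content digest depends on the metadata record.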