Open satra opened 3 years ago
things currently missing, I believe, are: neural data types and the nwb object identifier.
FWIW: those which are marked "mandatory" in https://github.com/dandi/dandi-cli/blob/master/dandi/organize.py#L27 are "required" to be present. So if for #69 we allow users to specify additional ones, they would be required too, and I expect organize to blow up if they are not present. (Note that there is also "mandatory_if_not_empty".)
"modalities" (which is pretty much neural data types, if I get it right) might be absent in an .nwb file, e.g. one containing only general study/subject metadata. Do we mandate having at least one per file?
Not sure if "organize" though is the right place to "require" anything beyond what we hardcode or user specifies to be "mandatory". For anything else it should be the "validation".
this is about being able to recreate the filename from the metadata on the server. this is not about mandatory or whether that field is present in the file or not, but any part that's extracted from an nwb file to evaluate for organize should be stored in the metadata record.
then the schema should mandate having those fields, validate should guarantee it, and organize could "double-ensure" by first running validation on the metadata extracted from each file before considering it. But then the schema also should somehow encode that some fields are required for some neural data types and not for others (e.g. probe_id).
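A minimal sketch of how such a conditional "required" rule could be expressed and checked before organize considers a file. All names here (`ecephys`, `subject_id`, the rule tables) are hypothetical placeholders for illustration, not the actual schema or dandi-cli code:

```python
# Hypothetical rule tables; the real schema may look quite different.
ALWAYS_REQUIRED = {"subject_id"}
# Fields that become required only when a given neural data type is present.
REQUIRED_BY_DATA_TYPE = {"ecephys": {"probe_id"}}


def missing_fields(metadata):
    """Return the sorted names of required fields that are absent or empty."""
    required = set(ALWAYS_REQUIRED)
    for data_type in metadata.get("modalities", []):
        required |= REQUIRED_BY_DATA_TYPE.get(data_type, set())
    return sorted(
        name for name in required if metadata.get(name) in (None, "", [])
    )


# A file with an "ecephys" data type but no probe_id fails the check.
record = {"subject_id": "s1", "modalities": ["ecephys"]}
print(missing_fields(record))
```

A file with no neural data types at all (only study/subject metadata) would then need only the always-required fields, which is one possible answer to the "at least one modality" question above.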
for mandating in the schema, we should then create NWBAsset after all. many of these fields are not relevant to the generic asset metadata.
and sorry -- I think I took the initial issue a bit incorrectly. I agree that we indeed should have that metadata in the schema, although, as mentioned above, conditioning "required" would be tricky, in particular since neural data types are also tricky on their own.
Re NWBAsset -- I think obj_id is the only one. But imho it could become file_identifier: str of some kind. For nwb - object_id. For any other we could just default to use of one of the digests. "modalities" - we would need to harmonize anyways even for BIDS.
The idea behind "file_identifier" -- asset id is minted upon any change (metadata/whatnot). "file_identifier" might help to track "identity" of the file(/blob?)
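Roughly what I have in mind for "file_identifier", as a sketch only. The function name, the `nwb_object_id` parameter, and the choice of sha256 as the fallback digest are assumptions for illustration; nothing here is an existing dandi API:

```python
import hashlib


def file_identifier(path, nwb_object_id=None):
    """Return a stable identifier for a file.

    For nwb files the caller passes the object_id read from the file;
    for anything else we fall back to a content digest (sha256 here,
    chosen arbitrarily for this sketch).
    """
    if nwb_object_id:
        return nwb_object_id
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large files are not loaded into memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()
```

Note that a digest-based fallback changes whenever the content changes, so it tracks the blob rather than the "identity" of the file; the nwb object_id is what would survive edits.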
in relation to #69 it would be good to ensure that all nwb fields used to determine filename are in the metadata record.
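To make the point concrete, a sketch of recreating a filename purely from a stored metadata record. The field list, prefixes, and filename layout below are hypothetical and only illustrate the idea; they are not the actual organize rules:

```python
# (field name, filename prefix, mandatory?) -- hypothetical, for illustration.
FILENAME_FIELDS = [
    ("subject_id", "sub", True),
    ("session_id", "ses", False),
]


def filename_from_record(record, ext=".nwb"):
    """Rebuild a filename using only values stored in the metadata record."""
    parts = []
    for field, prefix, mandatory in FILENAME_FIELDS:
        value = record.get(field)
        if value:
            parts.append(f"{prefix}-{value}")
        elif mandatory:
            # The record cannot reproduce the filename without this field.
            raise ValueError(f"metadata record is missing {field!r}")
    return "_".join(parts) + ext


print(filename_from_record({"subject_id": "m1", "session_id": "s2"}))
```

If any field that organize reads from the nwb file is not in the record, the server has no way to do this, which is the whole reason for wanting them stored.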