dandi / dandi-cli

DANDI command line client to facilitate common operations
https://dandi.readthedocs.io/
Apache License 2.0
22 stars 28 forks source link

Change validation of files (assets) to align with validation on the server #1448

Open yarikoptic opened 5 months ago

yarikoptic commented 5 months ago

The underlying trigger for the issue is

where assets started to fail validation, preventing publishing of dandisets, only when already on the archive. It boiled down to the fails in validation against jsonschema export of pydantic models which is done to guarantee that meditor validation could be done online (thus in JS), and the difference in treating date-time format by jsonschema and pydantic (datetime).

Looking at the code, in dandi-archive we do

from dandischema.metadata import aggregate_assets_summary, validate
...
            metadata = asset.published_metadata()
            validate(metadata, schema_key='PublishedAsset', json_validation=True)

while in the dandi-cli we do not use dandischema..validate construct and rather

        try:
            asset = self.get_metadata(digest=self._DUMMY_DIGEST)
            BareAsset(**asset.model_dump())
        except ValidationError as e:
            if devel_debug:
                raise
            return _pydantic_errors_to_validation_results(
                e, self.filepath, scope=Scope.FILE
            )

so in effect only validating against Pydantic (and for a BareAsset).

edit 1: moreover we do not even get to above from NWBAsset.get_validation_errors due to

        if schema_version is not None:
            errors.extend(
                super().get_validation_errors(
                    schema_version=schema_version, devel_debug=devel_debug
                )
            )
        else:
            # make sure that we have some basic metadata fields we require
            try:
                origin = ValidationOrigin(
                    name="nwbinspector",
                    version=str(_get_nwb_inspector_version()),
                )

                for error in inspect_nwbfile(
...

so we solely rely on nwbinspector inspection for NWB file but that one does not care about dandischema "directly". So we might want to change that one way or another...

We should harmonize, and I feel that here we should use dandischema.metadata.validate. I think it also aligns better with our desire to separate models from intended "validity" assumptions.