dandi / dandi-schema

Schemata for DANDI archive project
Apache License 2.0
7 stars 10 forks source link

Provide more specific exceptions/errors than just value_error #158

Open yarikoptic opened 1 year ago

yarikoptic commented 1 year ago

ATM we use ValueError as a generic exception for when value is not the one we like to have

❯ git grep 'raise ValueError'
dandischema/datacite.py:                        raise ValueError(
dandischema/digests/dandietag.py:            raise ValueError("File is larger than the S3 maximum object size.")
dandischema/digests/dandietag.py:            raise ValueError("Not all part hashes submitted")
dandischema/digests/dandietag.py:            raise ValueError("Digesting new part when current part is not complete")
dandischema/digests/dandietag.py:            raise ValueError("Partial update extended past end of file")
dandischema/digests/zarr.py:        raise ValueError(f"Cannot parse directory digest {digest}")
dandischema/digests/zarr.py:        raise ValueError("Not found")
dandischema/digests/zarr.py:        raise ValueError("Cannot compute a Zarr checksum for an empty directory")
dandischema/metadata.py:        raise ValueError("Provided object has no known schemaKey")
dandischema/metadata.py:        raise ValueError(
dandischema/metadata.py:                raise ValueError(
dandischema/metadata.py:        raise ValueError(f"Current target schemas: {ALLOWED_TARGET_SCHEMAS}.")
dandischema/metadata.py:        raise ValueError(f"Current input schemas supported: {ALLOWED_INPUT_SCHEMAS}.")
dandischema/metadata.py:        raise ValueError(f"Cannot migrate from {schema_version} to lower {to_version}.")
dandischema/metadata.py:                    raise ValueError("Cannot auto migrate. SchemaKey missing")
dandischema/metadata.py:        raise ValueError("Provided metadata has no schema version")
dandischema/metadata.py:        raise ValueError(
dandischema/models.py:        raise ValueError(f"Could not generate a klass or items from {data}")
dandischema/models.py:            raise ValueError(
dandischema/models.py:            raise ValueError(
dandischema/models.py:            raise ValueError("Both identifier and url cannot be None")
dandischema/models.py:            raise ValueError(
dandischema/models.py:            raise ValueError("At least one contributor must have role ContactPerson")
dandischema/models.py:                raise ValueError("A zarr asset must have a zarr checksum.")
dandischema/models.py:                raise ValueError("Digest cannot have both etag and zarr checksums.")
dandischema/models.py:                raise ValueError(
dandischema/models.py:                raise ValueError(
dandischema/models.py:                raise ValueError("A non-zarr asset must have a dandi-etag.")
dandischema/models.py:                raise ValueError("Digest cannot have both etag and zarr checksums.")
dandischema/models.py:                raise ValueError(
dandischema/models.py:            raise ValueError(
dandischema/models.py:                raise ValueError("A non-zarr asset must have a sha2_256.")
dandischema/models.py:                raise ValueError(
dandischema/tests/test_models.py:            raise ValueError(f"{qname},{klass} already exists {qnames[qname]}")
dandischema/utils.py:        raise ValueError(r"Version must be well formed: \d+\.\d+\.\d+")

and some of them in particular whenever we indicate that the value is required! (not sure why we make it optional then to start with? didn't check)

❯ git grep 'must have a' | grep -v appropriate
dandischema/models.py:                raise ValueError("A zarr asset must have a zarr checksum.")
dandischema/models.py:                raise ValueError("A non-zarr asset must have a dandi-etag.")
dandischema/models.py:                raise ValueError("A non-zarr asset must have a sha2_256.")
dandischema/tests/test_models.py:        "A non-zarr asset must have a sha2_256." in el["msg"]
dandischema/tests/test_models.py:            "A zarr asset must have a zarr checksum." in val
dandischema/tests/test_models.py:            "A zarr asset must have a zarr checksum." in val

That results that whenever we catch ValidationError in dandi-cli we cannot really tell if it is the error that value is completely absent or something else without matching a message (unreliable, shouldn't be done):

*(Pdb) p e.errors()
[{'loc': ('digest',), 'msg': 'A zarr asset must have a zarr checksum.', 'type': 'value_error'}]

I guess (didn't check) if we specialize to other types of derived from ValueError exceptions, like MissingValue(ValueError) we might get 'type': 'missing_value' ?

satra commented 1 year ago

is there a specific list you would like that you would do something programmatic with (i.e. the semantics of typing are useful to you somewhere)? if it's primarily for humans, we should just go through the messages and improve them. also if you simply want to know its from dandischema, we can create a special ValueError. i think subtyping exceptions creates technical debt, unless it is useful in some form.

yarikoptic commented 1 year ago

So far I see clear distinction only for MissingValue (it was not even provided! first thing to look at). yet to see in detail/compare to how it looks like whenever it is really missing according to pydantic schema itself. Then useful to harmonize to see how it looks like whenever value is not corresponding to how pydantic reacts that value is not satisfying what it expects.

As for why -- we are taking that type as a part of a ValidationResult ID. We later might group based on what is missing vs what is entered but incorrectly etc. So it then would become not only about display but also about being able to tell one from another.

danlamanna commented 1 year ago

Related to https://github.com/dandi/dandi-archive/pull/1490.

satra commented 1 year ago

@yarikoptic - most of the exceptions come directly from pydantic. are you only saying to have a MissingValueError or match it to pydantic's validationerror, which is based on message. or create a whole scheme of exceptions. if that latter, please describe which pydantic message should map to what exception.

yarikoptic commented 1 year ago

What matters for me is that 'type': 'value_error' within ValidationError.errors() returns. I want it to be missing_value_error. I don't know really what it entails within dandi-schema/pedantic, but I thought the first step could be to derive custom exception.
With https://github.com/dandi/dandi-archive/pull/1490 by @danlamanna in mind, I would say it is worth creating base class of DandiSchemaError, and sub/multi-class

class DandiValueError(ValueError, DandiSchemaError):
    pass

class DandiMissingValueError(DandiValueError):
    pass

and throughout dandi-schema code base use DandiValueError instead of plain ValueError so anyone catching exceptions from us could tell some other ValueError etc apart from the ones we raise.

satra commented 1 year ago

can we just leave it as pydantic? i can replace where we generate ValueError with a pydantic ValidationError object.

In [12]: try:
    ...:     Dandiset()
    ...: except ValidationError as e:
    ...:     print(e.__repr__())
    ...: 

ValidationError(model='Dandiset', errors=[{'loc': ('id',), 'msg': 'field required', 'type': 'value_error.missing'}, {'loc': ('name',), 'msg': 'field required', 'type': 'value_error.missing'}, {'loc': ('description',), 'msg': 'field required', 'type': 'value_error.missing'}, {'loc': ('contributor',), 'msg': 'field required', 'type': 'value_error.missing'}, {'loc': ('license',), 'msg': 'field required', 'type': 'value_error.missing'}, {'loc': ('identifier',), 'msg': 'field required', 'type': 'value_error.missing'}, {'loc': ('citation',), 'msg': 'field required', 'type': 'value_error.missing'}, {'loc': ('assetsSummary',), 'msg': 'field required', 'type': 'value_error.missing'}, {'loc': ('manifestLocation',), 'msg': 'field required', 'type': 'value_error.missing'}, {'loc': ('version',), 'msg': 'field required', 'type': 'value_error.missing'}])

satra commented 1 year ago

we will have to also decide what we do with both JSON validation and pydantic validation, which are both created in the validation function.

yarikoptic commented 1 year ago

For my current need having 'value_error.missing' I believe would suffice.