dandi / dandi-schema

Schemata for DANDI archive project
Apache License 2.0
5 stars 8 forks source link

Add validator to ensure a contact person has email provided #235

Closed candleindark closed 2 months ago

candleindark commented 2 months ago

This PR provides a model level validator to ensures that a contributor acting as a contact person has an email provided per request of https://github.com/dandi/dandi-schema/issues/189.

satra commented 2 months ago

looks reasonable to me. we should add some tests.

codecov[bot] commented 2 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 91.81%. Comparing base (57b503a) to head (146b6e2). Report is 2 commits behind head on master.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #235 +/- ## ========================================== - Coverage 97.58% 91.81% -5.78% ========================================== Files 16 16 Lines 1701 1722 +21 ========================================== - Hits 1660 1581 -79 - Misses 41 141 +100 ``` | [Flag](https://app.codecov.io/gh/dandi/dandi-schema/pull/235/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dandi) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/dandi/dandi-schema/pull/235/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dandi) | `91.81% <100.00%> (-5.78%)` | :arrow_down: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dandi#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

candleindark commented 2 months ago

@yarikoptic @satra I think this is good to go except that you may want to adjust the dandi-cli due to the change brought by this PR.

satra commented 2 months ago

@yarikoptic - should the CLI fail if this validation doesn't pass. i.e. do we force the user to adjust the metadata through the UI to fix validation issues and redownload the dandiset.yaml, before data can be uploaded?

yarikoptic commented 2 months ago

@yarikoptic - should the CLI fail if this validation doesn't pass. i.e. do we force the user to adjust the metadata through the UI to fix validation issues and redownload the dandiset.yaml, before data can be uploaded?

no, I do not think we anyhow concern ourselves with validity of the dandiset.yaml upon upload. Filed

dedicated to that. This PR is pretty much orthogonal IMHO.

yarikoptic commented 2 months ago

Only loosely related to this PR (I guess) -- whenever we approach linkml.io expression, we would need to define ContactPerson as subclass of a Person and make email mandatory... Or is there some similarly flexible way to define conditional validations in linkml @satra @candleindark ?

candleindark commented 2 months ago

Only loosely related to this PR (I guess) -- whenever we approach linkml.io expression, we would need to define ContactPerson as subclass of a Person and make email mandatory... Or is there some similarly flexible way to define conditional validations in linkml @satra @candleindark ?

I don't know how at this point. I was actually thinking about this. The flexibility of the current arrangement comes from executing logic in Python defined as a validator for the Contributor Pydantic model. In fact, the logic currently is not even exported to the JSON schema. We may indeed need to use some other mechanism to express the given relationship in linkml.

sneakers-the-rat commented 2 months ago

in linkml you'd use rules

so that would be something like this

classes:
  Person:
    slots:
      - role
      - email
    rules:
      - preconditions:
          slot_conditions:
            role:
              equals_string: ContactPerson
        postconditions:
          slot_conditions:
            email:
              required: true

which we would need to modify the pydantic generator to support rules with validators to make

@model_validator(mode='after')
def rule_0(self):
    # some type inference would have to happen at generation time, but something like...
    if self.role is not None and 'ContactPerson' in self.role:
        assert self.email is not None

and then modify the pydantic model schema with json_schema_extra

{
"if": {
  "properties": {"role": {"contains": { "const": "ContactPerson" }}},
},
"then": {
  "properties": {"email": {"required": true }}
}

but i think you're probably right that it would be cleaner to use a subclass with a type designator

enums:
  RoleName:
    permissible_values:
      ContactPerson:

classes:
  Person:
    attributes:
      role:
        range: RoleName
        designates_type: true
      email:
        range: string
        required: false

  ContactPerson:
    slot_usage:
      email:
        required: true

and reconfigure pydanticgen to make use of discriminator in fields.

satra commented 2 months ago

when we get to linkml, let's separate out validation logic (which changes over time), from schema and schema-types which are likely more stable. this is also going to be relevant to partial models (before/after upload, before/after publish, etc.,.).

yarikoptic commented 2 months ago

good -- I think we are arriving at cleaner separation of "validation" against desired requirements vs "validation" of the model... as somewhat being also discussed in

But as such I would take this PR as ready and merge it.