airr-community / airr-standards

AIRR Community Data Standards
https://docs.airr-community.org
Creative Commons Attribution 4.0 International
35 stars 23 forks source link

MiAIRR Compliance Flag #750

Closed jday1 closed 2 months ago

jday1 commented 4 months ago

A small change which introduces a flag to check for MiAIRR Compliance in the data (defaults to True).

If True, the current behaviour of validate_airr is followed.

If False, the data structure is checked by the MiAIRR and nullable checks are ignored.

There is a larger piece of work to unify the nullable check and the miarr check into one piece of logic as there is duplication. Please correct me if I'm wrong but miairr: essential if and only if nullable: false. As miairr is more developed than the nullable check, nullable could be deprecated in favour of only having and checking the miairr field.

I'm happy to implement this in the future if it is something that the AIRR community would find valuable.

schristley commented 4 months ago

Please correct me if I'm wrong but miairr: essential if and only if nullable: false.

Not exactly, while miairr: essential implies nullable: false, the opposite implication is not true, there are fields with nullable: false that are not part of MiAIRR.

As miairr is more developed than the nullable check, nullable could be deprecated in favour of only having and checking the miairr field.

That would actually complicate validation for APIs like the ADC API. Because nullable is a standard keyword in OpenAPI 3, standard validation routines can insure that nullable flag is being enforced properly in the data. However, miairr is a custom extension and thus cannot be validated without custom code.

The idea of a MiAIRR compliance flag might be useful, but not necessarily for miairr: essential as standard schema keywords like required and nullable can enforce that. It might be more useful for miairr: important

jday1 commented 4 months ago

Please correct me if I'm wrong but miairr: essential if and only if nullable: false.

Not exactly, while miairr: essential implies nullable: false, the opposite implication is not true, there are fields with nullable: false that are not part of MiAIRR.

As miairr is more developed than the nullable check, nullable could be deprecated in favour of only having and checking the miairr field.

That would actually complicate validation for APIs like the ADC API. Because nullable is a standard keyword in OpenAPI 3, standard validation routines can insure that nullable flag is being enforced properly in the data. However, miairr is a custom extension and thus cannot be validated without custom code.

The idea of a MiAIRR compliance flag might be useful, but not necessarily for miairr: essential as standard schema keywords like required and nullable can enforce that. It might be more useful for miairr: important

Thanks for the clarification.

In that case, I think the long term aim after this PR would be a miairr_compliance_flag which is set to essential by default, but also supports important and defined.

schristley commented 4 months ago

I think a more useful design is to quantify how compliant some data is to MiAIRR, specifically for the different levels. By quantify, I mean count up the number of fields in the schema with (say) miairr: important then in the data, count up how many of those fields have data in them, then output the %.

In terms of implementation, this wouldn't be an additional flag on the various load functions. Instead it would be a standard-alone function, like validate_object, where you pass the data and what compliance level to check as parameters. It would be simpler function because it only needs to recurse through the data structure and check MiAIRR.

Then to make it a bit more useful, update airr-tools program to accept a "miairr compliance" option to run on a data file, so it can be easily run in scripts and workflows.

schristley commented 4 months ago

@jday1 Is this closer to your intention of a "MiAIRR Compliance Flag"?

jday1 commented 4 months ago

@jday1 Is this closer to your intention of a "MiAIRR Compliance Flag"?

Thanks for getting back to me @schristley.

For MiARR, that is the correct method which I can implement in a future PR. I have modified this PR so that the option is called nullable_check. If True (the default), the current behaviour occurs. If False, fields which are declared non-nullable are allowed to be null.

javh commented 2 months ago

@schristley are you good with this as is?

schristley commented 2 months ago

@jday1 This branch is very old and cannot be merged as is. Your changes look ok to me, can you create a new branch/PR based upon recent master? That would insure there are no conflicts with all of the recent v2 work and we can merge easily.

jday1 commented 2 months ago

@jday1 This branch is very old and cannot be merged as is. Your changes look ok to me, can you create a new branch/PR based upon recent master? That would insure there are no conflicts with all of the recent v2 work and we can merge easily.

Thanks @schristley. I have merged the most recent version of master into this branch and the checks all pass. Please merge when suitable.