ebi-ait / checklist

Template repository for checklists
Apache License 2.0

validate & compare, run 003 #28

Closed amnonkhen closed 3 weeks ago

amnonkhen commented 1 month ago

Run another validation comparison after the synonyms fix (#26). Follow the procedure in #25.

amnonkhen commented 1 month ago

The fix removed the majority of errors:

mismatch type                      | run002 | run003
ena failed, schema valid (Total)   | 16     | 16
ena valid, schema invalid (Total)  | 132    | 10

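The two mismatch categories in the table can be tallied with a short script. This is only a minimal sketch, assuming each run yields a mapping from sample accession to a pass/fail flag for both validators; the function name `count_mismatches` and the accessions are hypothetical and not taken from the repository.

```python
from collections import Counter


def count_mismatches(ena_results, schema_results):
    """Count samples whose ENA and JSON Schema validation outcomes disagree.

    Both arguments map a sample accession to True (valid) or False (failed).
    """
    counts = Counter()
    for accession, ena_ok in ena_results.items():
        if accession not in schema_results:
            continue  # only compare samples validated by both sides
        schema_ok = schema_results[accession]
        if ena_ok and not schema_ok:
            counts["ena valid, schema invalid"] += 1
        elif not ena_ok and schema_ok:
            counts["ena failed, schema valid"] += 1
    return counts


# Hypothetical accessions, for illustration only
ena = {"S1": True, "S2": False, "S3": True}
schema = {"S1": False, "S2": True, "S3": True}
mismatches = count_mismatches(ena, schema)
```

Samples on which both validators agree are ignored, so the counts cover only the discrepancies that need manual review.
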
ESapenaVentura commented 1 month ago

Finished the comparison! It has yielded 3 main results for the discrepancies:

Problematic JSON Schema conversion

Problematic ENA data retrieval: I need to consult @Jeena-Rajan or @snathanvj about this one on Monday. Some of the samples seem to have fields without any value; that is, "tag" is there but "value" does not exist. I wonder if this has been redacted, as the missing value is a patient sample tumor site... (This is the error that reads: Cannot invoke "String.split(String)" because "tagValue" is null.) See https://github.com/ebi-ait/checklist/issues/33

Mystery box: Rows 410 to 413 present a mysterious issue. ENA validation has worked as intended here. First of all, the value is required in the ENA checklist but not in the converted JSON schema, which is weird. Weirder still, even if the field is not required, the schema validation should pick the value up and fail it against the regex. I have not been able to figure out exactly what is happening here.
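For the ENA data retrieval problem, attributes that carry a "tag" but no "value" could be flagged before any string handling is attempted. The sketch below assumes each sample attribute is a dict with "tag" and "value" keys, as the comment describes; the actual retrieval format may differ, and `find_valueless_attributes` and the sample data are hypothetical.

```python
def find_valueless_attributes(attributes):
    """Return the tags of attributes whose value is absent, None or blank."""
    missing = []
    for attribute in attributes:
        value = attribute.get("value")
        if value is None or not str(value).strip():
            missing.append(attribute.get("tag"))
    return missing


# Hypothetical sample attributes, for illustration only
sample_attributes = [
    {"tag": "collection date", "value": "2019-01-01"},
    {"tag": "patient tumor site of collection"},  # "tag" present, "value" absent
]
valueless = find_valueless_attributes(sample_attributes)
```

Reporting such attributes explicitly keeps them visible in the comparison instead of surfacing as a null error.
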

Jeena-Rajan commented 1 month ago

I have also added the comments below to the ENA vs schema validation comparison spreadsheet.

Problematic ENA data retrieval: This is a strange one. It's also missing the mandatory attribute 'patient tumor site of collection'. The related examples are all from the same submitter, so I wonder if the validator was not working at the point when the samples were submitted.

Mystery box: Rows 410-413: 'depth' is only recommended here (not mandatory), so the behaviour is as expected for JSON schema.

ESapenaVentura commented 4 weeks ago

Comment on the mystery box: even if the field is not required, the JSON validation is not correct. The value "Not appliccable" should not be accepted by the regex validation of the field: https://github.com/ebi-ait/checklist-converter/blob/main/schema/ERC000043-ENA.json#L508
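The expected behaviour can be illustrated with a small check: in JSON Schema, a property that is not required may be absent, but when it is present its "pattern" still applies. This is only a sketch using Python's `re` module as a stand-in for the schema regex dialect. `DEPTH_PATTERN` is a hypothetical numeric pattern, not the one in ERC000043-ENA.json, and `check_property` is an illustrative helper.

```python
import re

# Hypothetical pattern for illustration; NOT the pattern in ERC000043-ENA.json
DEPTH_PATTERN = r"^[+-]?[0-9]+(\.[0-9]+)?$"


def check_property(instance, name, pattern, required=False):
    """Mimic how "required" and "pattern" combine for a single property."""
    if name not in instance:
        # An absent property only fails when it is required
        return not required
    value = instance[name]
    if not isinstance(value, str):
        # "pattern" applies to string values only
        return True
    # JSON Schema patterns are unanchored, hence search() rather than match()
    return re.search(pattern, value) is not None


absent_ok = check_property({}, "depth", DEPTH_PATTERN)
bad_value_ok = check_property({"depth": "Not appliccable"}, "depth", DEPTH_PATTERN)
good_value_ok = check_property({"depth": "10.5"}, "depth", DEPTH_PATTERN)
```

Under these assumptions a missing 'depth' passes, while a present but non-matching value fails regardless of whether the field is required.
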