amnonkhen closed this issue 3 weeks ago.
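The fix removed most of the mismatches. "ENA valid, schema invalid" dropped from 132 to 10, and "ENA failed, schema valid" is unchanged at 16:

| Mismatch type | run002 | run003 |
|---|---|---|
| ENA failed, schema valid (total) | 16 | 16 |
| ENA valid, schema invalid (total) | 132 | 10 |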
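The two mismatch categories in the table can be tallied as in the minimal sketch below. It assumes that each validation run yields, per sample, one pass/fail flag from ENA and one from the converted JSON schema. The tuple layout and function names are hypothetical and are not the comparison procedure from #25.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

ENA_FAILED_SCHEMA_VALID = "ena failed, schema valid"
ENA_VALID_SCHEMA_INVALID = "ena valid, schema invalid"


def mismatch_type(ena_valid: bool, schema_valid: bool) -> Optional[str]:
    """Return the mismatch category for one sample, or None if both validators agree."""
    if ena_valid == schema_valid:
        return None
    return ENA_VALID_SCHEMA_INVALID if ena_valid else ENA_FAILED_SCHEMA_VALID


def count_mismatches(results: Iterable[Tuple[str, bool, bool]]) -> Counter:
    """Tally mismatch categories over (sample_id, ena_valid, schema_valid) tuples."""
    counts: Counter = Counter()
    for _sample_id, ena_valid, schema_valid in results:
        kind = mismatch_type(ena_valid, schema_valid)
        if kind is not None:
            counts[kind] += 1
    return counts


# Illustrative input only; these are not the run002/run003 results.
example = [("S1", True, True), ("S2", True, False), ("S3", False, True), ("S4", True, False)]
print(count_mismatches(example))
```

Samples on which both validators agree are skipped. The counter therefore holds exactly the two "Total" rows of the table for a given run.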
Finished the comparison! It has yielded three main findings about the discrepancies:
Problematic JSON Schema conversion
- `geographic location (country and/or sea)` MAY have synonyms. This is the case for "Czechia", which also has an accepted synonym, "Czech Republic". See #29.
- `units` is an enum that only accepts certain units. In the converted JSON schema, `units` is just defined as "string", with no restriction (see example here). See #30.
Problematic ENA data retrieval
I need to consult this one on Monday with @Jeena-Rajan or @snathanvj. Some of the samples seem to have fields without any value: the "tag" is there, but the "value" does not exist. I wonder whether it has been redacted, because the missing value is a patient sample tumor site. (This is the cause of the `String.split(String)" because "tagValue" is null` error.) See https://github.com/ebi-ait/checklist/issues/33
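Below is a minimal sketch of reading sample attributes so that a tag without a value is reported rather than crashing the run. It assumes the SRA-style sample XML layout (`SAMPLE_ATTRIBUTE` elements with `TAG` and `VALUE` children). The function names and the example tag are hypothetical and do not come from the validator's code. In the sketch, a missing or blank value counts the same as a missing mandatory attribute.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List, Optional


def parse_sample_attributes(xml_text: str) -> Dict[str, Optional[str]]:
    """Map each attribute TAG to its VALUE, or to None when VALUE is absent or blank."""
    root = ET.fromstring(xml_text)
    attributes: Dict[str, Optional[str]] = {}
    for attr in root.iter("SAMPLE_ATTRIBUTE"):
        tag = attr.findtext("TAG")
        if tag is None or not tag.strip():
            continue  # no tag to key on: skip the entry
        value = attr.findtext("VALUE")  # None if <VALUE> is missing, "" if it is empty
        # Check for a missing value before calling any string method on it;
        # splitting a null value is what the `tagValue` error points to.
        attributes[tag.strip()] = value.strip() if value and value.strip() else None
    return attributes


def missing_mandatory(attributes: Dict[str, Optional[str]], mandatory: Iterable[str]) -> List[str]:
    """Return mandatory fields that are absent or have no value, in the order given."""
    return [name for name in mandatory if not attributes.get(name)]


xml_text = """<SAMPLE_SET><SAMPLE><SAMPLE_ATTRIBUTES>
  <SAMPLE_ATTRIBUTE><TAG>depth</TAG><VALUE>10</VALUE></SAMPLE_ATTRIBUTE>
  <SAMPLE_ATTRIBUTE><TAG>patient tumor site of collection</TAG></SAMPLE_ATTRIBUTE>
</SAMPLE_ATTRIBUTES></SAMPLE></SAMPLE_SET>"""
attrs = parse_sample_attributes(xml_text)
print(attrs)  # {'depth': '10', 'patient tumor site of collection': None}
print(missing_mandatory(attrs, ["patient tumor site of collection"]))  # ['patient tumor site of collection']
```

In practice, the mandatory list would come from the ENA checklist. The single name used here is only an example.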
Mystery box: rows 410 to 413 present a mysterious issue. ENA validation worked as intended here. First, the value is required in the ENA checklist but not in the converted JSON schema, which is odd. Stranger still, even if the field is not required, the JSON schema validator should still pick up the value, and the value should fail the regex. I have not been able to figure out exactly what is happening here.
I have also added the comments below to the ENA vs. schema validation comparison spreadsheet:
Problematic ENA data retrieval: this is a strange one. These samples are also missing the mandatory attribute 'patient tumor site of collection'. The related examples all come from the same submitter, so I wonder whether the validator was not working when the samples were submitted.
Mystery box (rows 410-413): 'depth' is only recommended here, not mandatory, so the behaviour is as expected for the JSON schema.
Comment on the mystery box: even if the field is not required, the JSON validation is not correct. The value "Not appliccable" should not be accepted by the field's regex validation: https://github.com/ebi-ait/checklist-converter/blob/main/schema/ERC000043-ENA.json#L508
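The expected behaviour can be shown with a small stand-in for a JSON Schema validator. In JSON Schema, `required` only controls whether a property must be present. A `pattern` on an optional property still applies whenever the property is present. The sketch below implements only those two keywords. The schema fragment and its numeric pattern are hypothetical; the real `depth` pattern is the one linked above and is not reproduced here.

```python
import re
from typing import Dict, List

# Hypothetical fragment of a converted checklist schema; the pattern is illustrative,
# not the one at ERC000043-ENA.json#L508.
SCHEMA = {
    "required": [],  # `depth` is only recommended, so it is not listed as required
    "properties": {
        "depth": {"type": "string", "pattern": r"^[0-9]+(\.[0-9]+)?$"},
    },
}


def check_record(record: Dict[str, str], schema: dict) -> List[str]:
    """Apply `required` and `pattern` the way JSON Schema does (only these two keywords)."""
    errors = []
    for name in schema.get("required", []):
        if name not in record:
            errors.append(f"{name}: required but missing")
    for name, rule in schema.get("properties", {}).items():
        if name not in record:
            continue  # an absent optional field is fine
        pattern = rule.get("pattern")
        # A present value is checked even when the field is optional.
        # JSON Schema patterns are not implicitly anchored, hence re.search.
        if pattern is not None and re.search(pattern, record[name]) is None:
            errors.append(f"{name}: {record[name]!r} does not match {pattern}")
    return errors


print(check_record({}, SCHEMA))                            # []
print(check_record({"depth": "12.5"}, SCHEMA))             # []
print(check_record({"depth": "Not appliccable"}, SCHEMA))  # one error for `depth`
```

If a full validator run against the converted schema accepts "Not appliccable" while this check rejects it, the next thing to inspect is probably where the `pattern` keyword sits in the converted schema, rather than whether `depth` is required.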
Run another validation comparison after the synonyms fix (#26), following the procedure in #25.
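The sketch below shows one way a converter could handle synonyms. Because JSON Schema `enum` matches values exactly, an enum that lists only "Czechia" rejects "Czech Republic". Listing each canonical value together with its synonyms avoids that. The helper names and the synonym table are hypothetical, and this may not be how #26 implements the fix. Called without synonyms, the same helper would also give `units` an enum instead of the unrestricted string described in #30.

```python
from typing import Dict, List, Optional


def choice_field_schema(values: List[str], synonyms: Optional[Dict[str, List[str]]] = None) -> dict:
    """Build a string enum accepting each canonical value plus any of its synonyms."""
    synonyms = synonyms or {}
    allowed: List[str] = []
    for value in values:
        for candidate in [value, *synonyms.get(value, [])]:
            if candidate not in allowed:  # keep first-seen order, skip duplicates
                allowed.append(candidate)
    return {"type": "string", "enum": allowed}


def to_canonical(value: str, synonyms: Dict[str, List[str]]) -> str:
    """Map a synonym back to its canonical value; other values are returned unchanged."""
    for canonical, alternatives in synonyms.items():
        if value in alternatives:
            return canonical
    return value


# Illustrative synonym table with the one pair mentioned in this issue.
SYNONYMS = {"Czechia": ["Czech Republic"]}
schema = choice_field_schema(["Czechia", "France"], SYNONYMS)
print(schema["enum"])                            # ['Czechia', 'Czech Republic', 'France']
print(to_canonical("Czech Republic", SYNONYMS))  # Czechia
```

`to_canonical` is optional. It is only needed if the comparison should treat a synonym and its canonical value as the same value.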