ebi-ait / checklist

Template repository for checklists
Apache License 2.0

Test current ENA validation behaviour in dev #81

Open ESapenaVentura opened 1 week ago

ESapenaVentura commented 1 week ago

After task #59, to ensure that the validation still produces the same result, I am going to run a test submission of the documents we have available in CODON and compare the outcome with our previous results, which were obtained before the change.

Run011

Copy the XMLs to the new directory

mkdir -p run011/xmls
cd run011
cp -r ../run010/xmls/* xmls/

Run the XML validation script

cp /nfs/production/tburdett/workstreams/fairification/checklists/checklist-converter/src/validate_xml_against_ena_dev.py validate_xml_against_ena_dev.py
export PYTHONHOME=/hps/software/jupyterhub
export PATH=$PATH:$PYTHONHOME/bin
mkdir xml_validation
python validate_xml_against_ena_dev.py --input /nfs/production/tburdett/workstreams/fairification/checklists/data/run011/xmls/ --out_dir /nfs/production/tburdett/workstreams/fairification/checklists/data/run011/xml_validation/ --user $ENA_USER --password $ENA_PASSWORD

NOTE: the script

Compare results

First, I should check whether the number of invalid documents is the same in both runs. To do this:

python3 unique_validation_results.py /nfs/production/tburdett/workstreams/fairification/checklists/data/run011/xml_validation /nfs/production/tburdett/workstreams/fairification/checklists/data/run010/xml_validation
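The contents of `unique_validation_results.py` are not shown in this ticket. As a rough sketch of the kind of comparison it is used for, assuming each `xml_validation` directory holds one ENA receipt per document as a `.xml` file (the file layout and the function names `invalid_documents` and `compare_runs` are assumptions, not taken from the actual script), the invalid documents of two runs could be compared like this:

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def invalid_documents(validation_dir):
    """Return the file names of receipts whose root has success="false"."""
    invalid = set()
    for receipt_path in Path(validation_dir).glob("*.xml"):
        root = ET.parse(receipt_path).getroot()
        if root.get("success") == "false":
            invalid.add(receipt_path.name)
    return invalid


def compare_runs(new_dir, old_dir):
    """Group invalid documents: only in the new run, only in the old run, in both."""
    new_invalid = invalid_documents(new_dir)
    old_invalid = invalid_documents(old_dir)
    return {
        "only_new": sorted(new_invalid - old_invalid),
        "only_old": sorted(old_invalid - new_invalid),
        "both": sorted(new_invalid & old_invalid),
    }
```

Calling `compare_runs` with the run011 and run010 `xml_validation` paths would show whether the two sets of invalid documents match: if `only_new` and `only_old` are both empty, the runs agree on which documents are invalid.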

This creates 3 documents:

Check the discrepancy between old and new validation

Looking at the results with cat, I can see that the only difference is in the documents validated against checklist ERC000056 (see the example below).

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="receipt.xsl"?>
<RECEIPT receiptDate="2024-09-13T16:22:11.656+01:00" submissionFile="SUBMISSION" success="false">
     <SUBMISSION alias="SUBMISSION-13-09-2024-16:22:11:652"/>
     <MESSAGES>
          <ERROR>In sample, alias: "e21e186a-2b7f-4eaa-b6f6-7d8a7389ae46". Invalid checklist: "ERC000056".</ERROR>
     </MESSAGES>
     <ACTIONS>ADD</ACTIONS>

I am talking with Dipayan and Isuru to find out what is happening with those documents.
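To see which checklists are affected across many receipts, the error messages can be pulled out programmatically. This is a minimal sketch, assuming each receipt is complete, well-formed XML and that the message wording matches the one quoted above; the function names and the regular expression are illustrative only:

```python
import re
import xml.etree.ElementTree as ET

# Pattern based on the message format in the receipt quoted above (assumption:
# other receipts use the same wording).
INVALID_CHECKLIST = re.compile(r'Invalid checklist: "(ERC\d+)"')


def receipt_errors(receipt_xml):
    """Return the text of every <ERROR> element in an ENA receipt string."""
    root = ET.fromstring(receipt_xml)
    return [error.text or "" for error in root.iter("ERROR")]


def invalid_checklists(receipt_xml):
    """Return the set of checklist IDs that the receipt reports as invalid."""
    found = set()
    for message in receipt_errors(receipt_xml):
        found.update(INVALID_CHECKLIST.findall(message))
    return found
```

Collecting the output of `invalid_checklists` over every receipt of a run gives the list of checklists the dev environment rejects.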

Compare validation messages

For the results that are invalid, I want to compare that the validation messages are:

For the compatibility, I want to test:

python3 equal_number_error_messages.py

We have exactly the same number of errors (when we ignore the extra duplicated generic validation error and the runtime error), so there is no need for extra checks to see whether they point to the same properties. They should.

However, a manual check of the errors showed me that the new validation response does not name the property that failed, which makes the messages not useful. I am pointing this out in the summary.
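`equal_number_error_messages.py` is not reproduced in this ticket. The following is only a sketch of the idea, assuming one receipt per document with the same file name in both runs. The `ignored_substrings` parameter is hypothetical: the exact text of the generic validation error and the runtime error is not quoted here, so it has to be supplied by the caller.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def error_count(receipt_path, ignored_substrings=()):
    """Count <ERROR> messages in a receipt, skipping any that contain an ignored substring."""
    root = ET.parse(receipt_path).getroot()
    messages = [error.text or "" for error in root.iter("ERROR")]
    return sum(
        1
        for message in messages
        if not any(ignored in message for ignored in ignored_substrings)
    )


def count_mismatches(new_dir, old_dir, ignored_substrings=()):
    """Return {file name: (new count, old count)} for receipts whose counts differ.

    Only receipts present in both directories are compared.
    """
    mismatches = {}
    for new_path in sorted(Path(new_dir).glob("*.xml")):
        old_path = Path(old_dir) / new_path.name
        if not old_path.exists():
            continue
        new_count = error_count(new_path, ignored_substrings)
        old_count = error_count(old_path, ignored_substrings)
        if new_count != old_count:
            mismatches[new_path.name] = (new_count, old_count)
    return mismatches
```

An empty result means the error counts agree for every shared document. It says nothing about whether the messages name the same properties, which still needs the manual check described above.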

Summary

ESapenaVentura commented 5 days ago

PLEASE IGNORE THIS COMMENT, THIS WAS DUE TO USING AN INVALID ENVIRONMENT AND ACTION

Schema generation error

I think units are still not being represented/generated correctly in the schemas we generate. An example is checklist ERC000040, sample SAMEA104451028: it fails XML validation on ENA, but does not fail JSON schema validation. The property "depth" should enforce units, and those units should be restricted to a specific set of values (I think for depth it is m or mm). The property in the schema does not enforce that at all.

Checklist does not exist

Documents against checklist ERC000056 failed validation because, well... it does not exist

theisuru commented 5 days ago

@ESapenaVentura are you using the dev environment @ wp-np2-44? ERC000040 seems to have a required unit there.

ESapenaVentura commented 5 days ago

Got it now - after talking with Dipayan, we figured out that in the dev environment the VALIDATE action does not work properly.

I am repeating the run with the action changed to ADD. It's a lot slower, so I think we will get the results on Monday.
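For reference, this is a sketch of what switching the action looks like, assuming the standard ENA submission XML layout where the action is an empty element inside `ACTIONS/ACTION`. How `validate_xml_against_ena_dev.py` actually builds its submission is not shown in this ticket, so `build_submission_xml` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

# Only the two actions discussed in this ticket are allowed in this sketch.
ALLOWED_ACTIONS = {"ADD", "VALIDATE"}


def build_submission_xml(action="ADD"):
    """Build a minimal ENA submission XML string with a single action."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"Unsupported action: {action}")
    submission = ET.Element("SUBMISSION")
    actions = ET.SubElement(submission, "ACTIONS")
    action_element = ET.SubElement(actions, "ACTION")
    # The action itself is an empty child element, e.g. <ADD/> or <VALIDATE/>.
    ET.SubElement(action_element, action)
    return ET.tostring(submission, encoding="unicode")
```

With this shape, changing the run from VALIDATE to ADD is a one-argument change, which makes it easy to switch back once VALIDATE works in the dev environment.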

ESapenaVentura commented 1 day ago

The results are in the body of the ticket - I have already spoken with Dipayan and Isuru about them