Add random errors to data to test validation failures (regexp)

snathanvj commented 6 days ago

Currently, most of the samples in our tests are valid. We want to test validation failures thoroughly as well. We will introduce one error per checklist data sample and check how validation behaves. The errors should be introduced into the BioSamples documents. Later, we will introduce multiple errors per sample and check that the error messages are correct and consistent.

Create a list with:

- checklist accession
- field that is invalidated
- error type (from the list below)

For clarity, see the error types below:

- [x] invalid pattern (regex) - @ESapenaVentura
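A minimal sketch of one entry of that list as a Python record, serialised to CSV. The class and function names are hypothetical; the column names follow the report format described later in this thread, with `checklist` holding the checklist accession and `invalid_path` the invalidated field. The example values are illustrative only.

```python
import csv
import io
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class InvalidationRecord:
    """One injected error (hypothetical structure, not the script's own)."""
    document_path: str  # BioSamples document that was modified
    checklist: str      # checklist accession the document is validated against
    invalid_path: str   # path of the field that was invalidated
    error_type: str     # e.g. "invalid_pattern"


def records_to_csv(records: list[InvalidationRecord]) -> str:
    """Serialise records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["document_path", "checklist", "invalid_path", "error_type"],
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()
```

Keeping one record per injected error makes it straightforward to later add several errors per sample: each error simply becomes another row.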

ESapenaVentura commented 6 days ago

The script has been pushed to /nfs/production/tburdett/workstreams/fairification/checklists/generate_invalid_patterns.py (I will open a PR soon).

The script has been run with the following command:

python3 generate_invalid_patterns.py -i data/run009/jsons/ -s checklist-converter/schema/ -o data/run008/invalid/invalid_pattern/ -e 2

This introduces roughly two pattern-related errors per document, where possible.
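The contents of generate_invalid_patterns.py are not shown in this thread, so the following is only a sketch of one way to break a pattern constraint, not the script's actual logic. It assumes the checklist schemas use JSON Schema `pattern` keywords and approximates their ECMA-262 semantics with Python's `re` module; `make_pattern_violation` and its list of mutations are hypothetical.

```python
import re
from typing import Optional


def make_pattern_violation(pattern: str, valid_value: str) -> Optional[str]:
    """Return a value that does NOT match `pattern`, or None if none was found.

    A JSON Schema `pattern` is not implicitly anchored: a value is valid if the
    regex matches anywhere in it, so re.search (not re.fullmatch) mirrors it.
    """
    compiled = re.compile(pattern)
    candidates = [
        valid_value + "!",              # trailing junk (breaks "$"-anchored patterns)
        "!" + valid_value,              # leading junk (breaks "^"-anchored patterns)
        valid_value.replace("-", "/"),  # swap a common separator
        "not a valid value",            # free text
        "",                             # empty string as a last resort
    ]
    for candidate in candidates:
        if candidate != valid_value and compiled.search(candidate) is None:
            return candidate
    return None  # pattern too permissive for these simple mutations
```

Checking each candidate against the compiled pattern before returning it guarantees the injected value really fails the regex, rather than assuming a mutation is invalid. Very permissive patterns such as `.*` yield `None`, which is one reason "where possible" applies.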

The script writes a CSV report with the columns document_path, checklist, invalid_path and error_type.
I moved that report to ...data/run008/pattern_invalidation_error_report.csv.
After the PR, I'll work on checking these errors with biovalidator.
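One possible way to use the report once biovalidator results are available: compare, per document, the paths that were deliberately invalidated with the paths the validator flags. biovalidator's output format is not described here, so this sketch assumes it has already been reduced to a mapping from document path to the set of reported field paths; `load_expected_errors` and `compare_with_report` are hypothetical helpers.

```python
import csv
import io
from collections import defaultdict


def load_expected_errors(report_csv: str) -> dict[str, set[str]]:
    """Group the invalid_path values in the report by document_path."""
    expected = defaultdict(set)
    for row in csv.DictReader(io.StringIO(report_csv)):
        expected[row["document_path"]].add(row["invalid_path"])
    return dict(expected)


def compare_with_report(
    expected: dict[str, set[str]],
    reported: dict[str, set[str]],
) -> dict[str, dict[str, set[str]]]:
    """For each document, list injected errors the validator missed and
    errors it reported that were never injected."""
    result = {}
    for document in expected.keys() | reported.keys():
        injected = expected.get(document, set())
        found = reported.get(document, set())
        result[document] = {
            "missed": injected - found,
            "unexpected": found - injected,
        }
    return result
```

Empty `missed` and `unexpected` sets for every document would indicate that the validator flagged exactly the injected errors.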

ESapenaVentura commented 6 days ago

ebi-ait/checklist-converter#6

ebi-ait/checklist#51