Add random errors to data to test validation failures (mandatory, enums)

theisuru commented 2 weeks ago

Currently most of the samples in our tests are valid samples. We want to test validation failures thoroughly as well. We will introduce one error per checklist data sample and see how the validation behaves. The errors should be introduced into the BioSamples documents. Later we will introduce multiple errors and check the error messages for correct and consistent behaviour.

Create a list with:

- checklist accession
- field that is invalidated
- error type from list below

For clarity see below:

[x] missing mandatory fields
[x] invalid enum value - @snathanvj
[x] invalid json
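As a rough illustration of the "one error per sample" idea, here is a minimal Python sketch. It assumes a BioSamples document keeps its attributes under `characteristics` as lists of `{"text": ...}` objects; the helper names (`inject_error`, `make_record`), the error-type labels, and the accession `EXAMPLE_CHECKLIST` are hypothetical and are not taken from `invalid_sample_generator.py`.

```python
import copy

# Hypothetical value, assumed not to appear in any checklist enum.
INVALID_ENUM_VALUE = "__NOT_AN_ALLOWED_VALUE__"


def inject_error(sample, field, error_type):
    """Return a copy of `sample` with exactly one error introduced."""
    broken = copy.deepcopy(sample)  # leave the valid sample untouched
    characteristics = broken["characteristics"]
    if error_type == "missing_mandatory":
        # Drop the field entirely so a "required" check should fail.
        del characteristics[field]
    elif error_type == "invalid_enum":
        # Replace the value with one outside the allowed set.
        characteristics[field] = [{"text": INVALID_ENUM_VALUE}]
    else:
        raise ValueError(f"unknown error type: {error_type}")
    return broken


def make_record(checklist_accession, field, error_type):
    """One row of the list requested above."""
    return {
        "checklist_accession": checklist_accession,
        "field": field,
        "error_type": error_type,
    }


# Made-up sample, for illustration only.
sample = {
    "name": "demo sample",
    "characteristics": {
        "organism": [{"text": "Homo sapiens"}],
        "sex": [{"text": "female"}],
    },
}

broken_enum = inject_error(sample, "sex", "invalid_enum")
broken_mandatory = inject_error(sample, "organism", "missing_mandatory")
records = [
    make_record("EXAMPLE_CHECKLIST", "sex", "invalid_enum"),
    make_record("EXAMPLE_CHECKLIST", "organism", "missing_mandatory"),
]
```

Working on a deep copy keeps the valid sample available, so each invalid document differs from it by exactly one change.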

snathanvj commented 1 week ago

Looking at the ENA checklists, the field types are mostly text field or text area, and all of them accept any text or number. So we cannot test for an invalid data type.

amnonkhen commented 1 week ago

@ESapenaVentura @snathanvj Running the following command on codon will show you all 26 unique regex patterns in the checklists:

cd /nfs/production/tburdett/workstreams/fairification/checklists/checklist-converter/schema
grep --no-filename pattern *BSD* | sort | uniq

Most of them are dates and numbers. Some are strings that start with a certain character sequence. A string like __XYZ_BANANA_123%$£ should fail all of them. You could confirm this by copying the patterns into an online regex evaluator and checking that this string fails them all.
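Instead of an online evaluator, the same check could be scripted. The sketch below is only an illustration: the three patterns are made-up stand-ins for the kinds described above (a date, a number, a fixed prefix), not the 26 patterns from the checklists, and `patterns_matching` is a hypothetical helper. It uses `re.search` because JSON Schema `pattern` is not implicitly anchored. Python's `re` dialect is close to, but not identical to, the ECMAScript regex dialect that JSON Schema validators typically use.

```python
import re

PROBE = "__XYZ_BANANA_123%$£"

# Illustrative patterns only (assumptions), not copied from the checklists.
EXAMPLE_PATTERNS = [
    r"^[0-9]{4}-[0-9]{2}-[0-9]{2}$",  # date-like
    r"^[+-]?[0-9]+(\.[0-9]+)?$",      # number-like
    r"^PRJ[A-Z0-9]+$",                # string starting with a fixed prefix
]


def patterns_matching(probe, patterns):
    """Return the patterns that the probe string unexpectedly satisfies."""
    return [p for p in patterns if re.search(p, probe)]


matching = patterns_matching(PROBE, EXAMPLE_PATTERNS)
if matching:
    print("Probe string is accepted by:", matching)
else:
    print("Probe string fails all patterns")
```

To run it against the real patterns, `EXAMPLE_PATTERNS` would be replaced by the output of the `grep` command above, with the surrounding JSON stripped.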

snathanvj commented 6 days ago

The changes are in this branch. They cover both mandatory-field errors and enum errors, validate the generated samples using biovalidator, and check the validation results: https://github.com/ebi-ait/checklist-converter/tree/ck_38_add_random_errors_to_sample

I have executed the script for the list of accessions in accessions.csv

Generated invalid files : /nfs/production/tburdett/workstreams/fairification/checklists/senthil/checklist-converter/data/invalid

validation results: /nfs/production/tburdett/workstreams/fairification/checklists/senthil/checklist-converter/data/invalid/validation_result

Result: all of the validated files contain the expected validation error

How to run:

To generate invalid biosample json files: python3 invalid_sample_generator.py --action=generate

To validate all generated biosample jsons: first set up biovalidator as described in https://github.com/ebi-ait/checklist/issues/25

and then run python3 invalid_sample_generator.py --action=validate

it will generate validation result files in data/invalid/validation_result directory

to check results:

mkdir -p data/invalid/validation_result/enum; mv data/invalid/validation_result/*enum.json data/invalid/validation_result/enum
mkdir -p data/invalid/validation_result/mandatory; mv data/invalid/validation_result/*mandatory.json data/invalid/validation_result/mandatory

./check_invalid_results.sh data/invalid/validation_result/enum 'must be equal to one of the allowed values' | grep missing
./check_invalid_results.sh data/invalid/validation_result/mandatory/ 'must have required property' | grep missing
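For readers without access to `check_invalid_results.sh`, the idea of the check can be sketched in Python. This is an assumption about what the check amounts to (report every result file that does not contain the expected message), not a port of the actual script; `files_missing_message` and the demo file names and contents are hypothetical.

```python
from pathlib import Path
import tempfile


def files_missing_message(result_dir, expected):
    """List result files whose text does not contain the expected message."""
    missing = []
    for path in sorted(Path(result_dir).glob("*.json")):
        if expected not in path.read_text(encoding="utf-8"):
            missing.append(path.name)
    return missing


ENUM_MESSAGE = "must be equal to one of the allowed values"

# Self-contained demo with made-up result files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a_enum.json").write_text(
        '[{"errors": ["must be equal to one of the allowed values"]}]',
        encoding="utf-8",
    )
    Path(tmp, "b_enum.json").write_text("[]", encoding="utf-8")
    missing = files_missing_message(tmp, ENUM_MESSAGE)

print("files without the expected error:", missing)
```

An empty list means every result file contains the expected error, which corresponds to the `grep missing` commands above printing nothing.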

snathanvj commented 6 days ago

created separate ticket for regexp errors #51

snathanvj commented 5 days ago

Made a curl request to biovalidator with a deliberately malformed JSON body (single-quoted keys):

curl -X POST -H "Content-Type: application/json" -d "{'schema':{},'data':{}}" http://host:3020/validate >> malformed_json_post_to_biovalidator.txt

output: {"error":"Received malformed JSON."}

results are in

/nfs/production/tburdett/workstreams/fairification/checklists/data/run008/malformed_json_post_to_biovalidator.txt
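The body in the curl command above is rejected because JSON requires double-quoted strings, so single-quoted keys are a syntax error. A short Python sketch shows the same distinction with the standard `json` module. `is_valid_json` is a hypothetical helper, and biovalidator's own parser may report the problem differently.

```python
import json

# The body sent by the curl command above (single quotes, so not valid JSON).
payload_sent = "{'schema':{},'data':{}}"
# The same structure written as valid JSON, for comparison.
payload_valid = '{"schema":{},"data":{}}'


def is_valid_json(text):
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(payload_sent))
print(is_valid_json(payload_valid))
```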

ebi-ait / checklist

Add random errors to data to test validation failures (mandatory, enums) #38