Closed theisuru closed 21 hours ago
Looking at the ENA checklists, the field types are mostly text field or text area, and all of them accept any text or number. So we cannot generate an invalid data type error.
@ESapenaVentura @snathanvj Running the following commands on codon will show you all 26 unique regex patterns in the checklists:
cd /nfs/production/tburdett/workstreams/fairification/checklists/checklist-converter/schema
grep --no-filename pattern *BSD* | sort | uniq
Most of them are dates and numbers. Some are strings that start with a certain character sequence.
A string like __XYZ_BANANA_123%$£
should fail all of them. You can verify this by pasting each pattern into an online regex evaluator and checking that this string does not match any of them.
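As an alternative to an online evaluator, the check can be scripted. The following is a minimal sketch only: the three patterns are hypothetical stand-ins for the kinds mentioned above (a date, a number, a fixed prefix), not the actual 26 from the checklists, and `patterns_accepting` is a helper name invented for this example. JSON Schema `pattern` keywords are unanchored and use the ECMA 262 dialect, so `re.search` in Python is a close approximation, not an exact equivalent.

```python
import re

# Hypothetical example patterns; replace with the real output of the grep command.
EXAMPLE_PATTERNS = [
    r"^[0-9]{4}-[0-9]{2}-[0-9]{2}$",   # a date such as 2021-03-04
    r"^[+-]?[0-9]+(\.[0-9]+)?$",        # an integer or decimal number
    r"^PRJ[A-Z]{2}[0-9]+$",             # a string with a fixed prefix
]

TEST_STRING = "__XYZ_BANANA_123%$£"


def patterns_accepting(patterns, value):
    """Return the patterns that match value; an empty list means value fails them all."""
    # re.search mirrors the unanchored matching of the JSON Schema "pattern" keyword.
    return [p for p in patterns if re.search(p, value)]


if __name__ == "__main__":
    accepted = patterns_accepting(EXAMPLE_PATTERNS, TEST_STRING)
    if accepted:
        print("String unexpectedly matches:", accepted)
    else:
        print("String fails all patterns")
```

Any pattern that ends up in the returned list would need a different invalid value.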
Changes are in this branch. It covers the mandatory field error and the enum error, validates the generated samples using biovalidator, and checks the validation results: https://github.com/ebi-ait/checklist-converter/tree/ck_38_add_random_errors_to_sample
I have executed the script for the list of accessions in accessions.csv.
Generated invalid files: /nfs/production/tburdett/workstreams/fairification/checklists/senthil/checklist-converter/data/invalid
Validation results: /nfs/production/tburdett/workstreams/fairification/checklists/senthil/checklist-converter/data/invalid/validation_result
Result: all of the validated files contain the expected validation error.
How to run:
To generate invalid BioSamples JSON files:
python3 invalid_sample_generator.py --action=generate
To validate all generated BioSamples JSON files, first set up biovalidator as described in https://github.com/ebi-ait/checklist/issues/25
and then run:
python3 invalid_sample_generator.py --action=validate
This generates validation result files in the data/invalid/validation_result directory.
To check the results:
mkdir -p data/invalid/validation_result/enum; mv data/invalid/validation_result/*enum.json data/invalid/validation_result/enum
mkdir -p data/invalid/validation_result/mandatory; mv data/invalid/validation_result/*mandatory.json data/invalid/validation_result/mandatory
./check_invalid_results.sh data/invalid/validation_result/enum 'must be equal to one of the allowed values' | grep missing
./check_invalid_results.sh data/invalid/validation_result/mandatory/ 'must have required property' | grep missing
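For readers without access to check_invalid_results.sh, the idea of the check can be sketched in Python. This is not the script from the branch: it assumes only that a result file is "missing" when the expected message does not appear anywhere in its text, and `files_missing_message` is a name made up for this sketch.

```python
from pathlib import Path


def files_missing_message(result_dir, expected_message):
    """Return the names of *.json files in result_dir that lack expected_message."""
    missing = []
    for path in sorted(Path(result_dir).glob("*.json")):
        # Plain substring search, so no assumption is made about the result file layout.
        if expected_message not in path.read_text(encoding="utf-8"):
            missing.append(path.name)
    return missing


if __name__ == "__main__":
    for name in files_missing_message(
        "data/invalid/validation_result/enum",
        "must be equal to one of the allowed values",
    ):
        print("missing:", name)
```

An empty list corresponds to the grep for "missing" printing nothing, meaning every file contains the expected error.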
Created a separate ticket for regexp errors: #51
Made a curl request with a dummy, malformed JSON body to biovalidator:
curl -X POST -H "Content-Type: application/json" -d "{'schema':{},'data':{}}" http://host:3020/validate >> malformed_json_post_to_biovalidator.txt
Output: {"error":"Received malformed JSON."}
Results are in
/nfs/production/tburdett/workstreams/fairification/checklists/data/run008/malformed_json_post_to_biovalidator.txt
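The body in the curl request above is rejected because JSON requires double-quoted strings, and the payload uses single quotes. A small sketch that shows this with Python's standard `json` module (the helper name `is_valid_json` is only for illustration, and this says nothing about how biovalidator parses requests internally):

```python
import json

# The payload from the curl command, and the same payload with double quotes.
SINGLE_QUOTED = "{'schema':{},'data':{}}"
DOUBLE_QUOTED = '{"schema":{},"data":{}}'


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(is_valid_json(SINGLE_QUOTED))
    print(is_valid_json(DOUBLE_QUOTED))
```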
Currently most of the samples in our tests are valid samples. We want to test validation failures thoroughly as well. We will introduce one error per checklist data sample and see how the validator behaves. The errors should be introduced into the BioSamples documents. Later we will introduce multiple errors and check the error messages for correct and consistent behaviour.
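A rough sketch of what introducing a single error into a sample could look like. It assumes a BioSamples document keeps its attributes under a `characteristics` object that maps each field name to a list of `{"text": ...}` entries. The function names and the example field names are hypothetical, and this is not the code of invalid_sample_generator.py.

```python
import copy

INVALID_VALUE = "__XYZ_BANANA_123%$£"


def drop_mandatory_field(sample, field):
    """Return a copy of sample with one field removed (mandatory field error)."""
    broken = copy.deepcopy(sample)
    broken.get("characteristics", {}).pop(field, None)
    return broken


def break_enum_field(sample, field, bad_value=INVALID_VALUE):
    """Return a copy of sample with one field set to a value outside its enum."""
    broken = copy.deepcopy(sample)
    broken.setdefault("characteristics", {})[field] = [{"text": bad_value}]
    return broken


if __name__ == "__main__":
    # Made-up sample for illustration only.
    sample = {
        "name": "example",
        "characteristics": {
            "organism": [{"text": "Homo sapiens"}],
            "sex": [{"text": "female"}],
        },
    }
    print(drop_mandatory_field(sample, "organism"))
    print(break_enum_field(sample, "sex"))
```

Working on a deep copy keeps the original valid sample untouched, so each generated file differs from it by exactly one error.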
Create a list with:
For clarity see below: