When multiple garbage sample names are provided, ValidationError returns auto-incremented fictitious sample names

GlobalPathogenAnalysisService / gpas-cli

The CLI client for GPAS SC2


When multiple garbage sample names are provided, ValidationError returns auto incremented fictitious sample names #73

Closed bede closed 2 years ago

bede commented 2 years ago

Probably better to drop the sample_name in these cases, so that the errors become identical and collapse into a single error when redundant errors are pruned? https://oc-collab.gc3.ocs.oraclecloud.com/browse/C900000008-816

bede commented 2 years ago

I've looked at this again and I can't see an obviously better way of handling this contrived edge case without disabling lazy error handling.

bede commented 2 years ago

Better solution found, I think. Valid sample names are now all coerced to strings by Pandera, whereas these auto-incremented indices are floats. I'm now catching these floats in remove_nones_duplicates_empties_ints_from_ld() and removing the sample_name key, so that sample-level errors for samples with garbage names are demoted to singleton batch-level errors.
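
For illustration, here is a minimal sketch of the idea described above. It is not the actual code from the linked commit. The helper name demote_garbage_sample_errors, the shape of the error dicts, and the error messages are all hypothetical. It assumes errors arrive as a list of dicts, and that any float sample_name is an auto-incremented index rather than a real name.

```python
from typing import Any


def demote_garbage_sample_errors(errors: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop float sample_name keys, then prune duplicate errors (order preserved)."""
    cleaned = []
    seen = set()
    for error in errors:
        # Assumption: valid sample names are always strings, so a float here
        # is an auto-incremented index standing in for a garbage name.
        if isinstance(error.get("sample_name"), float):
            error = {k: v for k, v in error.items() if k != "sample_name"}
        # Build a hashable fingerprint so identical errors can be collapsed.
        key = tuple(sorted(error.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(error)
    return cleaned


# Hypothetical input: two garbage-name errors and one genuine sample-level error.
errors = [
    {"sample_name": 0.0, "error": "sample_name contains illegal characters"},
    {"sample_name": 1.0, "error": "sample_name contains illegal characters"},
    {"sample_name": "sample1", "error": "collection_date must be in format YYYY-MM-DD"},
]
result = demote_garbage_sample_errors(errors)
```

With this input, the two garbage-name errors lose their sample_name key, become identical, and collapse into one batch-level error, while the error for sample1 keeps its sample name.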

bede commented 2 years ago

See https://github.com/GlobalPathogenAnalysisService/gpas-cli/commit/25ed7099c20102ec6772b0f01cc48a1e0078e2b3