GlobalPathogenAnalysisService / gpas-cli

The CLI client for GPAS SC2

HCELEC-396 Only 1 Error is returned although there are Multiple Errors which show up in Electron Client for the same file #41

Closed KuzminaAnna closed 2 years ago

KuzminaAnna commented 2 years ago

Describe the problem/error: GPAS-CLI: Only 1 Error is returned although there are Multiple Errors which show up in Electron Client for the same file.
Expected behavior: Multiple Errors should be returned similar to the Electron Client.
Steps to Reproduce: Detail ALL steps to reproduce the issue.
System Identification: N/A
Reproducible? Yes
If multiple areas are impacted list areas: N/A
If compatibility/interoperability is impacted list products (e.g.): N/A
If this is a customer issue list name: N/A

Request/Response:

sushastr@sushastr-mac dist % ./cli-upload --environment dev --token ../nanopore/token.json ../nanopore/nanopore-fastq-with_multiple_errors.csv --processes 1 --json-messages

[15646] WARNING: file already exists but should not: /var/folders/pq/j43ss5tx45vfbv08whvyrqmc0000gn/T/_MEIThTe8J/pyarrow/lib.cpython-310-darwin.so

{
    "validation": {
        "status": "failure",
        "errors": [
            {
                "error": "tag(s) {'$', '()', '&', '@'} are invalid for this organisation"
            }
        ]
    }
}

[Screenshot: Screen Shot 2022-06-30 at 5 29 42 PM]

bede commented 2 years ago

This is expected behaviour currently – while errors are reported lazily, tags are the one exception, since pandera SchemaModels cannot be updated at runtime, and valid tags are only knowable at runtime.

It is supposed to be possible to convert them to DataFrameSchema which can have rules injected at runtime, but this hasn't so far worked in my hands.

bede commented 2 years ago

See https://github.com/GlobalPathogenAnalysisService/gpas-cli/issues/12

bede commented 2 years ago

ValidationErrors can be raised in 4 different places. An error raised at any step prevents progression to the next step. Most errors are expected to occur in step 4, where errors are evaluated lazily by Pandera.

  1. Illegal characters in upload CSV path
  2. Errors parsing upload CSV
  3. Required columns for Schema selection (bam, fastq, paired, single etc) are missing
  4. Main (lazy) validation using Pandera SchemaModel
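
To make the consequence of this flow concrete, here is a minimal, standard-library-only sketch of a four-stage validator. It is not gpas-cli's code and does not use Pandera: `validate_upload`, the path rule, the required column names and the row checks are all hypothetical, chosen only for illustration. It shows why an error in steps 1 to 3 is reported alone, while step 4 can report many errors at once.

```python
import csv
import io
import re


class ValidationError(Exception):
    """Carries the error messages produced by a single validation stage."""

    def __init__(self, errors):
        super().__init__("; ".join(errors))
        self.errors = list(errors)


# Hypothetical rules, not taken from gpas-cli.
ILLEGAL_PATH_CHARS = re.compile(r"[^A-Za-z0-9._/\-]")
REQUIRED_COLUMNS = {"sample_name", "fastq"}


def validate_upload(path, csv_text):
    # Stage 1: illegal characters in the upload CSV path (fails fast).
    if ILLEGAL_PATH_CHARS.search(path):
        raise ValidationError([f"illegal characters in path: {path!r}"])

    # Stage 2: errors parsing the upload CSV (fails fast).
    reader = csv.DictReader(io.StringIO(csv_text), strict=True)
    try:
        rows = list(reader)
    except csv.Error as exc:
        raise ValidationError([f"could not parse CSV: {exc}"]) from exc

    # Stage 3: columns needed to choose a schema are missing (fails fast).
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValidationError([f"missing required columns: {sorted(missing)}"])

    # Stage 4: main validation, evaluated lazily. Every row is checked and
    # all failures are collected before anything is raised.
    errors = []
    for line, row in enumerate(rows, start=2):  # line 1 is the header
        if not (row["sample_name"] or ""):
            errors.append(f"line {line}: sample_name is empty")
        fastq = row["fastq"] or ""
        if not fastq.endswith((".fastq", ".fastq.gz")):
            errors.append(f"line {line}: fastq {fastq!r} is not a FASTQ file")
    if errors:
        raise ValidationError(errors)
    return rows
```

With this structure, a CSV containing several bad rows yields all of its row errors in one `ValidationError`, but the same CSV at a path containing `$` yields only the path error, because stage 4 is never reached.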

It isn't practical to combine all of these – I think the current validation flow is reasonable to be honest. Closing, but feel free to comment if you would like this reopened.