Sage-Bionetworks / data_curator

Data and metadata ingress app
Apache License 2.0

Validate permits ambiguous entries. #233

Closed mialy-defelice closed 2 years ago

mialy-defelice commented 3 years ago

Describe the bug

To Reproduce

Center: Stanford
Template: BulkWES1

Screenshots (optional)

Screen Shot 2021-11-19 at 1 55 21 PM

Additional context (optional)

HTAN Bulk WES Level 1 _incorrect.csv

rrchai commented 2 years ago

@mialy-defelice Thanks for reporting.

I can reproduce the same behavior. The app only structures the errors returned by the backend. My initial guess is that schematic allows these incorrect values, but I am going to double-check whether the app is mis-capturing the errors.

rrchai commented 2 years ago

Below are the validation errors returned by validateModelManifest in the app. Is there any way to test whether this is the expected behavior from schematic?

[
    [
        2,
        "File Format",
        "'' is not one of ['png', 'bam', 'dat', 'fig', 'tar', 'excel', 'sav', 'bedgraph', 'mzML', 'czi', 'md', 'hdf5', 'svs', 'RData', 'tagAlign', 'gct', 'mex', 'am', 'doc', 'cloupe', 'gtf', 'R script', 'locs', 'gzip', 'json', 'OME-TIFF', 'tsv', 'flagstat', 'bcf', 'wiggle', 'seg', 'm', 'bed narrowPeak', 'raw', 'mov', 'txt', 'sif', 'bed', 'svg', 'cel', '7z', 'tif', 'pzfx', 'jpg', 'bai', 'dup', 'powerpoint', 'zip', 'pdf', 'mpg', 'bigwig', 'fasta', 'fastq', 'msf', 'plink', 'sqlite', 'recal', 'csv', 'gff3', ",
        ""
    ],
    [
        3,
        "Library Selection Method",
        "'Hello' is not one of ['Hybrid Selection', 'miRNA Size Fractionation', 'rRNA Depletion', 'Other', 'Poly-T Enrichment', 'Affinity Enrichment', 'PCR', 'Random']",
        "Hello"
    ],
    [
        3,
        "Target Capture Kit",
        "'Do' is not one of ['Custom MSK IMPACT Panel - 341 Genes', 'Custom SureSelect CGCI-BLGSP Panel - 4.6 Mb', 'SeqCap EZ HGSC VCRome v2.1', 'Custom MSK IMPACT Panel - 410 Genes', 'SeqCap EZ Human Exome v2.0', 'Custom Targets File Provided', 'Custom SeqCap EZ HGSC VCRome v2.1 ER Augmented v2', 'Custom Twist Broad PanCancer Panel - 396 Genes', 'Custom GENIE-DFCI Oncopanel - 300 Genes', 'unknown', 'Custom SureSelect CGCI-HTMCP-CC KMT2D And Hotspot Panel - 37.0 Kb', 'Nextera Rapid Capture Exome v1.2', '",
        "Do"
    ]
]
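To compare this output against the cells that were deliberately made incorrect in the attached CSV, it may help to reduce the list to (row, attribute) pairs. The sketch below is only an illustration: it assumes each entry is laid out as [row, attribute, message, offending value], which is inferred from the output above rather than confirmed by schematic, and the helper names `group_by_row` and `empty_value_errors` are hypothetical. The long messages are shortened to `[...]` here.

```python
from collections import defaultdict

# Entries copied from the output above; the allowed-value lists in the
# messages are shortened to "[...]" for readability.
# Assumed layout of each entry: [row, attribute, message, offending value].
errors = [
    [2, "File Format", "'' is not one of [...]", ""],
    [3, "Library Selection Method", "'Hello' is not one of [...]", "Hello"],
    [3, "Target Capture Kit", "'Do' is not one of [...]", "Do"],
]


def group_by_row(entries):
    """Map each row number to the (attribute, value) pairs flagged on it."""
    grouped = defaultdict(list)
    for row, attribute, _message, value in entries:
        grouped[row].append((attribute, value))
    return dict(grouped)


def empty_value_errors(entries):
    """Return (row, attribute) pairs whose offending value is an empty string."""
    return [(row, attr) for row, attr, _msg, value in entries if value == ""]


if __name__ == "__main__":
    for row, cells in sorted(group_by_row(errors).items()):
        print(row, cells)
    print("empty cells flagged:", empty_value_errors(errors))
```

Any incorrect cell in the manifest that does not show up in these pairs would be a candidate for a value that validation let through.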

Note: the current develop/prod instances have not been updated to the latest version of schematic (not sure if there is a new update/fix in schematic related to this issue).

> git branch -v
* develop              33938f9 Merge pull request #532 from Sage-Bionetworks/develop-empty-dataset-fix

rrchai commented 2 years ago

mialy-defelice commented 2 years ago

@rrchai Sounds good. I do think the validation rules will solve this issue, but they need to be tested and then added to DC. Schematic PR #510 should be helpful (but only in gsheets).