Sage-Bionetworks / schematic

Package for biomedical data model and metadata ingress management
https://schematicpy.readthedocs.io/en/stable/cli_reference.html
MIT License

Unexpected app crash upon validation (Error in bind_rows: Can't combine `..1$Column` <integer> and `..2$Column` <character>.) #740

Closed: allaway closed this issue 1 month ago.

allaway commented 2 years ago

Describe the bug

When I try to validate a manifest (attached here), the DCA app crashes unexpectedly with the error Error in bind_rows: Can't combine `..1$Column` <integer> and `..2$Column` <character>.

To Reproduce

Steps to reproduce the behavior:

  1. Go to https://sagebio.shinyapps.io/NF_data_curator/
  2. Select the "Factors that ...." project and the CRISPR Raw Data folder.
  3. Go to validation, upload manifest attached above, and click validate.

Expected behavior

This shouldn't crash the app.


Additional context

This hasn't happened with other manifests on the same version of the app/schematic, and validation of this manifest appears to work fine with the schematic CLI on the same branch.

(data_curator_env) ➜  testing schematic model --config ./schematic_config.yml validate -mp NF_Genomics_Assay_Reh.csv -dt GenomicsAssayTemplate 
Starting schematic...
The (model > input > location) argument with value 'NF.jsonld' is being read from the config file.
The (model > input > file_type) argument with value 'local' is being read from the config file.
JSON schema successfully generated from schema.org schema!
JSON schema file log stored as /Users/rallaway/Downloads/62ab417dc1b64041806f120bb6061b6c/schematic/log.json
/opt/miniconda3/envs/data_curator_env/lib/python3.9/site-packages/jinja2/environment.py:1088: DeprecationWarning: 'soft_unicode' has been renamed to 'soft_str'. The old name will be removed in MarkupSafe 2.1.
  return concat(self.root_render_func(self.new_context(vars)))
    0 expectation(s) included in expectation_suite.
Calculating Metrics: 0it [00:00, ?it/s]
[[2, 0, "'cultured Muller Glia' is not one of ['monocytes', 'Schwannoma', 'iPSC-derived neuron', 'iPSC', 'Teratoma', 'iPSC-derived neuronal progenitor cell', 'GABAergic neurons', 'round', 'iPSC-derived glia', 'epithelial', 'CD8+ T-Cells', 'NeuN-', 'lymphoblast', 'CD138+', 'Embryonic stem cells', 'monocyte-derived microglia', 'SH-SY5Y', 'Meningioma', 'B-lymphocytes', 'astrocytes', 'iPSC-derived telencephalic organoids', 'schwann', 'oligodendrocyte', 'CNON', 'microglia', 'epithelial-like', 'iPSC-derived as", 'cultured Muller Glia'], [2, 'isStrand....

Therefore, the problem is (1) specific to this manifest and (2) probably related to how DCA handles the manifest before or after schematic validation.

rrchai commented 2 years ago

It is probably related to this chunk of code: https://github.com/Sage-Bionetworks/data_curator/blob/fa9386a675007296bb3758dd0c78825d41277381/functions/validationResult.R#L64-L66

I think it is because the value types are not consistent. Adding as.character, e.g. Value = as.character(i[[4]][[1]]), might solve the issue. Feel free to create a PR if that works. I can also double-check and address it in this repo tomorrow.
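To illustrate the type-consistency point outside of R: dplyr's bind_rows refuses to combine a column that is integer in one piece and character in another. The sketch below is a hypothetical Python analog (combine_error_rows and stringify_row are illustrative names, not part of schematic or DCA). It mimics that strict check and shows how converting every field to a string, the analog of as.character, makes the types consistent. The sample rows are placeholders that only follow the [row, column, message, value] shape of the schematic output above.

```python
from typing import Any, List


def combine_error_rows(rows: List[List[Any]]) -> List[List[Any]]:
    """Transpose error rows into columns, refusing mixed types per field.

    Hypothetical helper that mimics the strictness of dplyr::bind_rows;
    it is not part of schematic or DCA.
    """
    if not rows:
        return []
    n_fields = len(rows[0])
    columns = []
    for i in range(n_fields):
        values = [row[i] for row in rows]
        kinds = {type(v).__name__ for v in values}
        if len(kinds) > 1:
            # Same failure mode as bind_rows: one field, several types.
            raise TypeError(f"Can't combine field {i}: types {sorted(kinds)}")
        columns.append(values)
    return columns


def stringify_row(row: List[Any]) -> List[str]:
    """Convert every field to str (the analog of R's as.character)."""
    return [str(v) for v in row]


# Placeholder rows in the [row, column, message, value] shape:
# the first carries an integer in the column slot, the second a name.
rows = [
    [2, 0, "example list-validation message", "example value"],
    [2, "someColumn", "example message", "example value"],
]

try:
    combine_error_rows(rows)
except TypeError as err:
    print(err)  # Can't combine field 1: types ['int', 'str']

columns = combine_error_rows([stringify_row(r) for r in rows])
print(columns[1])  # ['0', 'someColumn']
```

Coercion lets the combine step succeed, but the column slot now holds the string "0" rather than a real column name, which is consistent with the follow-up error reported later in this thread.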

Thanks for reporting @allaway.

allaway commented 2 years ago

Thanks Rong! I will try that!

allaway commented 2 years ago

@rrchai, unfortunately, this doesn't appear to resolve the issue :(

allaway commented 2 years ago

I am transferring this issue to schematic. It appears to be a problem with list validation, not the DCA.

I did something similar to @rrchai's suggestion. The initial error was: Error in bind_rows: Can't combine `..1$Column` <integer> and `..2$Column` <character>. So I wrapped Column in as.character:

https://github.com/nf-osi/NF_data_curator/blob/8eeff54142c36e894d2d5c364fa808cbff3e45ab/functions/validationResult.R#L68

Then, when I run the same manifest through, I get a different crash and error:

2022-05-21T18:40:04.373158+00:00 shinyapps[5711947]: Warning: Error in name2int: You specified the columns: 0, but the column names of the data are  , Component, Filename, resourceType, progressReportNumber, dataType, assay, platform, individualID, parentSpecimenID, runType, libraryPrep, comments, age, ageUnit, aliquotID, cellType, dataSubtype, diagnosis, dissociationMethod, eTag, fileFormat, fundingAgency, initiative, isCellLine, isPrimaryCell, isStranded, libraryPreparationMethod, modelSystemName, nf1Genotype, nf2Genotype, nucleicAcidSource, organ, readDepth, readLength, readPair, readPairOrientation, readStrandOrigin, sex, species, specimenID, specimenPreparationMethod, studyId, studyName, tissue, tumorType, entityId

After a bit of testing I found out that if I remove the columns with list validation (cellType, modelSystemName), this error goes away.
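The second crash appears to fit the same picture: once the column slot is coerced to a string, the value "0" is passed on as a column name, and no column in the log above is named "0" (the header starts with a blank name followed by Component, Filename, and so on). In the schematic output earlier in the thread, the entry carrying 0 in that slot is the 'cultured Muller Glia' value, which presumably comes from the list-validated cellType column. The hypothetical Python helper below (find_unknown_columns, not part of schematic or DCA) flags error entries whose column field does not match a header name. This is one way to catch such entries before they reach a name-based column lookup. The header is an abbreviated subset of the one in the log, and the error rows are placeholders.

```python
from typing import Any, Iterable, List


def find_unknown_columns(
    errors: Iterable[List[Any]], header: List[str]
) -> List[List[Any]]:
    """Return error entries whose column field is not a header name.

    Hypothetical check; assumes each entry is [row, column, message, value].
    The column field is compared as a string, so an integer 0 becomes "0".
    """
    known = set(header)
    return [e for e in errors if str(e[1]) not in known]


# Abbreviated header taken from the column names in the log above.
header = ["Component", "Filename", "cellType", "isStranded", "modelSystemName"]

# Placeholder entries: one with an integer column slot, one with a real name.
errors = [
    [2, 0, "example list-validation message", "example value"],
    [2, "isStranded", "example message", "example value"],
]

bad = find_unknown_columns(errors, header)
print(bad)  # only the entry whose column slot is 0 is reported
```

The check only reports the mismatch. Whether schematic should emit the column name for list-validation errors is a separate question for the fix itself.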

ychae commented 2 years ago

@allaway, is this error resolved by the fix to #710?