Sage-Bionetworks / schematic

Package for biomedical data model and metadata ingress management
https://schematicpy.readthedocs.io/en/stable/cli_reference.html
MIT License
22 stars 25 forks source link

JSONSchema validation errors are not showing when submitting manifest #867

Closed linglp closed 2 years ago

linglp commented 2 years ago

Describe the bug If you convert a CSV data model with "list like" as one of the validation rules, the JSON-LD file that it generated seems to not being able to list errors when later used for manifest submission.

My guess is that there's something wrong with the json-ld generated by schematic if the data model csv contained "list like"

To Reproduce Steps to reproduce the behavior:

  1. Go to develop
  2. Run the following command: schematic schema convert /Users/lpeng/Documents/schematic-git2/schematic/tests/data/CSBCDataModel_new.csv
  3. Update config.yml so that it points to the CSBCDataModel_new.jsonld
  4. Try submit the data model by doing: schematic model -c config.yml submit -mp /Users/lpeng/Documents/test-schematic/CA184897.csv -d syn33691763 -vc DatasetView -mrt table
  5. see error :
    
    (schematicpy-9OomxyhV-py3.9) lpeng@w153 schematic % schematic model -c config.yml submit -mp /Users/lpeng/Documents/test-schematic/CA184897.csv -d syn33691763 -vc DatasetView -mrt table
    Starting schematic...
    The (model > input > location) argument with value 'tests/data/CSBCDataModel_new.jsonld' is being read from the config file.
    The (model > input > file_type) argument with value 'local' is being read from the config file.

UPGRADE AVAILABLE

A more recent version of the Synapse Client (2.6.0) is available. Your version (2.5.1) can be upgraded by typing: pip install --upgrade synapseclient

Python Synapse Client version 2.6.0 release notes

https://python-docs.synapse.org/build/html/news.html

[####################]100.00% 1/1 Done... Downloading [####################]100.00% 23.5MB/23.5MB (8.3MB/s) SYNAPSE_TABLE_QUERY_101701260.csv.synapse_download_101701260 Done... JSON schema successfully generated from schema.org schema! JSON schema file log stored as tests/data/CSBCDataModel_new.DatasetView.schema.json /Users/lpeng/Library/Caches/pypoetry/virtualenvs/schematicpy-9OomxyhV-py3.9/lib/python3.9/site-packages/jinja2/environment.py:1088: DeprecationWarning: 'soft_unicode' has been renamed to 'soft_str'. The old name will be removed in MarkupSafe 2.1. return concat(self.root_render_func(self.new_context(vars))) 0 expectation(s) included in expectation_suite. Calculating Metrics: 0it [00:00, ?it/s] error: Validation errors resulted while validating with 'DatasetView'.



But if you turn "list like" to "list", you could see a list of validation error when running the submit command. So the following works: 
1. Go to `develop`
3.  Run the following command: `schematic schema convert /Users/lpeng/Documents/schematic-git2/schematic/tests/data/CSBCDataModel_old.csv`
4. Update `config.yml` so that it points to the `CSBCDataModel_old.jsonld`
7. Try submit the data model by doing: `schematic model -c config.yml submit -mp /Users/lpeng/Documents/test-schematic/CA184897.csv -d syn33691763 -vc DatasetView -mrt table`
8. see a list of validation errors 

**Expected behavior**
A clear and concise description of what you expected to happen.

**Priority** (select one)
- [ ] Minor ⬇️
- [X] Major 📢
- [ ] Critical 🆘

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Desktop (if applicable, please complete the following information):**
 - OS: [e.g. iOS]
 - Browser [e.g. chrome, safari]
 - Version [e.g. 22]

**Additional context**
linglp commented 2 years ago

CSBCDataModel_new.csv CSBCDataModel_old.csv CA184897.csv

linglp commented 2 years ago

@brynnz22 for awareness

linglp commented 2 years ago

@GiaJordan Let me know if you have time to take a look. Thank you!!

linglp commented 2 years ago

The original issue is here: https://github.com/Sage-Bionetworks/schematic/issues/845

linglp commented 2 years ago

After comparing the error log generated by using CSBCDataModel_old.jsonld and CSBCDataModel_new.jsonld, I think the problem might be that the following error did not get printed in the console.

[[2, 0, "'' is not one of ['Aorta', 'Endothelium', 'Tendon', 'Head and Neck', 'Bladder', 'Intestine', 'Epithelium', 'Ovary', 'Olfactory Mucosa', 'Uterus', 'Kidney', 'Periodontal Ligament', 'Hippocampus', 'Gastrointestinal Tract', 'Not Applicable', 'Vagina', 'Intrathoracic Lymph Nodes', 'Pending Annotation', 'Synovial Membrane', 'Fallopian Tube', 'Gastroesophageal Junction', 'Cornea', 'Thymus', 'Stomach', 'Respiratory System', 'Brain', 'Esophagus', 'Placenta', 'Hematopoietic System', 'Peripheral Nerves', ", ''], [3, 0, "'' is not one of ['Aorta', 'Endothelium', 'Tendon', 'Head and Neck', 'Bladder', 'Intestine', 'Epithelium', 'Ovary', 'Olfactory Mucosa', 'Uterus', 'Kidney', 'Periodontal Ligament', 'Hippocampus', 'Gastrointestinal Tract', 'Not Applicable', 'Vagina', 'Intrathoracic Lymph Nodes', 'Pending Annotation', 'Synovial Membrane', 'Fallopian Tube', 'Gastroesophageal Junction', 'Cornea', 'Thymus', 'Stomach', 'Respiratory System', 'Brain', 'Esophagus', 'Placenta', 'Hematopoietic System', 'Peripheral Nerves', ", '']]
GiaJordan commented 2 years ago

Summary of convo with @linglp: JSONSchema validation errors are not and have not been displayed as error messages in the same way as validation rules. Options: add methods to GenerateError class to format and display valid values errors, or for users to run the manifest validation apart from submission to see full list of errors. Most likely don't want to display full list of errors at submission to avoid overcrowding output

linglp commented 2 years ago

@brynnz22 I know that we also discussed this issue. Please also refer to Gianna's comment above. After looking more into the code, we found out that JSONSchema validation errors (i.e. when you didn't fill out "Dataset Tissue" which is a required column in JSON Schema) have not been displayed in the same way as violating validation rules. I have a PR in the process to change that, but in the meantime, you could use the validate command: schematic model -c config.yml validate -mp /Users/lpeng/Documents/test-schematic/CA184897.csv -dt DatasetView, and see full error messages.

brynnz22 commented 2 years ago

@linglp I am not exactly sure what your and @GiaJordan conversation was, so I am a little lost on these comments. If I use the validation command schematic model -c config.yml validate -mp /Users/lpeng/Documents/test-schematic/CA184897.csv -dt DatasetView I get a very long output saying everything is invalid when testing on a manifest because most of the attributes are "list like" and Schematic is not recognizing single value lists. Even if I fill in the Dataset Tissue column with correct values. Let me know what you mean for me to do as I don't understand.

linglp commented 2 years ago

@brynnz22 You shouldn't be seeing a long list of error messages if you have re-generated your JSON-LD after updating the old data model (and updated from list to list like). I will send you the JSON-LD and manifest that I used over slack.

brynnz22 commented 2 years ago
Screen Shot 2022-08-22 at 2 14 22 PM

Here is the validation errors I get, which appear to be a bug - e.g. Mouse is a valid value. I used the same jsonld and manifest attached. Tagging @vpchung

linglp commented 2 years ago

Note: The issue that Brynn mentioned about validation is an issue for the released version 22.7.1, and the develop branch is working as expected.

vpchung commented 2 years ago

Hi @linglp - thanks for the quick update.

Because we're on a tight timeline to get a new manifest up, would you recommend we switch to the develop branch to validate, seeing that that is working as expected?

linglp commented 2 years ago

Hi @vpchung ! Yes, you could use the develop branch (To use the develop branch, please refer to the main ReadMe documentation here). In the meantime, we are trying to do a release this week (but we would still need some time to review the backlogs and also review all the PRs that we plan to include).

vpchung commented 2 years ago

Sounds great, thanks @linglp ! We really appreciate your team's effort on this - looking forward to the release! We'll try using the develop branch in the mean time.