Sage-Bionetworks / schematic

Package for biomedical data model and metadata ingress management
https://schematicpy.readthedocs.io/en/stable/cli_reference.html
MIT License

Table feature error #845

Closed brynnz22 closed 2 years ago

brynnz22 commented 2 years ago

Describe the bug: When attempting to upload a manifest as a table, the following error occurs: error: Component 'None' is not present in '/Users/bzalmanek/Documents/csbc_schematic/schematic/tests/data/csbc.model.jsonld', or is invalid. Attached are the data model and an example manifest.

To Reproduce Steps to reproduce the behavior:

  1. Run the following command to upload a manifest as a table, substituting your own values for the config.yml file, manifest path, and dataset ID: schematic model -c /Users/bzalmanek/Documents/csbc_schematic/schematic/config.yml submit -mp /Users/bzalmanek/Desktop/split_tables-MAY_INCLUDED/dataset/CA184897.csv -d syn33691763 -mrt table
  2. See issue

Expected behavior A table will upload without the error.

Additional context CSBC Data Model CA184897.csv

mialy-defelice commented 2 years ago

@brynnz22 Can you upload your config.yml file too?

brynnz22 commented 2 years ago

yml files are not supported here, so I can't upload it. I've attached a screen shot:

[Screenshot: Screen Shot 2022-08-08 at 9 10 16 AM]

mialy-defelice commented 2 years ago

Hey @brynnz22, I have been trying to replicate your error but have not had any issues uploading the manifest as a table to Synapse. Here is where the table uploaded in my scratch space. Here is where the CSV loaded.

Can you remake your JSONLD and then try uploading the table again? If that does not work, the problem may be that you are re-uploading a table whose columns changed between uploads. I was able to keep uploading to the same table with no error, however.

I pulled your JSONLD from your CSBC GitHub and also generated one from the model you provided; both worked.

I also noticed some things in your .yml that are not currently causing errors but might in the future.

brynnz22 commented 2 years ago

@mialy-defelice thanks for looking into it. I remade the JSONLD. I also installed the latest version of Schematic and I am continuing to get the same error. I am not trying to re-upload a table after making a change to the manifest. I am just trying to do the initial upload. The JSONLD from the CSBC Github is not up to date. Could you try downloading it from this google sheet and let me know if you are able to replicate my error? I attached a screenshot of the error message:

[Screenshot: Screen Shot 2022-08-11 at 10 51 44 AM]

Thanks for the info about the yaml file. I will update it.

mialy-defelice commented 2 years ago

"Could you try downloading it from this google sheet and let me know if you are able to replicate my error?"

I do not have access to this Google Sheet.

I did remake the JSONLD from the CSV model you originally provided. Is this one different? I added the GitHub one as an extra test to see if there was a difference.

brynnz22 commented 2 years ago

That's weird. It says you have access. Maybe I linked something different. Here is the correct link: https://docs.google.com/spreadsheets/d/1hbG1vdi8a0gc-6psn5VgsBgyMl6IOhDSbHWDuC_8EXo/edit#gid=59750547

Oh yes, that is the same CSV as the Google Sheet.

mialy-defelice commented 2 years ago

Okay... can you send me your JSONLD via Slack? If that does not get me to the error maybe I will try to see if we can get someone else to try to reproduce it.

mialy-defelice commented 2 years ago

@linglp can you try to reproduce the error that @brynnz22 is experiencing? I have been testing it, but have not had any issues uploading the manifest table to Synapse.

linglp commented 2 years ago

@brynnz22 and @mialy-defelice Yes, I could reproduce the error by running: schematic model -c config.yml submit -mp tests/data/mock_manifests/CA184897.csv -d syn33691763 -mrt table

But I think the error could be resolved if you add the -vc flag. Try something like this: schematic model -c config.yml submit -mp tests/data/mock_manifests/CA184897.csv -d syn33691763 -vc DatasetView -mrt table

From my end, I could see a list of error messages:

error: For attribute Dataset Assay in row 2 it does not appear as if you provided a comma delimited string. Please check your entry ('RNA Sequencing'') and try again.
error: For attribute Dataset Assay in row 3 it does not appear as if you provided a comma delimited string. Please check your entry ('RNA Sequencing'') and try again.
error: For attribute Dataset Species in row 2 it does not appear as if you provided a comma delimited string. Please check your entry ('Mouse'') and try again.
error: For attribute Dataset Tumor Type in row 2 it does not appear as if you provided a comma delimited string. Please check your entry ('Neuroblastoma'') and try again.
error: For attribute Dataset Tumor Type in row 3 it does not appear as if you provided a comma delimited string. Please check your entry ('Breast Carcinoma'') and try again.
error: For attribute Dataset Consortium Name in row 2 it does not appear as if you provided a comma delimited string. Please check your entry ('ICBP'') and try again.
error: For attribute Dataset Consortium Name in row 3 it does not appear as if you provided a comma delimited string. Please check your entry ('ICBP'') and try again.
error: For attribute Dataset Grant Number in row 2 it does not appear as if you provided a comma delimited string. Please check your entry ('CA184897'') and try again.
error: For attribute Dataset Grant Number in row 3 it does not appear as if you provided a comma delimited string. Please check your entry ('CA184897'') and try again.
error: For attribute Dataset Pubmed Id in row 2 it does not appear as if you provided a comma delimited string. Please check your entry ('26907613'') and try again.
error: For attribute Dataset Pubmed Id in row 3 it does not appear as if you provided a comma delimited string. Please check your entry ('30755444'') and try again.
error: For attribute Dataset Tissue in row 2 it does not appear as if you provided a comma delimited string. Please check your entry (''') and try again.
error: For attribute Dataset Tissue in row 3 it does not appear as if you provided a comma delimited string. Please check your entry (''') and try again.
error: Validation errors resulted while validating with 'DatasetView'.

linglp commented 2 years ago

@brynnz22 more explanation of the error message: "error: Component 'None' is not present in '~/CSBCDataModel.jsonld', or is invalid."

Submitting a manifest also triggers validation, so schematic tries to look up the component you are validating against. Since you didn't specify the -vc flag, schematic takes the component to be "None", and, as you would expect, "None" is not present in the data model.
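To illustrate the explanation above, here is a minimal sketch of how an unset option can end up in the error text. It is not schematic's actual code: `check_component` and `MODEL_COMPONENTS` are hypothetical names, and the set of components is made up for the example.

```python
from typing import Optional

# Hypothetical stand-in for the components defined in a data model.
MODEL_COMPONENTS = {"DatasetView"}


def check_component(component: Optional[str], model_path: str) -> str:
    """Return an error message if the component is not in the model, else ''."""
    if component not in MODEL_COMPONENTS:
        # Formatting None into the message yields the literal text 'None'.
        return (
            f"error: Component '{component}' is not present in "
            f"'{model_path}', or is invalid."
        )
    return ""


# No -vc flag: the option keeps its default of None, so None is looked up.
print(check_component(None, "CSBCDataModel.jsonld"))
# With -vc DatasetView the lookup succeeds and no message is produced.
print(check_component("DatasetView", "CSBCDataModel.jsonld"))
```

The point of the sketch is only that the word "None" in the message comes from a missing argument, not from anything in the manifest or the data model.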

brynnz22 commented 2 years ago

Thank you @linglp this is super helpful. I will use the -vc flag in my command. I'm also wondering about the specific error messages you listed above.

Do you know why it is giving these errors? It appears that it thinks there are empty strings after the entries, but from my end, there do not appear to be any. Basically, it is giving an error for all of the attributes in the manifest that are lists.

linglp commented 2 years ago

@brynnz22 For example, the first line of the error message is saying that the column "Dataset Assay" should accept a list. (I also double-checked the CSV version of the data model and found that the validation rule for "Dataset Assay" is indeed list.)

But your entry "RNA Sequencing" in the manifest is not a comma-separated list.

As for the last line of the error message, it appears that you did not input a value for the column "Dataset Tissue". But based on the data model, this is a required column and accepts "list".

brynnz22 commented 2 years ago

@linglp Yes, Dataset Assay along with the other attributes should accept lists, but many will just have one entry. Can Schematic accept lists with only one item in them? This is pretty important.

You are correct about the tissue attribute. I will have to fix the manifests to list something for the empty values.

linglp commented 2 years ago

@brynnz22 If you change the validation rule in the data model from "list" to "list like", schematic will allow a single entry (without it being a comma-separated list). By default, "list" is treated as "list strict", which means that you have to enter a comma-separated list.
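The difference between the two rules can be sketched roughly as follows. This is an illustration of the behaviour described in this thread, not schematic's implementation: `is_valid_list_entry` is a hypothetical helper, and treating an empty cell as invalid under both rules is an assumption made for the example.

```python
def is_valid_list_entry(entry: str, rule: str = "list strict") -> bool:
    """Check one manifest cell against a simplified list rule."""
    value = entry.strip()
    if not value:
        # Assumption: an empty cell in a required column fails either way.
        return False
    if rule == "list strict":
        # Strict: the cell must be a comma-delimited string.
        return "," in value
    if rule == "list like":
        # Lenient: a single value is accepted as well as a list.
        return True
    raise ValueError(f"unknown rule: {rule}")


# A single entry fails the strict rule but passes the lenient one.
print(is_valid_list_entry("RNA Sequencing"))
print(is_valid_list_entry("RNA Sequencing", rule="list like"))
```

Under this reading, single-valued cells such as "RNA Sequencing" only need the rule in the data model changed, not the manifest itself.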

brynnz22 commented 2 years ago

That's great to know. Thank you!

brynnz22 commented 2 years ago

@linglp Can we reopen this issue? I tried to use your -vc solution and it is now giving this error: error: Component 'DatasetView' is not present in '/Users/bzalmanek/Documents/csbc_schematic/schematic/tests/data/csbc.model.jsonld', or is invalid. Would it be possible to meet over a Zoom call to go over this? I am not sure if I have Schematic installed incorrectly or if it is something else.

[Screenshot: Screen Shot 2022-08-16 at 3 40 50 PM]

linglp commented 2 years ago

@brynnz22

Hi Brynn! I tried schematic model -c config.yml submit -mp tests/data/mock_manifests/CA184897.csv -d syn33691763 -vc DatasetView -mrt table again, and I can't reproduce your error.

But I would recommend the following:

And yes, if you still get the error, we could set up a Zoom call.

brynnz22 commented 2 years ago

@linglp Yes, I have followed those exact steps many times and I continue to get the error. My JSON-LD looks exactly like that:

[Screenshot: Screen Shot 2022-08-17 at 8 55 32 AM]

I will see if Victor, whom I am working with, can get it to work on his computer.

linglp commented 2 years ago

@brynnz22 thanks. Let me know if you figure it out. If not, we could schedule a zoom call. We released a new version of schematic in July this year, so if you are using that version, it should be fine too.

brynnz22 commented 2 years ago

@linglp yes, can we please meet? Victor is getting a completely different error.

linglp commented 2 years ago

@brynnz22 Feel free to message me on Slack. I also tried to pip install schematic again (the version that I installed here is 22.7.1, which is the latest version that we released in July), and I still can't reproduce your error.

linglp commented 2 years ago

This issue happened because the JSON-LD data model was generated with an older version of schematic. It can be resolved by installing the latest version of schematicpy from PyPI. @brynnz22 Let me know if you have other questions.
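For anyone hitting the same problem, one way to confirm which version is installed before regenerating the JSON-LD is sketched below. It uses only the Python standard library and assumes the distribution name is schematicpy, as mentioned in this thread; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "schematicpy") -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed version, or None if the package is not installed.
print(installed_version())
```

If the printed version is older than the release mentioned above (22.7.1), upgrading and then regenerating the JSON-LD from the data model is the fix described in this thread.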