Sage-Bionetworks / schematic

Package for biomedical data model and metadata ingress management
https://schematicpy.readthedocs.io/en/latest/cli_reference.html
MIT License
21 stars 24 forks source link

Biothings namespace clashing #708

Open brynnz22 opened 2 years ago

brynnz22 commented 2 years ago

Describe the bug Many of the biothings namespaces clash with terms used in the CSBC data model, and other data models. For example the attribute, Publication Journal, in the CSBC data model has a valid value of Cell. Because this is a term in Biothings, schematic will give a displayName error and will not generate the manifest. I know this is already on @milen-sage radar.

To Reproduce Steps to reproduce the behavior:

  1. Generate a manifest with a valid value that has term in Biothings (like Cell or Protein)

Expected behavior A manifest would generate without errors.

Priority (select one)

Desktop (if applicable, please complete the following information):

anngvu commented 2 years ago

Related to this, I would request that the JSON-LD is not generated with the bts model included by default and that we have more control of defined namespaces, because we might register the model elsewhere at e.g. BioPortal or identifiers.org/.

ychae commented 2 years ago

@milen-sage could we schedule this one to be addressed in Sprint 12?

milen-sage commented 2 years ago

@ychae yes, please feel free to triage it.

Note: first step is improving docs. The base schema by default is biothings, but it's easy to configure schematic to use something else or not use a base schema. SH can include instructions on that; we should also add as an item in the data modeling workshop, perhaps in the things to consider section.

ychae commented 2 years ago

AC:

ychae commented 1 year ago

@milen-sage will set up meeting with @brynnz22

milen-sage commented 1 year ago

This is an easy fix if one is using the developer mode of schematic (to be covered in our workshop). However, we can make it easier.

Currently, a user can go to schematic/utils/io_utils.py and change

data_path = "data_models/biothings.model.jsonld"

to

data_path = "data_models/schema_org.model.jsonld"

This would just load schema.org. If a user has a simpler schema to start with (or no schema); this path can be adjusted further.

Note that the user should have their schematic developer mode setup, so that these changes are reflected in the CLI - either by rebuilding the schematic package with poetry; or, by using the poetry shell when making changes.

Instead of the current process, we should just provide a config option for the default schema to load; we can still use biothings as a default for now, but can also move to schema.org or a different schema in the future.

brynnz22 commented 1 year ago

@milen-sage sorry to come back to this issue. I changed the io.utils file as specified above, and I tried changing our data model back to using "Cell" as one of our valid values that clashed with biothings, and I am continuing to get a displayName error. Is it possible that "Cell" is clashing with Schema.org as well?

milen-sage commented 1 year ago

@brynnz22 could you post the updated json-ld schema? Thanks!

brynnz22 commented 1 year ago

@milen-sage I sent it to you via slack. jsonld is not supported here. The only differences are that I changed the Plain Text valid value for the Tool Input/Output Data attributes to Text. This impacts the generation for the ToolView and Tool manifests. I also changed the Cell (Journal) valid value for the Publication Journal attribute to Cell. This impacts the generation of the PublicationView and Publication manifests.

milen-sage commented 1 year ago

@GiaJordan

GiaJordan commented 1 year ago

Encountering issues when using the schema.org base schema even with our example data model and manifest. A KeyError exception is raised during the process of loading schema into network. It appears to be due to variations in the structure of the base schema.org and BioThings schemas. Next steps include a new method of unpacking the schema that works on both schemas, as well as other possible future ones as mentioned above. Related to #810 and #856

cc: @milen-sage

GiaJordan commented 1 year ago

After speaking with @milen-sage and using a newer version of the shema.org schema, I'm able to load it in as the base schema during manifest generation. When generating a Publication manifest as described above, the KeyError for a missing diaplayName key in a node dictionary still persists, so more investigation is necessary at this time

milen-sage commented 1 year ago

Hmm, I wonder if this is also related to #888. Let's keep investigating. A fix might resolve both 🤞

GiaJordan commented 1 year ago

Partially addressed in #903, work will continue to determine cause of error and relation to base schema used Feature request will be tracked here #856 @brynnz22 @anngvu

LanceCarlo commented 1 year ago

It's like getting in on fucking sunlight before there was fucking sunlight. u understand?

milen-sage commented 1 year ago

Looks like the internet is catching up. Spam message has been reported.

AmyHeiser commented 1 year ago

@milen-sage I know this has a long history and a lot of linked issues (several resolved). This is included on the MC2 plan so I am wondering if it's still an issue. I can add to Jira as a research to confirm if the issue is still relevent unless you already know.

milen-sage commented 1 year ago

@AmyHeiser yes this is still an issue - @mialy-defelice has captured it to address as part of the refactor (specifying custom base schema or no base schema on which data models extend).

AmyHeiser commented 1 year ago

Linking new Jira (combined issue) https://sagebionetworks.jira.com/browse/FDS-69