Sage-Bionetworks / schematic

Package for biomedical data model and metadata ingress management
https://schematicpy.readthedocs.io/en/latest/cli_reference.html
MIT License

schematic: allow using labels as column headers in templates instead of display names #1123

Closed: milen-sage closed this issue 1 year ago

mialy-defelice commented 1 year ago

This issue is based on a request by @lopierra that began as a Slack conversation, which can be viewed here.

In this example manifest, the column headers are the displayNames (e.g. Condition Source Text), which are more readable, but I would prefer the column headers to be the labels (e.g. conditionSourceText) because they don't have spaces and are easier to work with programmatically.

User Story: As a data curator, I would like the choice to generate the manifest table's column headers with schema labels instead of display names, for programmatic ease of use.
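To make the displayName/label distinction concrete, here is a minimal Python sketch of how a display name such as Condition Source Text could map to a camelCase label such as conditionSourceText. The function name display_name_to_label is hypothetical, and this is only an illustration of the naming pattern in the example above; schematic's own label derivation may differ.

```python
import re


def display_name_to_label(display_name: str) -> str:
    """Turn a display name like 'Condition Source Text' into a camelCase label.

    Hypothetical helper for illustration; not part of schematic.
    """
    # Split on any run of non-alphanumeric characters (spaces, hyphens, ...).
    words = [w for w in re.split(r"[^0-9A-Za-z]+", display_name) if w]
    if not words:
        return ""
    first, rest = words[0], words[1:]
    # Lowercase the first character of the first word, uppercase the first
    # character of every later word, and leave the remaining characters as-is.
    return first[0].lower() + first[1:] + "".join(w[0].upper() + w[1:] for w in rest)


print(display_name_to_label("Condition Source Text"))  # conditionSourceText
```

Labels produced this way contain no spaces, which is what makes them easier to use as column names in scripts.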

AC:

Confirm the following: Use several models -- Example, HTAN, iAtlas

lopierra commented 1 year ago

To clarify the above, I am only able to use --use_schema_label with submit. I am not able to use get to generate a manifest with schema labels as column headers, and if I manually create a CSV with schema labels as column headers, validate will throw an error. So I am forced to use display names in R, which is not ideal.
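One possible stopgap, sketched here in Python rather than R, is to keep the display-name manifest for get and validate and rename the headers to labels only in downstream analysis code. The helper relabel_headers and the display name "Study Code" are assumptions made for illustration; the mapping would have to come from the data model, and nothing here is schematic functionality.

```python
import csv
import io
from typing import Dict


def relabel_headers(csv_text: str, name_to_label: Dict[str, str]) -> str:
    """Return the CSV text with header cells renamed according to name_to_label.

    Hypothetical helper: headers missing from the mapping are left unchanged.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    # Only the first row (the header) is rewritten; data rows are untouched.
    rows[0] = [name_to_label.get(header, header) for header in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Assumed display names and labels, for illustration only.
mapping = {
    "Study Code": "studyCode",
    "Condition Source Text": "conditionSourceText",
}
manifest = "Study Code,Condition Source Text\nS1,fever\n"
print(relabel_headers(manifest, mapping))
```

This keeps the file that schematic validates unchanged, at the cost of maintaining the mapping outside the tool.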

mialy-defelice commented 1 year ago

Hey @lopierra,

Can you provide me with your model and a manifest with entries? I would like to validate and submit to test that everything is working as you are intending.

Thanks!

lopierra commented 1 year ago

Hi @mialy-defelice - here is our model in YAML and JSON-LD.

I'm attaching a test manifest for the Study component.

An additional complication is that if I submit my own manifest with schemaLabels as headers, Synapse automatically capitalizes the first letter of each (so studyCode becomes StudyCode). Is there some way to suppress that, or do I need to open a ticket for Platform about this?

mialy-defelice commented 1 year ago

@lopierra Are you submitting through the Python or R client (using a table upload), or are you using schematic?

lopierra commented 1 year ago

I am using schematic.

mialy-defelice commented 1 year ago

@lopierra Okay, this is intentional in schematic, to conform to Synapse standards. We convert all headers to PascalCase during the final submission step. Are you always using camelCase? You can open a ticket with us.

@milen-sage do they need to be Class Labels or can they be Property Labels?

lopierra commented 1 year ago

Thanks - I will open a ticket. We chose camelCase as the standard for slot names in our LinkML model and PascalCase for the classes, so it would be good to have the option for submitting camelCase headers to Synapse.
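The header conversion discussed in this exchange, and the kind of opt-out being requested, can be sketched as follows. This is an illustration only and not schematic's implementation: the function names and the preserve_case parameter are hypothetical, and the sketch assumes the conversion amounts to uppercasing the first character, as in the studyCode to StudyCode example.

```python
from typing import List


def to_pascal_case(header: str) -> str:
    """Uppercase the first character and leave the rest of the header unchanged."""
    return header[:1].upper() + header[1:]


def prepare_headers(headers: List[str], preserve_case: bool = False) -> List[str]:
    """Convert headers for submission, or keep them as-is when asked to.

    preserve_case is a hypothetical opt-out, not an existing schematic option.
    """
    if preserve_case:
        return list(headers)
    return [to_pascal_case(h) for h in headers]


print(prepare_headers(["studyCode", "conditionSourceText"]))
# ['StudyCode', 'ConditionSourceText']
print(prepare_headers(["studyCode"], preserve_case=True))
# ['studyCode']
```

With an option like this, a model that uses camelCase for slots and PascalCase for classes could keep its slot names unchanged in the submitted table.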

milen-sage commented 1 year ago

Based on a conversation with @lopierra, this is not an urgent blocker, and we'll resolve it after refactoring the schema parsers (#857).