Sage-Bionetworks / schematic

Package for biomedical data model and metadata ingress management
https://schematicpy.readthedocs.io/en/stable/cli_reference.html
MIT License

Error generating Google sheet #888

Closed clarisse-lau closed 2 years ago

clarisse-lau commented 2 years ago

Describe the bug

The HTAN Duke center is unable to generate templates for their diagnosis dataset through the DCA (see ticket). Attempting to generate the Google sheet using schematic's CLI yielded the error below.

To Reproduce

Steps to reproduce the behavior: Run

schematic manifest -v INFO --config config.yml get --data_type Diagnosis -d syn24615229 -oa --sheet_url

using the HTAN data model: https://github.com/ncihtan/data-models/blob/main/HTAN.model.jsonld

Expected behavior

Obtain the template CSV and Google sheet URL for the Duke Diagnosis dataset.

Priority (select one)

Additional context

Python 3.10.6, schematicpy 22.8.1

Traceback (most recent call last):
  File "/Users/clarisse/HTAN/schema-test/bin/schematic", line 8, in <module>
    sys.exit(main())
  File "/Users/clarisse/HTAN/schema-test/lib/python3.10/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/clarisse/HTAN/schema-test/lib/python3.10/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/clarisse/HTAN/schema-test/lib/python3.10/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/clarisse/HTAN/schema-test/lib/python3.10/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/clarisse/HTAN/schema-test/lib/python3.10/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/clarisse/HTAN/schema-test/lib/python3.10/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/clarisse/HTAN/schema-test/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/Users/clarisse/HTAN/schema-test/lib/python3.10/site-packages/schematic/manifest/commands.py", line 189, in get_manifest
    result = create_single_manifest(data_type = dt, output_csv=output_csv, output_xlsx=output_xlsx)
  File "/Users/clarisse/HTAN/schema-test/lib/python3.10/site-packages/schematic/manifest/commands.py", line 148, in create_single_manifest
    result = manifest_generator.get_manifest(
  File "/Users/clarisse/HTAN/schema-test/lib/python3.10/site-packages/schematic/manifest/generator.py", line 1487, in get_manifest
    manifest_url = self.get_empty_manifest() 
  File "/Users/clarisse/HTAN/schema-test/lib/python3.10/site-packages/schematic/manifest/generator.py", line 1250, in get_empty_manifest
    json_schema = self._get_json_schema(json_schema_filepath)
  File "/Users/clarisse/HTAN/schema-test/lib/python3.10/site-packages/schematic/manifest/generator.py", line 348, in _get_json_schema
    json_schema = self.sg.get_json_schema_requirements(self.root, self.title)
  File "/Users/clarisse/HTAN/schema-test/lib/python3.10/site-packages/schematic/schemas/generator.py", line 501, in get_json_schema_requirements
    node_range_d = self.get_nodes_display_names(node_range, mm_graph)
  File "/Users/clarisse/HTAN/schema-test/lib/python3.10/site-packages/schematic/schemas/generator.py", line 354, in get_nodes_display_names
    node_list_display_names = [
  File "/Users/clarisse/HTAN/schema-test/lib/python3.10/site-packages/schematic/schemas/generator.py", line 355, in <listcomp>
    mm_graph.nodes[node]["displayName"] for node in node_list
KeyError: 'displayName'

milen-sage commented 2 years ago

Thanks for the reminder on this @clarisse-lau! Prioritizing now.

adamjtaylor commented 2 years ago

Confirming that I can replicate this behaviour in the HTAN DCA (staging). App disconnects during the generating link step for the Google Spreadsheet of the Duke Clinical Diagnosis template.

The error is not unique to Duke, as it also occurs for DFCI's Diagnosis template. However, it does seem to be specific to the Diagnosis component: I can't replicate it with, for example, Demographics.

The Shiny logs are as follows:

2022-09-09T14:33:38.734650+00:00 shinyapps[5750996]: Welcome, Adam Taylor!
2022-09-09T14:33:38.734737+00:00 shinyapps[5750996]: https://python-docs.synapse.org/build/html/news.html
2022-09-09T14:33:38.734821+00:00 shinyapps[5750996]: INFO: [2022-09-09 14:33:38] synapseclient_default - Welcome, Adam Taylor!
2022-09-09T14:33:38.734565+00:00 shinyapps[5750996]: Python Synapse Client version 2.6.0 release notes
2022-09-09T14:33:38.734860+00:00 shinyapps[5750996]: 
2022-09-09T14:33:38.734698+00:00 shinyapps[5750996]: 
2022-09-09T14:44:19.736847+00:00 shinyapps[5750996]: Warning: Error in py_call_impl: KeyError: 'displayName'
2022-09-09T14:44:19.736892+00:00 shinyapps[5750996]: 
2022-09-09T14:44:19.736926+00:00 shinyapps[5750996]: Detailed traceback:
2022-09-09T14:44:19.736965+00:00 shinyapps[5750996]:   File "/srv/connect/apps/HTAN-data-curator-staging/.venv/lib/python3.8/site-packages/schematic/models/metadata.py", line 150, in getModelManifest
2022-09-09T14:44:19.737065+00:00 shinyapps[5750996]:     empty_manifest_url = self.get_empty_manifest()
2022-09-09T14:44:19.737038+00:00 shinyapps[5750996]:   File "/srv/connect/apps/HTAN-data-curator-staging/.venv/lib/python3.8/site-packages/schematic/manifest/generator.py", line 1469, in get_manifest
2022-09-09T14:44:19.736999+00:00 shinyapps[5750996]:     return mg.get_manifest(
2022-09-09T14:44:19.737116+00:00 shinyapps[5750996]:   File "/srv/connect/apps/HTAN-data-curator-staging/.venv/lib/python3.8/site-packages/schematic/manifest/generator.py", line 1250, in get_empty_manifest
2022-09-09T14:44:19.737156+00:00 shinyapps[5750996]:     json_schema = self._get_json_schema(json_schema_filepath)
2022-09-09T14:44:19.737202+00:00 shinyapps[5750996]:   File "/srv/connect/apps/HTAN-data-curator-staging/.venv/lib/python3.8/site-packages/schematic/manifest/generator.py", line 348, in _get_json_schema
2022-09-09T14:44:19.737243+00:00 shinyapps[5750996]:     json_schema = self.sg.get_json_schema_requirements(self.root, self.title)
2022-09-09T14:44:19.737285+00:00 shinyapps[5750996]:   File "/srv/connect/apps/HTAN-data-curator-staging/.venv/lib/python3.8/site-packages/schematic/schemas/generator.py [... truncated]
2022-09-09T14:44:19.737317+00:00 shinyapps[5750996]:   15: <Anonymous>
2022-09-09T14:44:19.737347+00:00 shinyapps[5750996]:   13: fn
2022-09-09T14:44:19.737382+00:00 shinyapps[5750996]:    8: retry
2022-09-09T14:44:19.737442+00:00 shinyapps[5750996]:    7: connect$retryingStartServer
2022-09-09T14:44:19.737483+00:00 shinyapps[5750996]:    6: eval
2022-09-09T14:44:19.737524+00:00 shinyapps[5750996]:    5: eval
2022-09-09T14:44:19.737561+00:00 shinyapps[5750996]:    4: eval
2022-09-09T14:44:19.737598+00:00 shinyapps[5750996]:    3: eval
2022-09-09T14:44:19.737673+00:00 shinyapps[5750996]:    1: local

GiaJordan commented 2 years ago

Partially addressed in #903. Work will continue to determine the cause of the error and its relation to the base schema used. @clarisse-lau @adamjtaylor

milen-sage commented 2 years ago

@elv-sb to update the HTAN data model's method of diagnosis field. It contains 'Imaging' as a valid value, which collides with a term in schema.org or biothings. In this case, schematic doesn't set attributes like "displayName" on the node, which causes the error above.
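
A minimal sketch of the failure mode described above, not schematic's actual code. It assumes the schema graph's node attributes behave like a plain mapping from node name to attribute dict, and that the colliding 'Imaging' node ends up with no "displayName" attribute. The data and both helper functions are hypothetical; only the lookup pattern is taken from the traceback.

```python
# Stand-in for the schema graph's node attributes (hypothetical data).
# Keys are node names; values are attribute dicts, like mm_graph.nodes[node].
node_attrs = {
    "Biopsy": {"displayName": "Biopsy"},
    # Assumption: attributes were never set because of the name collision.
    "Imaging": {},
}


def display_names_strict(nodes, names):
    """Same lookup pattern as the failing list comprehension in the traceback."""
    return [nodes[n]["displayName"] for n in names]


def nodes_missing_display_name(nodes, names):
    """Return the node names that would trigger KeyError: 'displayName'."""
    return [n for n in names if "displayName" not in nodes[n]]


valid_values = ["Biopsy", "Imaging"]

try:
    display_names_strict(node_attrs, valid_values)
except KeyError as err:
    # The strict lookup fails on the first node without the attribute.
    print(f"KeyError: {err}")

# Listing the offending nodes up front points at the value to rename.
print(nodes_missing_display_name(node_attrs, valid_values))
```

A check like the second helper could be run over a component's valid values before generating a manifest, to find which values in the data model need to be renamed.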

clarisse-lau commented 1 year ago

@milen-sage @elv-sb Just following up on this ticket. I'm still encountering the disconnect error when generating Duke's diagnosis sheet. Do you have an update on when the above will be implemented? Thanks!

milen-sage commented 1 year ago

@clarisse-lau the schema has been updated, but I am not sure whether the production and staging apps have been updated to use the new schema. Perhaps @elv-sb and @adamjtaylor can chime in.

adamjtaylor commented 1 year ago

@clarisse-lau Can confirm that this is fixed in current staging.

GiaJordan commented 1 year ago

The previous mention from another issue (#/999) was erroneous; the two are not related.