eliteportal / data-models

data models for the elite project
https://eliteportal.github.io/data-models/
MIT License

updated bsSeq descriptions for Unknown Not collected Not applicable Not specified #39

Closed ameliakallaher closed 2 months ago

ameliakallaher commented 2 months ago

Added updated descriptions for unknown, not collected, not applicable, not specified where missing.

ameliakallaher commented 2 months ago

@avanlinden, I opened a new PR for the remaining missing definitions of the bsSeq manifest.

avanlinden commented 2 months ago

Running the join_data_model.py script successfully joins the full csv from the modules but fails to convert to json-ld, and swallows the schematic error.
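
One way to keep the conversion error from being swallowed, as a rough sketch: it assumes the join script shells out to the schematic CLI, which may not be how join_data_model.py actually calls it. The helper name run_and_report is hypothetical; the command itself is the one run by hand in this thread.

```python
import subprocess


def run_and_report(cmd):
    """Run a command and raise with its stderr attached instead of hiding the failure.

    Hypothetical helper for illustration; not part of join_data_model.py.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface whatever the child process printed rather than discarding it.
        raise RuntimeError(
            f"{' '.join(cmd)} exited with code {result.returncode}:\n{result.stderr}"
        )
    return result.stdout


if __name__ == "__main__":
    # Assumption: the conversion step is equivalent to this CLI call.
    print(run_and_report(["schematic", "schema", "convert", "EL.data.model.csv"]))
```

With this shape, a failed conversion stops the script and prints the schematic traceback instead of finishing silently.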

Running schematic schema convert separately gives the following error:

schematic schema convert EL.data.model.csv
Starting schematic...
Parsing data model.
Traceback (most recent call last):
  File "/Users/alinden/Library/Caches/pypoetry/virtualenvs/elite-data-models-7WrDxpD_-py3.10/bin/schematic", line 8, in <module>
    sys.exit(main())
  File "/Users/alinden/Library/Caches/pypoetry/virtualenvs/elite-data-models-7WrDxpD_-py3.10/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/alinden/Library/Caches/pypoetry/virtualenvs/elite-data-models-7WrDxpD_-py3.10/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/Users/alinden/Library/Caches/pypoetry/virtualenvs/elite-data-models-7WrDxpD_-py3.10/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/alinden/Library/Caches/pypoetry/virtualenvs/elite-data-models-7WrDxpD_-py3.10/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/alinden/Library/Caches/pypoetry/virtualenvs/elite-data-models-7WrDxpD_-py3.10/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/alinden/Library/Caches/pypoetry/virtualenvs/elite-data-models-7WrDxpD_-py3.10/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/alinden/Library/Caches/pypoetry/virtualenvs/elite-data-models-7WrDxpD_-py3.10/lib/python3.10/site-packages/schematic/schemas/commands.py", line 88, in convert
    data_model_grapher = DataModelGraph(parsed_data_model, data_model_labels)
  File "/Users/alinden/Library/Caches/pypoetry/virtualenvs/elite-data-models-7WrDxpD_-py3.10/lib/python3.10/site-packages/schematic/schemas/data_model_graph.py", line 90, in __init__
    self.graph = self.generate_data_model_graph()
  File "/Users/alinden/.pyenv/versions/3.10.9/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/Users/alinden/Library/Caches/pypoetry/virtualenvs/elite-data-models-7WrDxpD_-py3.10/lib/python3.10/site-packages/schematic/schemas/data_model_graph.py", line 117, in generate_data_model_graph
    node_dict = self.dmn.generate_node_dict(
  File "/Users/alinden/Library/Caches/pypoetry/virtualenvs/elite-data-models-7WrDxpD_-py3.10/lib/python3.10/site-packages/schematic/schemas/data_model_nodes.py", line 249, in generate_node_dict
    node_display_name = node_display_name.strip()
AttributeError: 'float' object has no attribute 'strip'

I think one of the attribute names is being misrecognized as a float, rather than a string?

avanlinden commented 2 months ago

Oh, right, it's line 147 in the EL.data.model.csv -- there's no attribute name in the first column, just an NA. I don't know what happened here but it's on the main branch too, not just a result of the joining process 😬
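
For context on why this shows up as a float: if the csv is loaded with pandas (an assumption about schematic's parser), a blank or "NA" cell becomes NaN, which is a float, and `float("nan").strip()` raises the same AttributeError. A minimal standard-library sketch for locating such rows before converting follows. The column name "Attribute", the marker list, and the function name are assumptions for illustration.

```python
import csv

# Strings commonly treated as missing values by csv loaders (assumed, partial list).
MISSING_MARKERS = {"", "NA", "N/A", "NaN", "nan", "NULL"}


def find_missing_attribute_rows(csv_path, name_column="Attribute"):
    """Return file line numbers of rows whose attribute name is blank or NA-like."""
    bad_lines = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        for row in reader:
            name = (row.get(name_column) or "").strip()
            if name in MISSING_MARKERS:
                # line_num counts physical lines read so far, header included.
                bad_lines.append(reader.line_num)
    return bad_lines


if __name__ == "__main__":
    print(find_missing_attribute_rows("EL.data.model.csv"))
```

An attribute that is legitimately named "NA" would be flagged too, so the output is a list of rows to inspect rather than a list of confirmed errors.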

avanlinden commented 2 months ago

The error appears to have been introduced by the join_data_model.py script itself, though I'm not sure how.

  1. I verified that totalReads.csv is formatted correctly.
  2. I manually deleted the truncated row from the full csv data model.
  3. I re-ran the join_data_model.py script, and it re-introduced the truncated row for totalReads and added whitespace before several other attribute names.
  4. I spent some time debugging and realized that the data model was being joined correctly, but the csv was not being written correctly.
  5. I tried deleting all the modules with the incomplete "file_annotation_template" attributes ported over from AD, since they did not contain any complicated logic and many conflicted with other existing attributes.
  6. I then ran the script again; the csv was written correctly and converted to json-ld.
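
The two symptoms from step 3 (a truncated row and whitespace before attribute names) can be checked mechanically after the csv is written. This is a sketch using only the standard library. The function name is hypothetical and it assumes the attribute name is the first column.

```python
import csv


def check_written_csv(csv_path):
    """Return (line number, message) pairs for truncated rows and padded attribute names."""
    problems = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header is None:
            return problems  # empty file, nothing to check
        for fields in reader:
            if len(fields) != len(header):
                # A truncated (or over-long) row has a different field count than the header.
                problems.append(
                    (reader.line_num,
                     f"expected {len(header)} fields, found {len(fields)}")
                )
            elif fields[0] != fields[0].strip():
                problems.append(
                    (reader.line_num, "attribute name has surrounding whitespace")
                )
    return problems


if __name__ == "__main__":
    for line_num, message in check_written_csv("EL.data.model.csv"):
        print(f"line {line_num}: {message}")
```

Running a check like this right after the join step would catch a badly written csv before the json-ld conversion is attempted.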

avanlinden commented 2 months ago

@ameliakallaher I think I got it working!

Here's an empty version of the updated bsSeq assay metadata template--can you take a look and let me know if it looks as expected to you? https://docs.google.com/spreadsheets/d/1QayhJO79tCZ0Lvt2nn-I_Q9omaDrboYFDr3PIvsRNSI/edit?gid=0#gid=0

I checked a few drop-downs and I do see "not applicable"/"not collected"/et al listed as options.

ameliakallaher commented 2 months ago

@avanlinden template looks correct to me, thanks!