cancerDHC / example-data

This repository is intended to act as a store of example data files from across the NCI Cancer Research Data Commons (CRDC) nodes in a number of formats.
MIT License

Notebook for PDC to CRDCH transformation workflow #37

Closed: sujaypatil96 closed this 2 years ago

sujaypatil96 commented 2 years ago

Similar to the GDC to CCDH conversion notebook tutorial we have, this PR seeks to demonstrate the same with PDC data.

Note: The data in GDC and PDC is similar for the most part, so most of the snippets were replicated as is.

This PR is by no means complete. A few things that need to be added:

CC: @gaurav @turbomam @jooho-lee-kim. I just wanted to get your eyes on the notebook.

turbomam commented 2 years ago

Thanks for starting this @sujaypatil96

You know I get confused easily, so let me know if I'm rehashing something we've already resolved!

Your pdc_to_crdch_transformation notebook imports the CRDC-H model with

from ccdh import ccdhmodel as ccdh

That gets an old version of the model from the cloned repo in the local filesystem.

In order to develop against the latest model as published on PyPI, I would suggest

import crdch_model.crdch_model as ccdh

In that case, create_stage_from_pdc() raises the following error:

TypeError: crdch_model.crdch_model.CodeableConcept() argument after ** must be a mapping, not str
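
The shape of this error can be reproduced without the model at all. Below is a minimal sketch, assuming the newer generated class ends up calling CodeableConcept(**value) on a field that the notebook fills with a plain string. The dataclass is a hypothetical stand-in, not the real crdch_model class, and the text field and the to_codeable_concept and unpack_error helpers are made-up names for illustration only.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodeableConcept:
    """Hypothetical stand-in, NOT the real crdch_model.CodeableConcept."""
    text: Optional[str] = None


def unpack_error(value) -> Optional[str]:
    """Return the TypeError message raised by ** unpacking, or None if it works."""
    try:
        CodeableConcept(**value)
    except TypeError as exc:
        return str(exc)
    return None


def to_codeable_concept(value) -> CodeableConcept:
    """Coerce a raw value into a CodeableConcept before handing it to the model."""
    if isinstance(value, CodeableConcept):
        return value
    if isinstance(value, Mapping):
        return CodeableConcept(**value)
    # Assumption: a bare string is meant to be the concept's display text.
    return CodeableConcept(text=str(value))


if __name__ == "__main__":
    # A plain string cannot be unpacked with **, which triggers the TypeError.
    print(unpack_error("Primary Tumor"))
    # Wrapping the string explicitly avoids the unpacking step entirely.
    print(to_codeable_concept("Primary Tumor"))
```

If that assumption holds, one option would be to build the CodeableConcept objects explicitly in the notebook instead of passing raw strings, though the right fix depends on what the published model actually expects for those fields.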

gaurav commented 2 years ago

> That gets an old version of the model from the cloned repo in the local filesystem.

Yeah, it's a little odd that we still have that version rattling around! I'm working on a PR to switch this repo over to Poetry and to delete those files from this repo entirely (https://github.com/cancerDHC/example-data/pull/41).

sujaypatil96 commented 2 years ago

I think this PR is ready for review @turbomam and @gaurav. Would love to hear some thoughts from you guys as to how I can further improve this tutorial.

Perhaps more and better documentation? Better organization of code? etc.