OHDSI / OncologyWG

Oncology Working Group Repository
https://ohdsi.github.io/OncologyWG
Apache License 2.0

Identify gaps in diagnostic histology/morphology hierarchy (SNOMED, ICDO3) #601

Open kzollove opened 9 months ago

odikia commented 8 months ago

@kzollove, Peter Prinsen and I are working on this! If you/Tufts/Minderoo are also working on this, please let us know so that we can combine our efforts! For example, Peter has prepared a wonderful and substantial breakdown of the gaps. We're meeting about next steps on 10/26/2023.

patrickthealba commented 8 months ago

I'd like to follow up on how you're identifying gaps. I haven't quite mapped our registry yet but I suspect we'll have a few gaps worth noting/identifying as well.

odikia commented 8 months ago

@patrickthealba, absolutely! I didn’t complete the original gap analysis, but I’ll see what we can do about posting code along with documentation.

While we’re on the subject of registries, how are you getting your data, or planning on getting your data? Direct push from registry software, or via flat file/XML? I ask because we have a high-priority XML ingestion process locally, and we’re currently designing for NAACCR v21+. We’re pushing from XML into the OncologyWG’s EAV-like intermediate table (naaccr_data_points) en route to the OMOP tables. If you’re in a similar boat, and it’s currently an impediment to ingesting NAACCR data, let me know!
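For readers following along, here is a minimal sketch of what flattening NAACCR XML into EAV-style rows could look like. It is not the actual Winship or OncologyWG pipeline. It assumes the standard NAACCR XML layout (`NaaccrData` > `Patient` > `Tumor`, with `Item` elements keyed by a `naaccrId` attribute), and the row keys (`record_id`, `naaccr_item_id`, `naaccr_item_value`), the function name, and the sample values are hypothetical stand-ins rather than the real naaccr_data_points schema.

```python
import xml.etree.ElementTree as ET

# Namespace used by NAACCR XML files.
NS = {"n": "http://naaccr.org/naaccrxml"}

# Tiny illustrative document; the item values are made up.
SAMPLE = """<NaaccrData xmlns="http://naaccr.org/naaccrxml">
  <Patient>
    <Item naaccrId="patientIdNumber">000001</Item>
    <Tumor>
      <Item naaccrId="primarySite">C509</Item>
      <Item naaccrId="histologicTypeIcdO3">8500</Item>
    </Tumor>
  </Patient>
</NaaccrData>"""


def items_of(element):
    """Return (naaccrId, value) pairs for the Item children of one element."""
    return [
        (item.get("naaccrId"), (item.text or "").strip())
        for item in element.findall("n:Item", NS)
    ]


def naaccr_xml_to_rows(xml_text):
    """Flatten NAACCR XML into one EAV-style row per (tumor record, item).

    Column names are hypothetical; map them onto the real intermediate
    table's columns before loading.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for p_idx, patient in enumerate(root.findall("n:Patient", NS), start=1):
        patient_items = items_of(patient)
        for t_idx, tumor in enumerate(patient.findall("n:Tumor", NS), start=1):
            # Assumed record key: patient position plus tumor position.
            record_id = f"{p_idx}-{t_idx}"
            # Patient-level items are repeated on every tumor record.
            for item_id, value in patient_items + items_of(tumor):
                if value:  # skip empty items
                    rows.append(
                        {
                            "record_id": record_id,
                            "naaccr_item_id": item_id,
                            "naaccr_item_value": value,
                        }
                    )
    return rows


if __name__ == "__main__":
    for row in naaccr_xml_to_rows(SAMPLE):
        print(row)
```

Repeating patient-level items on each tumor record mirrors the old flat-file layout, where every record carried both; whether that is the right choice depends on how the downstream ETL keys its records.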

kzollove commented 8 months ago

@odikia

> Peter Prinsen and I are working on this!

This is great news, thanks for self-assigning! I was just creating the issue, but I'd love to learn more about your process if you don't mind me lurking in your next meeting. It'd be great to see the gap analysis code.

I'd also love to learn how you're going from XML to naaccr_data_points... our system is based on the old flat file format and we of course need to update it.

patrickthealba commented 8 months ago

This is a good reason for me to figure out if I can get the data somewhere closer to 'source', but at the moment there is a database table with a flat structure that contains all of our (VA) registry entries. It does look like most of the variables map to NAACCR item numbers, but I haven't had the chance to properly assess that or map it to the Registry ETL script just yet. For reference/posterity, more detail on the VA registry available for research can be found here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6874038/

odikia commented 8 months ago

@kzollove, sorry, just seeing this! I'm pretty sure you can come to future design meetings just to get a sense of the process. Peter is the code-smith in this case, and during our recent meeting we discussed hopefully sharing the code with other members of the community so that they can pressure test it themselves. We also discussed my intention to adapt it to our local registry, so that I can provide metrics on where our registry fits and doesn't fit Peter's overall observations, and furthermore, on the extent to which Winship is mapping to problematic concepts (whether deprecated or classified as an "unlikely" combination).
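As a rough illustration of the "deprecated concepts" metric only (not Peter's analysis, which has not been posted here), the sketch below counts registry records by mapping status. It relies on the OMOP `concept` table convention that `invalid_reason` is empty for valid concepts and set (for example `D` or `U`) otherwise. The function name, the input shapes, and the concept IDs are hypothetical, and the "unlikely combination" classification is not covered.

```python
from collections import Counter


def summarize_mapping_status(record_concept_ids, invalid_reason_by_concept):
    """Count records by the status of the concept they map to.

    record_concept_ids: one concept_id per registry record (hypothetical input).
    invalid_reason_by_concept: concept_id -> invalid_reason, as it would be
        read from the OMOP concept table (None means the concept is valid).
    """
    counts = Counter()
    for concept_id in record_concept_ids:
        if concept_id not in invalid_reason_by_concept:
            counts["not_in_vocabulary"] += 1
        elif invalid_reason_by_concept[concept_id] is None:
            counts["valid"] += 1
        else:
            counts["deprecated"] += 1
    return dict(counts)


if __name__ == "__main__":
    # Made-up concept IDs, for illustration only.
    vocabulary = {101: None, 102: "D", 103: "U"}
    records = [101, 101, 102, 103, 999]
    print(summarize_mapping_status(records, vocabulary))
```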

odikia commented 8 months ago

@patrickthealba, thanks so much for sharing! Though we've worked with the VA in a variety of areas at Emory/Winship, we haven't discussed their tumor registry data. This is very helpful background knowledge!

@mgurley is working on contacting registry vendors that use the NAACCR format, to see if they'd be willing to work on the mapping from NAACCR to OMOP. Similarly, I'm wondering if there are individuals maintaining the VACCR at the VA who would be interested in chatting with members of the OHDSI community about this. It may be a vocabulary that should be added to the vocabularies sometime in the future, via a community contribution!

odikia commented 8 months ago

Hey @kzollove ,

Closing the loop on the below:

> I'd love to learn more about your process if you don't mind me lurking in your next meeting.

Meeting invite forwarded! @peterprinsen-iknl , FYI