kzollove opened 9 months ago
I'd like to follow up on how you're identifying gaps. I haven't fully mapped our registry yet, but I suspect we'll have a few gaps worth noting/identifying as well.
@patrickthealba, absolutely! I didn’t complete the original gap analysis, but I’ll see what we can do about posting code along with documentation.
While we’re on the subject of registries, how are you getting your data, or how are you planning to get it? Direct push from registry software, or via flat file/XML? I ask because we have a high-priority XML ingestion process locally, and we’re currently designing for NAACCR v21+. We’re pushing from XML into the OncologyWG’s EAV-like intermediate table (naaccr_data_points) en route to the OMOP tables. If you’re in a similar boat and this is currently an impediment to ingesting NAACCR data, let me know!
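For anyone following the XML question, here is a minimal sketch of one way to flatten a NAACCR XML file into EAV-style rows. It is not the ingestion process described above. It assumes the NAACCR XML layout (a `NaaccrData` root in the `http://naaccr.org/naaccrxml` namespace, with `Patient` and nested `Tumor` elements holding `Item` elements keyed by `naaccrId` and an optional `naaccrNum`). The `DataPoint` row shape, the `naaccr_xml_to_eav` function, and the `record_id` scheme are hypothetical simplifications, not the actual naaccr_data_points schema.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, NamedTuple, Optional

# NAACCR XML namespace (assumes the NAACCR XML v1.x layout).
NS = {"n": "http://naaccr.org/naaccrxml"}


class DataPoint(NamedTuple):
    """Hypothetical, simplified EAV row; NOT the real naaccr_data_points schema."""
    record_id: str             # hypothetical "<patient index>-<tumor index>" key
    naaccr_id: str             # e.g. "primarySite"
    naaccr_num: Optional[str]  # NAACCR item number; optional in the XML
    value: str


def naaccr_xml_to_eav(xml_text: str) -> Iterator[DataPoint]:
    """Yield one EAV row per Item, treating each Tumor as one record."""
    root = ET.fromstring(xml_text)
    for p_idx, patient in enumerate(root.findall("n:Patient", NS)):
        # Direct Item children of Patient only (Tumor items are handled below).
        patient_items = patient.findall("n:Item", NS)
        for t_idx, tumor in enumerate(patient.findall("n:Tumor", NS)):
            record_id = f"{p_idx}-{t_idx}"
            # Repeat patient-level items for each tumor so every record is complete.
            for item in patient_items + tumor.findall("n:Item", NS):
                yield DataPoint(
                    record_id=record_id,
                    naaccr_id=item.get("naaccrId"),
                    naaccr_num=item.get("naaccrNum"),  # None if absent
                    value=item.text or "",
                )
```

When `naaccrNum` is absent, a real pipeline would need to look the item number up in the NAACCR data dictionary. For large files, `xml.etree.ElementTree.iterparse` would avoid holding the whole document in memory.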
@odikia
Peter Prinsen and I are working on this!
This is great news! Thanks for self-assigning. I was just opening the issue, but I'd love to learn more about your process if you don't mind me lurking in your next meeting. It'd be great to see the gap analysis code.
I'd love to learn how you're going from XML to naaccr_data_points; our system is based on the old flat-file format, and we of course need to update it.
This is a good reason for me to figure out whether I can get the data somewhere closer to 'source'; at the moment, though, there is a database table with a flat structure that contains all of our (VA) registry entries. It does look like most of the variables map to NAACCR Item numbers, but I haven't had the chance to properly assess that or map it to the Registry ETL script yet. For reference/posterity, more detail on the VA registry data available for research can be found here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6874038/
@kzollove, sorry, just seeing this! I'm pretty sure you can come to future design meetings to get a sense of the process. Peter is the code-smith in this case, and during our recent meeting we discussed sharing the code with other members of the community so that they can pressure-test it themselves. We also discussed my plan to adapt it to our local registry, so that I can report metrics on where our registry does and doesn't fit Peter's overall observations and, furthermore, on the extent to which Winship is mapping to problematic concepts (whether deprecated or classified as an "unlikely" combination).
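As a rough illustration of the kind of metric mentioned above, the sketch below computes the share of mapped records that land on problematic concepts. It is not Peter's code. It assumes "deprecated" means a non-null OMOP `CONCEPT.invalid_reason` ('D' for deleted, 'U' for updated/replaced), and the function name `mapping_quality_metrics`, its inputs, and the `unlikely_ids` set (a stand-in for whatever rule flags an "unlikely" combination) are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional


def mapping_quality_metrics(
    mapped_concept_ids: Iterable[int],
    invalid_reason: Mapping[int, Optional[str]],
    unlikely_ids: frozenset = frozenset(),
) -> dict:
    """Hypothetical helper: shares of mapped records hitting problematic concepts.

    invalid_reason maps concept_id -> CONCEPT.invalid_reason (None if valid).
    unlikely_ids is a hypothetical set of concept_ids flagged as "unlikely".
    """
    counts = Counter()
    for cid in mapped_concept_ids:
        counts["total"] += 1
        if cid not in invalid_reason:
            counts["unknown"] += 1      # concept_id missing from the lookup
        elif invalid_reason[cid] is not None:
            counts["deprecated"] += 1   # 'D' or 'U' in the OMOP CONCEPT table
        if cid in unlikely_ids:
            counts["unlikely"] += 1
    total = counts["total"]

    def share(key: str) -> float:
        return counts[key] / total if total else 0.0

    return {
        "total": total,
        "deprecated_share": share("deprecated"),
        "unknown_share": share("unknown"),
        "unlikely_share": share("unlikely"),
    }
```

Counting per mapped record (rather than per distinct concept) weights each concept by how often the registry actually uses it, which is closer to "to what extent" the site is affected.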
@patrickthealba, thanks so much for sharing! Though we've worked with the VA in a variety of areas at Emory/Winship, we haven't discussed their tumor registry data. This is very helpful background knowledge!
@mgurley is working on contacting registry vendors that use the NAACCR format to see whether they'd be willing to work on the mapping from NAACCR to OMOP. Similarly, I'm wondering whether there are people maintaining the VACCR at the VA who would be interested in chatting with members of the OHDSI community about this. It may be a vocabulary worth adding to the OHDSI vocabularies in the future via a community contribution!
Hey @kzollove,
Closing the loop on the below:
I'd love to learn more about your process if you don't mind me lurking in your next meeting.
Meeting invite forwarded! @peterprinsen-iknl, FYI.
@kzollove, Peter Prinsen and I are working on this! If you/Tufts/Minderoo are also working on this, please let us know so that we can combine our efforts! Peter has prepared a wonderful and substantial breakdown of the gaps, for example. We're meeting about next steps on 10/26/2023.