edonnachie / ICD10gm

R Package: ICD-10-GM Metadata
https://edonnachie.github.io/ICD10gm/

Icd 10 #1

Closed (jackwasey closed this issue 6 years ago)

jackwasey commented 8 years ago

Hi, didn't see your email so contacting you this way. You may have seen the icd9 package for R, which I wrote. You may not have seen the almost complete icd10 branch, which is on GitHub and will soon be on CRAN. I'm not quite sure how much our work overlaps, but we should see if we can collaborate. Where are your ICD lists from? The WHO version of ICD-10 has copyright restrictions, believe it or not. I used the public domain US ICD-10-CM, and was considering whether to scrape the web as needed for ICD-10 from WHO. I rewrote the icd9 package so that different ICD versions could be added fairly easily.

edonnachie commented 8 years ago

Hi Jack,

Many thanks for your e-mail. I was aware of the icd9 package, but not the icd10 branch.

I'm based in Munich and work within the German health system, which has its own variant of the ICD-10 system (ICD-10 German Modification, updated each year by DIMDI, a German federal agency). The data are also under copyright, but I believe that the restrictions permit the use of historical data in such a package. The data are released for public download six months into the year, so commercial users who need them sooner have to purchase them from DIMDI.

The original purpose of my package was to process the metadata in order to create a history with which to query a database of claims data (function icd_history). It also auto-expands codes, such that, for example, "A0" will expand to all 5-digit codes between A01 and A09 and "J" will expand to all codes in the J-chapter (function icd_expand). I'll try to add some proper documentation to GitHub soon.
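The prefix expansion described above can be sketched roughly as follows. This is a minimal Python illustration, not the package's `icd_expand` implementation: the function name `expand_prefix` and the short `KNOWN_CODES` list are assumptions made up for the example, and the codes are illustrative strings that have not been checked against any ICD-10-GM release. A real lookup would run against the full metadata for a given year.

```python
# Toy list standing in for one year's code metadata (assumed, incomplete).
KNOWN_CODES = ["A01.0", "A01.1", "A09.0", "A09.9", "E11.2", "E11.9", "J45.0", "J45.9"]


def expand_prefix(spec, known_codes=KNOWN_CODES):
    """Return every known code that starts with `spec`, ignoring dots and case."""
    prefix = spec.replace(".", "").upper()
    return sorted(
        code for code in known_codes
        if code.replace(".", "").upper().startswith(prefix)
    )


print(expand_prefix("A0"))     # every listed code from A01 to A09
print(expand_prefix("J"))      # every listed code in the J chapter
print(expand_prefix("E11.2"))  # a fully specified code matches only itself here
```

Matching on the dot-free, upper-cased string keeps the lookup independent of how the caller writes the code.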

More generally speaking, I'm interested in developing ways to "understand" ICD diagnosis data. It may be an idea to incorporate some of these ideas into your package, or create a new, "variant neutral" package. For example:

1) Measures of the "distance" between two codes (e.g. E11.2 is close to E11.9). More challenging (the relevant grouper data are often proprietary) is to recognise that, for example, R52.2 and F45.4 both refer to chronic pain.

2) Visualisation and summary of ICD codes. I'm thinking along the lines of the "calendar heat map", only for diagnoses (http://blog.revolutionanalytics.com/2009/11/charting-time-series-as-calendar-heat-maps-in-r.html)
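As a rough illustration of the distance idea in point 1, one simple option is to treat each character of a code as a level in a tree and count the steps from both codes up to their longest shared prefix. This is only a sketch under that assumption, written in Python; `prefix_distance` is a hypothetical name and not a function of either package.

```python
def normalise(code):
    """Strip the dot and upper-case, so 'e11.2' becomes 'E112'."""
    return code.replace(".", "").upper()


def prefix_distance(a, b):
    """Count the steps up from each code to their longest shared prefix."""
    a, b = normalise(a), normalise(b)
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return (len(a) - shared) + (len(b) - shared)


print(prefix_distance("E11.2", "E11.9"))  # 2: siblings under E11
print(prefix_distance("R52.2", "F45.4"))  # 8: no shared prefix at all
```

The second call shows the limitation noted in point 1: a purely syntactic measure places R52.2 and F45.4 as far apart as possible, so recognising that both refer to chronic pain would need additional grouping data.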

I'll take a closer look at your package. It would certainly be worth thinking about where collaboration would be beneficial.

Best regards,

Ewan Donnachie

On 2016-03-03 15:23, Jack Wasey wrote:

Hi, didn't see your email so contacting you this way. You may have seen the icd9 package for R, which I wrote. You may not have seen the almost complete icd10 branch, which is on GitHub and will soon be on CRAN. I'm not quite sure how much our work overlaps, but we should see if we can collaborate. Where are your ICD lists from? The WHO version of ICD-10 has copyright restrictions, believe it or not. I used the public domain US ICD-10-CM, and was considering whether to scrape the web as needed for ICD-10 from WHO. I rewrote the icd9 package so that different ICD versions could be added fairly easily.


jackwasey commented 8 years ago

Hi, and thanks for your detailed reply.

I think there is good scope for bringing our work together.

  1. I'll have to look more into what you mean by ICD history.
  2. My package finds the 'children' (defined or just syntactically valid) of ICD-9 codes. When I started doing this for ICD-10 codes, I found that the number of possibilities for ICD-10-CM was very large. I then found that WHO ICD-10 may have license problems. At that point, I paused, but if the data are available, it would be trivial to add German or international ICD-10 capability to the icd_children function.
  3. Distance is an interesting question. From my point of view, which is mostly about how ICD codes fall into standardized comorbidities, knowing that a 'close' ICD code exists which did not fall into a comorbidity would be interesting for assessing the sensitivity of the comorbidity maps, and for finding possible problems or improvements in them. The standard lists are carefully curated, but the codes are updated each year and the standardized lists are not. This is an opportunity to get better comorbidity data, and to characterise in-hospital complications from patient records.
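The sensitivity check in point 3 could look something like the sketch below: list the observed codes that are not in a comorbidity map but share a three-character category with a code that is. It is written in Python for illustration only; `near_misses` is a hypothetical helper, and the code sets in the usage lines are invented toy data, not a real comorbidity map.

```python
def category(code):
    """Three-character category of a code, e.g. 'E11' for 'E11.2'."""
    return code.replace(".", "").upper()[:3]


def near_misses(observed, mapped):
    """Observed codes absent from `mapped` whose category contains a mapped code."""
    mapped_norm = {c.upper() for c in mapped}
    mapped_categories = {category(c) for c in mapped}
    return sorted({
        c.upper() for c in observed
        if c.upper() not in mapped_norm and category(c) in mapped_categories
    })


# Invented toy data: two mapped codes and a handful of observed codes.
toy_map = {"E11.2", "E11.9"}
toy_observed = ["E11.2", "E11.6", "I10", "E11.6"]
print(near_misses(toy_observed, toy_map))
```

A code flagged this way is not necessarily a gap in the map; it is a candidate for manual review, for instance after a yearly code update.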

My package doesn't currently implement any plausibility rules, e.g. does this 80-year-old man have a female infant diagnostic code? The only rules it currently implements are those related to comorbidities, e.g. should Hypertension be counted if Hypertension with complications also exists? Again, I'm focussed on getting accurate comorbidities.
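A comorbidity rule of the kind just mentioned might be sketched as below, assuming that the less specific flag is dropped whenever the more specific one is present. This is a Python illustration rather than the icd package's code; the flag names and the `HIERARCHY` table are hypothetical.

```python
# Hypothetical pairs: (more specific flag, less specific flag it overrides).
HIERARCHY = [("htn_complicated", "htn")]


def apply_hierarchy(flags, hierarchy=HIERARCHY):
    """Return a copy of `flags` with overridden comorbidities switched off."""
    result = dict(flags)
    for specific, general in hierarchy:
        if result.get(specific):
            result[general] = False
    return result


patient = {"htn": True, "htn_complicated": True}
print(apply_hierarchy(patient))
```

Working on a copy leaves the raw flags available, which helps when comparing counts with and without the rule.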

  4. Visualization: interesting. I did briefly look at doing some kind of 2D heat map of comorbidities and patients, but I generally deal with thousands to millions of patients, and I didn't come up with an easy, useful solution. I would be interested to see what you come up with.

I'm very close to releasing my ICD9+10 package to CRAN. It's a substantial amount of code, but still has some small gaps in the ICD-10 space (e.g. icd_children). The icd10 branch is the work in progress, which I'm going to merge into master soon. Once in master, I'm going to be very careful to commit only code that completes R CMD check successfully. You may have found the icd10 branch earlier this week didn't always install.

I know you've invested some time in making a package yourself, but would you consider integrating your work into the icd package (with full attribution, of course)? And please don't hesitate to file GitHub issues if you think anything is fishy.

Best wishes, Jack