beamandrew / cui2vec

co-occurrence matrix #4

Open KrishnaPG opened 3 years ago

KrishnaPG commented 3 years ago

In Section 2.4 of the paper:

a CUI-CUI co-occurrence matrix is constructed, ... For nonclinical text data (e.g., journal articles), it is first preprocessed (see Section 3) and chunked into fixed length windows of 10 words, and a co-occurrence is counted as the appearance of a CUI-CUI pair in the same window. For claims data, ICD-9 codes are mapped to UMLS CUIs and a co-occurrence is counted as the number of patients in which two CUIs appear in any 30-day period. Finally, for the clinical notes, we counted a co-occurrence as two CUIs appearing in the same 30-day ‘bin’
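To make the windowed counting for text data concrete, here is a minimal sketch of how I read that description. It is not the authors' code. It assumes the text has already been mapped to a token list in which non-concept words are `None`, that the windows are non-overlapping, and that each distinct CUI pair is counted once per window. The function name `window_cooccurrence` and the identifiers `C1`, `C2`, `C3` are hypothetical placeholders.

```python
from collections import Counter
from itertools import combinations


def window_cooccurrence(tokens, window_size=10):
    """Count CUI pairs that fall in the same fixed-length window.

    tokens: list of CUI strings, with None for words that map to no CUI
    (an assumed preprocessing output, not the paper's actual format).
    """
    counts = Counter()
    # Chunk the token stream into non-overlapping windows (assumption).
    for start in range(0, len(tokens), window_size):
        window = tokens[start:start + window_size]
        # Keep distinct CUIs only, so a pair counts once per window (assumption).
        cuis = sorted({t for t in window if t is not None})
        for pair in combinations(cuis, 2):
            counts[pair] += 1
    return counts


# Placeholder identifiers, not real UMLS CUIs.
example_tokens = ["C1", None, "C2"] + [None] * 7 + ["C1", "C3", "C2"]
example_counts = window_cooccurrence(example_tokens)
```

The result is a sparse, symmetric count keyed by sorted CUI pairs. The claims and clinical-notes matrices would need a different grouping key (patient and 30-day period) in place of the word window.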

Would you be able to provide access to the co-occurrence matrices built from these three sources? They are a very powerful data structure and could support further investigation. We already hold a UMLS license and, if the data cannot be released publicly, can reach out to you privately to arrange download access.

Thank you

GregSilverman commented 2 years ago

I'd like to know the structure of the co-occurrence matrix, if at all possible.

GregSilverman commented 2 years ago

Never mind, I see all data are in the vignettes folder.