GanjinZero / ICD-MSMN

Code Synonyms Do Matter: Multiple Synonyms Matching Network for Automatic ICD Coding [ACL 2022]
https://arxiv.org/abs/2203.01515

Where does preprocessing use match.py? #9

Open mingyangligithub opened 1 year ago

mingyangligithub commented 1 year ago

I studied the code in the preprocess folder and I understand how the code descriptions and synonyms are combined. However, I couldn't find where the result is used in the actual training process, because the icd_dict{} generated in generate_data_new.ipynb isn't the one in match.py. Where is the output of match.py used, and which file imports match.py? Do I need to run the preprocessing myself, following the code in preprocess, and then generate new data?

Thanks, Best regards.

GanjinZero commented 1 year ago

I think match.py is only used to generate the synonyms in embedding/icd_mimic3_random_sort.json, which I have provided.

GanjinZero commented 1 year ago

You do not need to rerun it unless you want to train on another dataset that uses different ICD codes.
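To inspect the provided synonym file directly, here is a minimal sketch. It assumes icd_mimic3_random_sort.json is a JSON object mapping code strings to lists of synonym strings; the actual schema, and whether codes keep their dots, is not confirmed in this thread. The helper names are hypothetical.

```python
import json
from typing import Dict, List


def parse_code_synonyms(text: str) -> Dict[str, List[str]]:
    """Parse synonym JSON, ASSUMING an object of {code: [synonym, ...]}."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object mapping codes to synonym lists")
    return {str(code): [str(s) for s in syns] for code, syns in data.items()}


def load_code_synonyms(path: str) -> Dict[str, List[str]]:
    """Read and parse a synonym file from disk (UTF-8)."""
    with open(path, encoding="utf-8") as f:
        return parse_code_synonyms(f.read())


def synonym_counts(synonyms: Dict[str, List[str]]) -> Dict[str, int]:
    """Count distinct synonyms per code, ignoring case."""
    return {code: len({s.lower() for s in syns}) for code, syns in synonyms.items()}
```

If the assumed schema holds, `synonym_counts(load_code_synonyms("embedding/icd_mimic3_random_sort.json")).get("E870.9")` would give the number of distinct synonyms stored for that code.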

lynnolson commented 6 months ago

There appears to be another source of synonyms besides UMLS's MRCONSO.RRF file (version 2024AA). For example, running preprocess/match.py with that file generates only 5 synonyms for E870.9 (Accidental cut, puncture, perforation or hemorrhage during unspecified medical care):

['accidental cut, puncture, perforation or hemorrhage during medical care', 'accidental cut, puncture, perforation, or hemorrhage during medical care', 'accidental cut, puncture, perforation or hemorrhage during unspecified medical care', 'accidental cut, puncture, perforation or hemorrhage during medical care (navigational concept)', 'accidental cut, puncture, perforation or haemorrhage during medical care']

But the embedding/icd_mimic3_random_sort.json file has 26:

['accidental cut, puncture, perforation or haemorrhage during medical care, nos (disorder)', 'accidental cut, puncture, perforation or hemorrhage during medical care (navigational concept)', 'acc cut in med care', 'accidental cut, puncture, perforation, or hemorrhage during medical care', 'surg.accid.-medical care nos', 'accidental cut, puncture, perforation or hemorrhage during medical care, (finding)', 'accidental cut, puncture, perforation or hemorrhage during medical care (finding)', 'accidental cut, puncture, perforation or haemorrhage during medical care', 'accidental cut, puncture, perforation or hemorrhage during medical care, nos (finding)', 'accidental cut, puncture, perforation or hemorrhage during medical care,', "accidental cut, puncture, perforation ,h'ge medical care", 'surg.accid. medical care', 'accidental cut, puncture, perforation or hemorrhage during medical care, nos (navigational concept)', 'accidental cut, puncture, perforation or hemorrhage during medical care, (navigational concept)', 'accidental cut, puncture, perforation or haemorrhage during medical care,', 'accidental cut, puncture, perforation or hemorrhage during medical care', 'accidental cut, puncture, perforation or hemorrhage during medical care, nos', 'acc cut in med care nos', "accid.cut/punct/perf/h'ge-med.", "accid cut,puncture,perf,h'ge medical care", "accid cut,puncture,perf,h'ge - medical care nos", "accidental cut, puncture, perforation ,h'ge - medical care", 'accidental cut, puncture, perforation or haemorrhage during medical care, nos', "accid.cut/punct/perf/h'ge med.", 'accidental cut, puncture, perforation or hemorrhage during unspecified medical care', 'accidental cut, puncture, perforation or haemorrhage during medical care, (disorder)']

One of them clearly corresponds to the short title, but where do the others come from, for example "accid.cut/punct/perf/h'ge med."?
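One way to narrow this down is to diff the two lists and then look up which source vocabulary (the SAB column of MRCONSO.RRF) contains each extra string. A minimal sketch, assuming both synonym lists are available as Python lists of strings; the helper names are hypothetical. A string that maps to an empty set was not found in the MRCONSO.RRF file scanned, which would suggest it comes from a vocabulary or release not present in that file.

```python
from typing import Dict, Iterable, List, Set

SAB, STR = 11, 14  # MRCONSO.RRF column positions (source vocabulary, string)


def extra_synonyms(local: Iterable[str], provided: Iterable[str]) -> List[str]:
    """Strings in `provided` that are missing from `local`, compared case-insensitively."""
    seen = {s.lower() for s in local}
    return sorted({s.lower() for s in provided} - seen)


def sources_of(strings: Iterable[str], rows: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each lowercased string to the source vocabularies (SAB) whose rows contain it."""
    found: Dict[str, Set[str]] = {s.lower(): set() for s in strings}
    for line in rows:
        fields = line.rstrip("\n").split("|")
        if len(fields) > STR:
            key = fields[STR].lower()
            if key in found:
                found[key].add(fields[SAB])
    return found
```

Passing an open MRCONSO.RRF file handle as `rows` scans the whole file once; comparing the results from two UMLS releases would show whether a string was dropped between them.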

GanjinZero commented 5 months ago

We used the UMLS 2020AA release.
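For rebuilding synonym lists on another code set, one common way to gather UMLS synonyms is CUI-level expansion: find the concepts (CUIs) that carry a given ICD-9-CM code, then collect every English string attached to those concepts from any source vocabulary. The sketch below assumes that approach and is not taken from match.py; the function names are hypothetical. Because it pulls strings from all sources, its output depends on which vocabularies a given UMLS release includes, so running it against 2020AA and 2024AA could produce different lists.

```python
from typing import Iterable, List, Set

# Column positions in the pipe-delimited UMLS MRCONSO.RRF file.
CUI, LAT, SAB, CODE, STR = 0, 1, 11, 13, 14


def find_cuis(rows: Iterable[str], code: str, sab: str = "ICD9CM") -> Set[str]:
    """Return the CUIs of all rows whose source is `sab` and whose code is `code`."""
    cuis: Set[str] = set()
    for line in rows:
        fields = line.rstrip("\n").split("|")
        if len(fields) > STR and fields[SAB] == sab and fields[CODE] == code:
            cuis.add(fields[CUI])
    return cuis


def strings_for_cuis(rows: Iterable[str], cuis: Set[str]) -> List[str]:
    """Collect lowercased English strings from every source for the given CUIs."""
    strings: Set[str] = set()
    for line in rows:
        fields = line.rstrip("\n").split("|")
        if len(fields) > STR and fields[CUI] in cuis and fields[LAT] == "ENG":
            strings.add(fields[STR].lower())
    return sorted(strings)
```

Each function makes one full pass, so MRCONSO.RRF would be opened twice (UTF-8): once for `find_cuis(f, "E870.9")` and once for `strings_for_cuis(g, cuis)`. This keeps memory low at the cost of a second scan. No filtering on the SUPPRESS column is applied here; a real pipeline may want it.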