GanjinZero / ICD-MSMN

Code Synonyms Do Matter: Multiple Synonyms Matching Network for Automatic ICD Coding [ACL 2022]
https://arxiv.org/abs/2203.01515

Missing method "clean" in load_umls.py? #4

Closed drinkingxi closed 2 years ago

drinkingxi commented 2 years ago

I can't find a method called clean in the class UMLS, and initializing UMLS raises an error. load_umls.py contains this call:

    clean_string = self.clean(string, clean_bracket=False)

Also, match.py has this line:

    icd_dict[icd] = [desc_dict[icd]] + umls.icd2str(icd)

It calls the method icd2str, which is also missing from UMLS.

GanjinZero commented 2 years ago

I will add the following functions to the GitHub repo later.

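    # Uses the re module, so load_umls.py needs `import re`.
    def clean(self, term, lower=True, clean_NOS=True, clean_bracket=True, clean_dash=True):
        # Pad with spaces so " NOS " also matches at the start or end of the term.
        term = " " + term + " "
        if lower:
            term = term.lower()
        if clean_NOS:
            # Drop the "not otherwise specified" marker in either case.
            term = term.replace(" NOS ", " ").replace(" nos ", " ")
        if clean_bracket:
            # Remove parenthesized qualifiers (non-greedy match).
            term = re.sub(r"\(.*?\)", "", term)
        if clean_dash:
            term = term.replace("-", " ")
        # Collapse runs of whitespace and strip the padding.
        term = " ".join(term.split())
        return term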
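
To check what clean does to a term, here is a minimal standalone sketch. It assumes the method body above is copied into a plain function with `self` dropped; the sample terms are made up for illustration and are not taken from UMLS.

```python
import re


def clean(term, lower=True, clean_NOS=True, clean_bracket=True, clean_dash=True):
    """Standalone copy of the clean method, with `self` dropped for testing."""
    term = " " + term + " "
    if lower:
        term = term.lower()
    if clean_NOS:
        term = term.replace(" NOS ", " ").replace(" nos ", " ")
    if clean_bracket:
        term = re.sub(r"\(.*?\)", "", term)
    if clean_dash:
        term = term.replace("-", " ")
    return " ".join(term.split())


# Made-up example terms (illustrative only).
print(clean("Pneumonia NOS (organism unspecified)"))
# prints: pneumonia
print(clean("Non-insulin dependent diabetes", clean_bracket=False))
# prints: non insulin dependent diabetes
```

With clean_bracket=False, as in the load_umls.py call quoted in the issue, parenthesized text is kept and only case, NOS, dashes, and whitespace are normalized.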

GanjinZero commented 2 years ago

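    def icd2str(self, icd):
        # Return the UMLS strings for an ICD code, or [] if the code is not mapped.
        if icd in self.code2cui:
            cui = self.code2cui[icd]
            str_list = self.cui2str[cui]
            # Keep multi-word strings, or single words of at least 7 characters.
            str_list = [w for w in str_list if len(w.split()) >= 2 or len(w) >= 7]
            return str_list
        return []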
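
The following sketch shows how icd2str would feed the match.py line quoted in the issue. TinyUMLS is a hypothetical stand-in for the real UMLS class, the shapes of code2cui (ICD code to CUI) and cui2str (CUI to a set of strings) are assumptions, and all data is placeholder content rather than real UMLS entries.

```python
class TinyUMLS:
    """Hypothetical stand-in for UMLS that holds only the two lookup tables."""

    def __init__(self, code2cui, cui2str):
        self.code2cui = code2cui  # assumed shape: ICD code -> CUI
        self.cui2str = cui2str    # assumed shape: CUI -> set of strings

    def icd2str(self, icd):
        if icd in self.code2cui:
            cui = self.code2cui[icd]
            str_list = self.cui2str[cui]
            # Keep multi-word strings, or single words of at least 7 characters.
            return [w for w in str_list if len(w.split()) >= 2 or len(w) >= 7]
        return []


# Placeholder data, not real UMLS content.
umls = TinyUMLS(
    code2cui={"250.00": "C0000001"},
    cui2str={"C0000001": {"diabetes mellitus", "diabetes", "dm"}},
)
desc_dict = {"250.00": "diabetes mellitus without complication"}

icd_dict = {}
for icd in desc_dict:
    # Follows the match.py line from the issue; sorted() is added here only
    # to make the output deterministic, since sets are unordered.
    icd_dict[icd] = [desc_dict[icd]] + sorted(umls.icd2str(icd))

print(icd_dict)
```

The short abbreviation "dm" is filtered out, while the longer strings are kept as synonyms next to the code description. A code that is missing from code2cui yields an empty list, so the entry falls back to the description alone.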