CTN-0094 / DOPE

Drug Ontology Parsing Engine
https://ctn-0094.github.io/DOPE/

Clean Up "Synonyms" in the Lookup Table #48

Open gabrielodom opened 3 years ago

gabrielodom commented 3 years ago
  1. We have some issues with the free-text entries in lookup_df$synonym. For example, some drug names end with a trailing space (e.g. "blanca ", "monos ", "nieve "). I have noticed that most of these are Spanish words.
  2. Some drug names contain symbols, such as 'c & m', 'm-cat', or 'el perico ("parrot")'. I think parse() removes these symbols, which means these drug names will never be matched if we call parse() first. That is a problem, because calling parse() first is our recommended workflow.
  3. The string "mixed with" shows up 25 times. Can we re-express the entries that follow the pattern "drug a (mixed with drug b)"?
  4. The word "and" is a stop word, but " and " shows up 40 times in the synonyms. Since stop words are removed from the input, we can't match these drug synonyms either.
  5. There are 20 synonyms that include one or more periods, for example "l.a. ice" or "m.j.". We remove all periods in parse(), and lookup("m j") returns no matches.
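One possible direction for items 1, 2, 4 and 5 is to run the lookup table's synonyms through the same normalization as the user's input, so that trailing spaces, symbols, periods and stop words are removed on both sides before matching. The sketch below is written in Python for illustration only. `normalize()` is a hypothetical stand-in for parse(); its rules (lowercase, turn every non-word character into a space, drop "and") are assumptions, not DOPE's actual implementation. `build_index()`, `find()` and `split_mixed()` are likewise hypothetical helpers, and `split_mixed()` shows just one way the "drug a (mixed with drug b)" pattern from item 3 could be split into separate entries.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; DOPE's real list may differ.
STOP_WORDS = {"and"}


def normalize(text: str) -> str:
    """Hypothetical stand-in for parse(): lowercase, replace every run of
    non-word characters (spaces, '.', '-', '&', quotes, parentheses) with
    one space, drop stop words, and trim. \\W is Unicode-aware in Python 3,
    so accented Spanish letters are kept."""
    cleaned = re.sub(r"[\W_]+", " ", text.lower())
    tokens = [tok for tok in cleaned.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


def build_index(synonyms):
    """Map each normalized synonym to the raw table entries that produce it.
    A list is kept because distinct raw entries may collapse to one key."""
    index = defaultdict(list)
    for raw in synonyms:
        index[normalize(raw)].append(raw)
    return index


def find(query, index):
    """Normalize the query exactly like the table, then look it up."""
    return index.get(normalize(query), [])


# Matches "drug a (mixed with drug b)", ignoring surrounding whitespace.
MIXED_WITH = re.compile(r"^(.*?)\s*\(mixed with\s+(.*?)\)\s*$")


def split_mixed(synonym):
    """Split 'drug a (mixed with drug b)' into ['drug a', 'drug b'];
    return [synonym] unchanged when the pattern is absent."""
    match = MIXED_WITH.match(synonym.strip())
    if match is None:
        return [synonym]
    return [match.group(1), match.group(2)]


# Example entries taken from the issue text above.
synonyms = ["blanca ", "monos ", "nieve ", "c & m", "m-cat",
            'el perico ("parrot")', "l.a. ice", "m.j."]
index = build_index(synonyms)

print(find("m j", index))        # ['m.j.']
print(find("M-Cat", index))      # ['m-cat']
print(find("blanca", index))     # ['blanca ']
print(split_mixed("drug a (mixed with drug b)"))  # ['drug a', 'drug b']
```

Normalizing the table with the same function used on the input keeps the two from drifting apart, so hand-editing each trailing space or period would no longer be required for matching. The trade-off is that different raw synonyms can map to the same normalized key, which is why the index stores a list of originals rather than a single value.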

@RaymondBalise, @labouz, what do you recommend we do here?