compbio-UofT / medsavant

MedSavant is a search engine for genetic variants

Character encoding in annotation databases #291

Open ronammar opened 10 years ago

ronammar commented 10 years ago

ClinVar annotations contain hex-escaped characters that need to be decoded, for example:

"Nephrotic syndrome\x2c type 3" should become "Nephrotic syndrome, type 3".
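Here is a minimal Python sketch of that decoding step. It assumes every escape in the affected fields has the form `\xHH` (a backslash, `x`, then two hex digits). The function name `decode_hex_escapes` is hypothetical and is not part of MedSavant.

```python
import re

# Matches a literal backslash followed by "x" and exactly two hex digits, e.g. "\x2c".
_HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def decode_hex_escapes(text: str) -> str:
    """Replace each \\xHH escape with the character whose code point is 0xHH."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    raw = r"Nephrotic syndrome\x2c type 3"
    print(decode_hex_escapes(raw))  # Nephrotic syndrome, type 3
```

The sketch uses a targeted regex instead of Python's `unicode_escape` codec. That codec would also reinterpret every other backslash sequence in the field and can mangle non-ASCII text.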

Previous versions of the ClinVar annotation DB encoded "," as "\x2c", but the current version is missing the backslash and contains only "x2c", so the escape can no longer be decoded. It would be good to either revert to the previous ClinVar annotation DB or fix the character encoding.
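If the current DB is kept, a heuristic repair could restore the dropped escapes. This sketch rests on two assumptions. First, it assumes the damaged value is the original string with only the backslash removed (e.g. `Nephrotic syndromex2c type 3`). Second, it assumes `x2c` is the only affected escape. Both the `STRIPPED_ESCAPES` mapping and `repair_stripped_escapes` are hypothetical.

```python
import re

# ASSUMPTION: "x2c" (comma) is the only escape that lost its backslash.
# This mapping is hypothetical; extend it if other damaged escapes are found.
STRIPPED_ESCAPES = {"x2c": ","}

# Match a known stripped escape that is NOT preceded by a backslash, so intact
# "\x2c" sequences are left alone for a proper escape decoder.
_STRIPPED = re.compile(
    r"(?<!\\)(" + "|".join(re.escape(k) for k in STRIPPED_ESCAPES) + r")"
)


def repair_stripped_escapes(text: str) -> str:
    """Replace bare escape remnants such as "x2c" with the character they encoded."""
    return _STRIPPED.sub(lambda m: STRIPPED_ESCAPES[m.group(1)], text)


if __name__ == "__main__":
    print(repair_stripped_escapes("Nephrotic syndromex2c type 3"))  # Nephrotic syndrome, type 3
```

A bare `x2c` could in principle appear in legitimate text, so this heuristic can produce false positives. Reverting to the previous DB remains the safer fix.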