Open kmartinez834 opened 1 year ago
FYI @ubhuiyan and @katewarner ...
Not planning to address this for now, but be aware that some pmid txt files have errors.
I used the file /software/glygen/medline_dup_parser.py
to print a list of files that have errors. There are currently only 20 out of 294034. I opened each file and added those with duplicates to generated/misc/dup_pmid_mapping.csv
Not the most sophisticated solution, so if you decide to start mapping PMIDs, you (or Robel) could write a script to generate the dup_pmid_mapping file.
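In case it helps whoever picks this up, here is a rough sketch of what such a script might look like. It is not what medline_dup_parser.py does. It assumes each Medline file is named `<pmid>.txt` and contains `PMID- <digits>` lines, and it treats any PMID inside the file that differs from the file name as a duplicate mapping. The function names and the `file_pmid,record_pmid` column layout are made up for illustration; the real dup_pmid_mapping.csv may use a different layout.

```python
import csv
import re
from pathlib import Path

# Assumption: PMIDs appear on lines such as "PMID- 12345678".
PMID_LINE = re.compile(r"^PMID\s*-\s*(\d+)\s*$")


def pmids_in_text(text):
    """Return the PMIDs found on 'PMID- <digits>' lines, in order, without repeats."""
    found = []
    for line in text.splitlines():
        match = PMID_LINE.match(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


def build_mapping(medline_dir):
    """Return (file_pmid, record_pmid) pairs where the record PMID differs from the file name."""
    rows = []
    for path in sorted(Path(medline_dir).glob("*.txt")):
        file_pmid = path.stem  # assumption: the file is named <pmid>.txt
        text = path.read_text(encoding="utf-8", errors="replace")
        for pmid in pmids_in_text(text):
            if pmid != file_pmid:
                rows.append((file_pmid, pmid))
    return rows


def write_mapping(rows, out_csv):
    """Write the pairs to a CSV file with a hypothetical two-column header."""
    with open(out_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file_pmid", "record_pmid"])
        writer.writerows(rows)
```

Usage would be something like `write_mapping(build_mapping("path/to/medline"), "dup_pmid_mapping.csv")`, with the paths adjusted to wherever the files actually live.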
Start using
generated/misc/dup_pmid_mapping.csv
when creating datasets.
Some of the Medline files indicate that the PMID is a duplicate:
Issue continued from https://github.com/glygener/glygen-issues/issues/105