OHDSI / MedlineXmlToDatabase

A command line Java application for parsing MEDLINE XML files and inserting the data into a relational database
Apache License 2.0

pmid version (discussion, not an issue) #10

Open vojtechhuser opened 6 years ago

vojtechhuser commented 6 years ago

PMID version can be a pain in the neck. Because of 0.001% of articles, all queries seem complex.

Do MeSH keywords get assigned only to the latest version?

sql = 'SELECT pmid_version, count(*) FROM pmid_to_date group by 1'
tta = dbGetQuery(conn = conn, sql)
tta

   pmid_version    count
1             6        3
2             4        8
3             5        3
4             8        2
5             1 27836639
6             2      705
7             9        1
8             3       94
9             7        2
10           10        1

schuemie commented 6 years ago

Hi @vojtechhuser! Yes, the PMID version seems very silly. Both the data and the MedlineXmlToDatabase script consider the combination of PMID and PMID_version to be the primary key to which all data is attached. So MeSH headings are assigned to a PMID + PMID_version in the XML, and therefore also in the database. It is perfectly possible for different MeSH headings to be assigned to different versions of the same PMID. I don't think NLM removes MeSH headings from old versions and keeps them only for the latest one, but you could check (or ask your colleagues ;-) ).
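For anyone who wants to run that check, here is a rough sketch of the two queries involved: one picks the latest version per PMID, the other counts MeSH headings per version for PMIDs that have more than one version. It runs against an in-memory SQLite database so it is self-contained. The `pmid` column, the `mesh_heading` table and its columns, and all sample rows are assumptions made for illustration, not the actual schema produced by MedlineXmlToDatabase, so the names would need adjusting to your database.

```python
import sqlite3

# Toy in-memory database. Only pmid_to_date and pmid_version come from the
# thread; the pmid column, the mesh_heading table, and all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pmid_to_date (
    pmid INTEGER,
    pmid_version INTEGER,
    PRIMARY KEY (pmid, pmid_version)  -- composite key, as described above
);
CREATE TABLE mesh_heading (pmid INTEGER, pmid_version INTEGER, descriptor TEXT);
INSERT INTO pmid_to_date VALUES (100, 1), (200, 1), (200, 2), (200, 3);
INSERT INTO mesh_heading VALUES
    (100, 1, 'Humans'),
    (200, 1, 'Humans'),
    (200, 3, 'Humans'),
    (200, 3, 'Neoplasms');
""")

# Keep only the highest pmid_version for each pmid.
LATEST_SQL = """
SELECT p.pmid, p.pmid_version
FROM pmid_to_date p
JOIN (SELECT pmid, MAX(pmid_version) AS max_version
      FROM pmid_to_date
      GROUP BY pmid) m
  ON p.pmid = m.pmid AND p.pmid_version = m.max_version
ORDER BY p.pmid
"""
latest = conn.execute(LATEST_SQL).fetchall()

# For pmids with more than one version, count MeSH headings per version.
# The LEFT JOIN keeps versions that have no headings at all (count 0).
MESH_PER_VERSION_SQL = """
SELECT p.pmid, p.pmid_version, COUNT(h.descriptor) AS n_headings
FROM pmid_to_date p
LEFT JOIN mesh_heading h
  ON h.pmid = p.pmid AND h.pmid_version = p.pmid_version
WHERE p.pmid IN (SELECT pmid FROM pmid_to_date
                 GROUP BY pmid HAVING COUNT(*) > 1)
GROUP BY p.pmid, p.pmid_version
ORDER BY p.pmid, p.pmid_version
"""
per_version = conn.execute(MESH_PER_VERSION_SQL).fetchall()

print(latest)
print(per_version)
```

Joining other tables to the result of the first query is one way to keep the rest of a query version-agnostic. The second query shows at a glance whether older versions still carry headings in your copy of the data.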

Hope this helps.