OHDSI / MedlineXmlToDatabase

A command line Java application for parsing MEDLINE XML files and inserting the data into a relational database
Apache License 2.0

String vs number and publication year #7

Open vojtechhuser opened 7 years ago

vojtechhuser commented 7 years ago

I would like to get a clean numeric date for each PMID. A colleague told me that the XML data can be messy and that converting it to numbers may produce errors.

1. Are all fields imported into the database as strings by design? (I think so.)

2. Have other users had experience using medcit.datecreated_year as the publication year for a given article?

schuemie commented 7 years ago

Yes, dates are messy in the MEDLINE XML. This is why the publication date is automatically parsed (as best as possible) to fill the table pmid_to_date.

The parsing logic can be found here.
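
For readers who want to see what "as best as possible" parsing can look like, here is a minimal sketch. It is not the project's actual parsing logic. It only illustrates one defensive approach: take the first plausible four-digit year from a free-text date string (MEDLINE's MedlineDate element can hold values such as "1998 Dec-1999 Jan"), and return an empty result instead of throwing when no year is found. The class name `PublicationYearSketch`, the method `parseYear`, and the accepted range 1500-2099 are all assumptions made for this example.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of MedlineXmlToDatabase.
public class PublicationYearSketch {

    // A four-digit year from 1500 to 2099 that is not part of a longer run of digits.
    // The range is an assumption chosen for illustration.
    private static final Pattern YEAR =
            Pattern.compile("(?<!\\d)(1[5-9]\\d{2}|20\\d{2})(?!\\d)");

    // Returns the first plausible year in the raw string, or empty if none is found.
    public static OptionalInt parseYear(String raw) {
        if (raw == null) {
            return OptionalInt.empty();
        }
        Matcher m = YEAR.matcher(raw);
        return m.find()
                ? OptionalInt.of(Integer.parseInt(m.group(1)))
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        String[] samples = {"2001", "1998 Dec-1999 Jan", "Spring 2000", "n.d."};
        for (String s : samples) {
            System.out.println(s + " -> " + parseYear(s));
        }
    }
}
```

Returning an `OptionalInt` rather than a bare `int` keeps unparseable values visible, so they can be stored as NULL in a numeric column instead of being silently converted to a wrong year.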