Open RohitChattopadhyay opened 5 years ago
Can you switch to extracting the year from:
<PubmedData>
<PubMedPubDate PubStatus="pubmed">
<Year>1947</Year>
Extracting the MedlineDate will mean parsing the date string. We want to avoid this.
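For illustration, a minimal sketch of that lookup using Python's standard `xml.etree.ElementTree`. The helper name `get_pubmed_year` and the sample record are hypothetical and not taken from the INDRA code base. The search is kept loose (`.//PubMedPubDate`) so it does not assume where in `<PubmedData>` the element is nested.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down record used only for this example.
SAMPLE = """
<PubmedArticle>
  <PubmedData>
    <History>
      <PubMedPubDate PubStatus="pubmed">
        <Year>1947</Year>
        <Month>1</Month>
        <Day>1</Day>
      </PubMedPubDate>
    </History>
  </PubmedData>
</PubmedArticle>
"""


def get_pubmed_year(article):
    """Return the year of the PubStatus="pubmed" date as an int, or None."""
    # ElementTree supports attribute predicates of the form [@attr='value'].
    text = article.findtext(".//PubMedPubDate[@PubStatus='pubmed']/Year")
    if text is None or not text.strip().isdigit():
        return None
    return int(text.strip())


if __name__ == "__main__":
    print(get_pubmed_year(ET.fromstring(SAMPLE)))
```

Because the value is read from a structured `<Year>` element, no date-string parsing is involved.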
PR https://github.com/sorgerlab/indra/pull/902 resolves the inconsistency in the date.
<PubMedPubDate PubStatus="pubmed"> is absent in some records of the file pubmed19n0972.xml.
FTP link for the file: ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline/pubmed19n0972.xml.gz
Related to #41
The above line is taken from the PubMed XML Schema Definition.
The statement suggests that the XML files carry the publication date in the \<MedlineDate> tag when the date in the article cannot be parsed.
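A rough sketch of how a fallback could look when the `PubStatus="pubmed"` date is missing: try the structured `<Year>` first, and only then pull the first four-digit number out of `<MedlineDate>`. The function names and the regular expression are assumptions made for this example, not INDRA's implementation, and the strings in the usage lines are illustrative rather than taken from this thread.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the first standalone four-digit number in the string is the year.
YEAR_RE = re.compile(r"\b(\d{4})\b")


def year_from_medline_date(text):
    """Return the first four-digit year found in a MedlineDate string, or None."""
    match = YEAR_RE.search(text or "")
    return int(match.group(1)) if match else None


def get_year_with_fallback(article):
    """Prefer the structured PubMedPubDate year; fall back to MedlineDate."""
    year = article.findtext(".//PubMedPubDate[@PubStatus='pubmed']/Year")
    if year and year.strip().isdigit():
        return int(year.strip())
    # Only reached when the structured date is absent or unusable.
    return year_from_medline_date(article.findtext(".//MedlineDate"))


if __name__ == "__main__":
    # Illustrative MedlineDate-style strings.
    for sample in ("1998 Dec-1999 Jan", "2000 Spring", "1975-1976"):
        print(sample, "->", year_from_medline_date(sample))
```

This keeps string parsing as a last resort, so records that do carry the structured date never go through the regular expression.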
The following Gist shows example XML files
Some examples of MedlineDate content: