emanuil-tolev opened this issue 8 years ago
@CecyMarden will check a few XML examples to identify the (potentially three) types of publication date. If there are three, we'll have three columns.
I have looked into this. There are two dates we are interested in, and these are in the following two XML elements on Europe PMC:
`<dateofPublication>`
`<electronicPublicationDate>`
The column containing the first date should be titled Publication Date, and the column containing the second date should be titled Electronic Publication Date. We don't need a third column.
I can't guarantee that either of these fields will be present or contain a date, though I expect at least one of them will. If one is missing or blank, the corresponding cell should say "Unavailable". `<electronicPublicationDate>` will always include a year, month and day, but `<dateofPublication>` could be just the year, the year and month, or the year, month and day.
I hope this helps, let me know if you have any questions or think there's a better way of doing it than I've suggested.
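For illustration, here is a minimal Python sketch of the extraction described above. It assumes each article's XML is available as a string and uses the element names exactly as quoted in this thread; the helpers `extract_dates` and `_first_text` and the `<result>` wrapper in the sample are hypothetical. It returns the two cells for one row, substituting "Unavailable" when an element is missing or blank, and passes the date text through unchanged, since `<dateofPublication>` may hold only a year or a year and month.

```python
import xml.etree.ElementTree as ET

# Element names as quoted in this thread (assumption: verify the exact
# casing against real Europe PMC responses before relying on them).
PUB_DATE_TAG = "dateofPublication"
EPUB_DATE_TAG = "electronicPublicationDate"
UNAVAILABLE = "Unavailable"


def _first_text(root: ET.Element, tag: str) -> str:
    """Return the stripped text of the first non-blank <tag> under root, else UNAVAILABLE."""
    for elem in root.iter(tag):
        text = (elem.text or "").strip()
        if text:
            return text
    return UNAVAILABLE


def extract_dates(xml_text: str) -> dict:
    """Hypothetical helper: map one article's XML to the two output columns."""
    root = ET.fromstring(xml_text)
    return {
        "Publication Date": _first_text(root, PUB_DATE_TAG),
        "Electronic Publication Date": _first_text(root, EPUB_DATE_TAG),
    }


if __name__ == "__main__":
    # Made-up sample: a year-only publication date and a blank e-pub date.
    sample = (
        "<result>"
        "<dateofPublication>2014</dateofPublication>"
        "<electronicPublicationDate> </electronicPublicationDate>"
        "</result>"
    )
    print(extract_dates(sample))
    # {'Publication Date': '2014', 'Electronic Publication Date': 'Unavailable'}
```

Leaving the date text untouched keeps whatever precision the source provides, rather than padding a year-only value with an invented month or day.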
Did you mean to attach a picture, @CecyMarden? If GitHub's copy/paste isn't working for you, try using http://snag.gy/ instead and just paste a link to the picture here.
Huh, I didn't even think those elements were pictures. I'll try writing the tags here:
`<dateofPublication>`
`<electronicPublicationDate>`
In case it's something to do with the angle brackets, the element names are `dateofPublication` and `electronicPublicationDate`.
It IS the tags! Does that make sense now?
Oh, you put XML in here, right. Surround XML with a line of three backticks above and below it, so GitHub doesn't swallow the tags as HTML.
I've updated your comments :). Thanks for the info
Attempt to get both the electronic and print publication dates and output them separately.
Spec (from comment below):
There are two dates we are interested in, and these are in the following two XML elements on Europe PMC: `<dateofPublication>` and `<electronicPublicationDate>`.
The column containing the first date should be titled Publication Date, and the column containing the second date should be titled Electronic Publication Date. We don't need a third column.
I can't guarantee that either of these fields will be present or contain a date, though I expect at least one of them will. If one is missing or blank, the corresponding cell should say "Unavailable". `<electronicPublicationDate>` will always include a year, month and day, but `<dateofPublication>` could be just the year, the year and month, or the year, month and day.
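If the dates end up in a CSV file (an assumption; the spec only gives the column titles), writing the two columns could look roughly like the sketch below. `to_csv` is a hypothetical helper that takes one dict per article, keyed by the column titles, and fills any missing or blank cell with "Unavailable".

```python
import csv
import io

# Column titles from the spec; the CSV output format itself is an assumption.
COLUMNS = ["Publication Date", "Electronic Publication Date"]
UNAVAILABLE = "Unavailable"


def to_csv(rows: list[dict]) -> str:
    """Hypothetical helper: write both date columns, defaulting missing/blank cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # row.get() covers a missing key; strip() turns whitespace-only into blank.
        writer.writerow({
            col: (str(row.get(col) or "").strip() or UNAVAILABLE)
            for col in COLUMNS
        })
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up rows: one blank e-pub date, one missing publication date.
    print(to_csv([
        {"Publication Date": "2014", "Electronic Publication Date": ""},
        {"Electronic Publication Date": "2014-02-12"},
    ]), end="")
```

Applying the fallback at write time means upstream code can simply omit a date it couldn't find, and the "Unavailable" rule lives in one place.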