CottageLabs / oacwellcome

OA Compliance Checking for Wellcome Trust

Publication date algorithm for EPMC data #91

Open emanuil-tolev opened 8 years ago

emanuil-tolev commented 8 years ago

Attempt to get both electronic and print publication dates and output separately.

Spec (from comment below):


There are two dates we are interested in, and these are in the following two XML elements on Europe PMC:

<dateofPublication> 
<electronicPublicationDate> 

The column containing the first date should be titled Publication Date, and the column containing the second date should be titled Electronic Publication Date. We don't need a third column.

I can't guarantee that both of these fields will be present or populated, though I think at least one of them probably will be. If one is missing or blank, the corresponding cell should say "Unavailable". <electronicPublicationDate> will always include a year, month and day, but <dateofPublication> could be just the year, or a year and month, or a year, month and day.
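A minimal sketch of the spec above, in Python. It assumes each EPMC record is available as an XML string and that the two elements are spelled exactly as written in this issue; the function name `publication_dates` and the sample record are hypothetical, and the real response layout and tag casing should be checked against actual Europe PMC output.

```python
import xml.etree.ElementTree as ET

UNAVAILABLE = "Unavailable"


def publication_dates(record_xml: str) -> dict:
    """Return the two spreadsheet cells for one EPMC record (hypothetical helper)."""
    root = ET.fromstring(record_xml)

    def text_of(tag: str) -> str:
        # Take the first element with this tag anywhere in the record.
        element = next(root.iter(tag), None)
        # A missing element, an empty element, or whitespace-only text all
        # count as "missing/blank" under the spec.
        if element is None or element.text is None or not element.text.strip():
            return UNAVAILABLE
        return element.text.strip()

    return {
        "Publication Date": text_of("dateofPublication"),
        "Electronic Publication Date": text_of("electronicPublicationDate"),
    }


# Made-up record for illustration only; the nesting is an assumption.
sample = """<result>
  <journalInfo><dateofPublication>2014 Mar</dateofPublication></journalInfo>
  <electronicPublicationDate></electronicPublicationDate>
</result>"""

print(publication_dates(sample))
# {'Publication Date': '2014 Mar', 'Electronic Publication Date': 'Unavailable'}
```

The values are passed through as text rather than parsed, since the spec allows the first date to be partial.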

emanuil-tolev commented 8 years ago

@CecyMarden will check a few XML examples to see the (potentially 3) types of publication date. We'll have 3 columns if so.

CecyMarden commented 8 years ago

I have looked into this. There are two dates we are interested in, and these are in the following two XML elements on Europe PMC:

<dateofPublication> 
<electronicPublicationDate> 

The column containing the first date should be titled Publication Date, and the column containing the second date should be titled Electronic Publication Date. We don't need a third column.

I can't guarantee that both of these fields will be present or populated, though I think at least one of them probably will be. If one is missing or blank, the corresponding cell should say "Unavailable". <electronicPublicationDate> will always include a year, month and day, but <dateofPublication> could be just the year, or a year and month, or a year, month and day.

I hope this helps, let me know if you have any questions or think there's a better way of doing it than I've suggested.
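If the three possible shapes of dateofPublication ever need to be told apart, something like the following might do. It is only a sketch: the thread does not say how the parts are separated, so splitting on spaces, hyphens and slashes is an assumption, and `date_precision` is a hypothetical name.

```python
import re


def date_precision(value: str) -> str:
    """Guess how much of a date is present by counting its parts."""
    # Assumed separators: whitespace, "-" or "/". Adjust once real data is seen.
    parts = [p for p in re.split(r"[\s/\-]+", value.strip()) if p]
    # Anything that does not start with a four-digit year (for example
    # "Unavailable" or an empty string) is reported as unknown.
    if not parts or not re.fullmatch(r"\d{4}", parts[0]):
        return "unknown"
    labels = {1: "year", 2: "year and month", 3: "year, month and day"}
    return labels.get(len(parts), "unknown")


print(date_precision("2014"))         # year
print(date_precision("2014 Mar"))     # year and month
print(date_precision("2014-03-05"))   # year, month and day
print(date_precision("Unavailable"))  # unknown
```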

emanuil-tolev commented 8 years ago

Were you meaning to attach a picture @CecyMarden ? If the github copy/paste thing isn't working for you, try using http://snag.gy/ instead, and just paste a link to the picture here.

CecyMarden commented 8 years ago

Huh, I didn't even think those elements were pictures. I'll try writing the tags here:

<dateofPublication>
<electronicPublicationDate>

In case it's something to do with the tag arrows, it is

dateofPublication electronicPublicationDate

CecyMarden commented 8 years ago

It IS the tags! Does that make sense now?

emanuil-tolev commented 8 years ago

Oh you put XML in here, right. Surround XML with three backticks on the lines before and after it, otherwise GitHub strips the tags.

I've updated your comments :). Thanks for the info