elifesciences / elife-vendor-workflow-config

capturing requirements and niggles for setting up the elife production workflow

Clean up archive: reference page - last page number lower than first page number #109

Open Melissa37 opened 9 years ago

Melissa37 commented 9 years ago

If there is a programmatic way to find a last page that is lower than the first page, work out which leading digit(s) are missing and add them back in, this would be fantastic to run over the archive!
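As a rough illustration of the request above, here is a minimal sketch of the purely numeric case. The function name `expand_lpage` and the rule of borrowing the missing leading digits from the first page are assumptions made for illustration, not the function that was eventually written for the archive.

```python
def expand_lpage(fpage: str, lpage: str) -> str:
    """Restore leading digits dropped from an abbreviated last page.

    Hypothetical helper: only purely numeric page values are handled.
    """
    # Anything non-numeric (e.g. "S15", "E194") is returned untouched.
    if not (fpage.isdecimal() and lpage.isdecimal()):
        return lpage
    # Nothing to do unless the last page is both lower and shorter.
    if int(lpage) >= int(fpage) or len(lpage) >= len(fpage):
        return lpage
    # Borrow the missing leading digits from the first page.
    candidate = fpage[: len(fpage) - len(lpage)] + lpage
    # If the result is still lower (e.g. 199 to 03), the range is
    # ambiguous, so the original value is kept.
    return candidate if int(candidate) >= int(fpage) else lpage


print(expand_lpage("194", "5"))    # 195
print(expand_lpage("1234", "56"))  # 1256
print(expand_lpage("15", "27"))    # 27 (already not lower, unchanged)
```

Returning the original value whenever the input is not recognised keeps the sketch conservative: it can be run repeatedly over the same data without changing anything it does not understand.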

gnott commented 9 years ago

In converting the archive, I have a first draft of this function.

I will look at all the possible fpage values to be sure the logic is OK, but I have a couple of samples for you to comment on.

Sample 1: http://s3.amazonaws.com/elife-cdn/elife-articles/00090/elife00090.xml id="bib3"

<fpage>S15</fpage>
<lpage>27</lpage>

In the first draft it comes out the same, because 27 is not less than 15:

<fpage>S15</fpage>
<lpage>27</lpage>

Sample 2: http://s3.amazonaws.com/elife-cdn/elife-articles/00133/elife00133.xml id="bib4"

<fpage>E194</fpage>
<lpage>5</lpage>

This comes out as:

<fpage>E194</fpage>
<lpage>E195</lpage>

I guess my question is: when the fpage starts with a letter, what is the desired lpage value? Should the letter be repeated in the lpage value, or not?

Melissa37 commented 9 years ago

This is interesting. So if the last page is lower than the first page, it repeats the letter prefix in the end page, but if the end page number is not lower than the first, it does nothing and so the letter prefix is not added. Unless it's a 10-minute job to deal with that edge case, I think we should leave it as is. It's not important, so not worth spending time on.

Thanks! M

gnott commented 9 years ago

Basically, if the last page is lower than the first page, it goes through the filter; if it isn't lower, it does not get processed and you end up with the original value.

I could probably fix this letter(s) + digits edge case, but I wanted to know the convention to follow. Should the last page always include the letter too, so we'd have:

<fpage>S15</fpage>
<lpage>S27</lpage>

Or should the letter be dropped, as in:

<fpage>E194</fpage>
<lpage>195</lpage>

Or I could leave the filter to act on numeric page values only and ignore any pages that are non-numeric.

Melissa37 commented 9 years ago

My preference would be for the alphabetical prefix to be added to the last page too.
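
A minimal sketch of that convention, applied to reference XML with Python's standard `xml.etree.ElementTree`, might look like the following. The names `normalise_lpage` and `fix_reference_pages`, the "letters then digits" pattern, and the `element-citation` wrapper in the sample are assumptions for illustration; the real archive may contain page formats this does not cover, and those are left unchanged.

```python
import re
import xml.etree.ElementTree as ET

# Assumed page format: optional letter prefix followed by digits.
PAGE = re.compile(r"([A-Za-z]*)([0-9]+)")


def normalise_lpage(fpage: str, lpage: str) -> str:
    """Return lpage with missing leading digits and letter prefix restored."""
    f, l = PAGE.fullmatch(fpage), PAGE.fullmatch(lpage)
    if not f or not l:
        return lpage  # unrecognised format, leave as is
    (f_prefix, f_num), (l_prefix, l_num) = f.groups(), l.groups()
    if l_prefix and l_prefix != f_prefix:
        return lpage  # a different prefix is not touched
    # Restore dropped leading digits when the last page is lower and shorter.
    if int(l_num) < int(f_num) and len(l_num) < len(f_num):
        candidate = f_num[: len(f_num) - len(l_num)] + l_num
        if int(candidate) >= int(f_num):
            l_num = candidate
    # Always carry the first page's letter prefix over to the last page.
    return f_prefix + l_num


def fix_reference_pages(root: ET.Element) -> int:
    """Update <lpage> in every <ref>; return the number of values changed."""
    changed = 0
    for ref in root.iter("ref"):
        fpage, lpage = ref.find(".//fpage"), ref.find(".//lpage")
        if fpage is None or lpage is None or not fpage.text or not lpage.text:
            continue
        new_value = normalise_lpage(fpage.text.strip(), lpage.text.strip())
        if new_value != lpage.text:
            lpage.text = new_value
            changed += 1
    return changed


# The two samples from this thread, wrapped in a made-up minimal structure.
sample = """<ref-list>
<ref id="bib3"><element-citation><fpage>S15</fpage><lpage>27</lpage></element-citation></ref>
<ref id="bib4"><element-citation><fpage>E194</fpage><lpage>5</lpage></element-citation></ref>
</ref-list>"""
root = ET.fromstring(sample)
fix_reference_pages(root)
print(ET.tostring(root, encoding="unicode"))
```

With this sketch both samples end up with the prefix on the last page: `S15` to `S27`, and `E194` to `E195`.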