elifesciences / elife-continuum-documentation

ppp project related files and prototypes

PoA article URL or path #33

Closed gnott closed 7 years ago

gnott commented 9 years ago

I'm working on jats-scraper to scrape PoA XML. What is the proposed URL or ingest path for PoA articles?

PoA XML has no <volume> element and no pub date. The existing VoR path expects the article to have a volume in order to generate the path, for example:

content/3/e00013

where 3 is the volume.

PoA paths right now are like this

content/early/2015/07/14/eLife.09143

If we need a path before the pub date is actually set, in order to ingest the article, a possible solution could be

content/early/e00013
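
The two path shapes discussed above can be sketched as a single function. This is a minimal illustration, not jats-scraper's actual code: the function name `article_path` and its parameters are assumptions, and the `content/early/` fallback is only the proposal made in this comment.

```python
from typing import Optional


def article_path(article_id: str, volume: Optional[int] = None) -> str:
    """Build an ingest path for an article (hypothetical helper).

    article_id is the zero-padded article number, e.g. "00013".
    volume is None when the XML has no <volume> element (the PoA case).
    """
    if volume is not None:
        # Existing VoR shape: content/<volume>/e<article id>
        return f"content/{volume}/e{article_id}"
    # Proposed fallback for PoA XML that has no volume yet
    return f"content/early/e{article_id}"


print(article_path("00013", 3))
print(article_path("00013"))
```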

IanMulvany commented 9 years ago

Volumes are currently hard-set according to the year of publication, so what we really need to know is how to set the publication date for a PoA article if it has not already been set (i.e. if it is not a PoA article that is being pushed through for a resupply or a repopulation of a site).
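
As a rough sketch of "volume follows the year of publication": the constant `FIRST_VOLUME_YEAR` and the function name `volume_for` are assumptions made for illustration (the thread does not state which year corresponds to volume 1), so adjust the base year to match the real rule.

```python
from datetime import date
from typing import Optional

# Assumption for this sketch: volume 1 corresponds to 2012.
FIRST_VOLUME_YEAR = 2012


def volume_for(pub_date: Optional[date]) -> Optional[int]:
    """Derive the volume from the publication year (hypothetical helper).

    Returns None when no publication date has been set yet, which is
    the PoA situation described above.
    """
    if pub_date is None:
        return None
    return pub_date.year - FIRST_VOLUME_YEAR + 1


print(volume_for(date(2015, 7, 14)))
print(volume_for(None))
```

Returning `None` rather than guessing keeps the missing pub date visible to whatever builds the path.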

Melissa37 commented 9 years ago

See https://github.com/elifesciences/elife-vendor-workflow-config/issues/118

It is now a task for the archive clean-up to add the volume and pub date to the PoA XML. I will update the XML sample to include that.

Melissa37 commented 9 years ago

https://docs.google.com/spreadsheets/d/1nFwQB0USPfJDLPPOYyIe6dTo6XsfJTBrhuPwghD9ZMc/edit#gid=0

See that for the URL structure.

gnott commented 9 years ago

One to-do arising from the Google sheet: the URLs have "v1" or "v2" in them.

The jats-scraper when creating node paths should include this version value in the path, unless the Drupal site will be altering them after ingest.
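
A minimal sketch of adding the version value when the path is created. The exact position of "v1"/"v2" in the URL is defined in the Google sheet, not in this thread, so the suffix form used here (`<path>v<version>`) and the name `versioned_path` are placeholders only.

```python
def versioned_path(base_path: str, version: int) -> str:
    """Append a version marker to an article path (hypothetical helper).

    The "<path>v<version>" layout is an assumption for illustration;
    the real layout should be taken from the agreed URL structure.
    """
    if version < 1:
        raise ValueError("version must be 1 or greater")
    return f"{base_path}v{version}"


print(versioned_path("content/4/e09143", 1))
print(versioned_path("content/4/e09143", 2))
```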

nlisgo commented 9 years ago

It is my preference that the Drupal site not alter the paths after ingest. jats-scraper should be capable of supplying the appropriate URLs, and I believe it can at the moment. The benefit of jats-scraper being able to generate both the existing URL format (the one on the current live site) and the desired URLs is that we could use the scraper to help us generate a 301 redirect table for Apache or nginx.

Now that we have confirmed that we can generate the current paths, we should preserve that code, but for the generation of the eif-format JSON we should now be using the preferred paths.
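
The redirect table idea could look roughly like the following, given pairs of (current path, preferred path). The function names are hypothetical and the preferred path in the example is made up; the output uses Apache's mod_alias `Redirect` directive and an nginx exact-match `location` with `return 301`.

```python
from typing import Iterable, List, Tuple

PathPair = Tuple[str, str]  # (current live path, preferred path)


def apache_redirects(pairs: Iterable[PathPair]) -> List[str]:
    """One mod_alias 'Redirect 301' line per path pair."""
    return [f"Redirect 301 /{old} /{new}" for old, new in pairs]


def nginx_redirects(pairs: Iterable[PathPair]) -> List[str]:
    """One exact-match location block per path pair."""
    return [f"location = /{old} {{ return 301 /{new}; }}" for old, new in pairs]


# The preferred path below is illustrative only.
pairs = [("content/early/2015/07/14/eLife.09143", "content/4/e09143v1")]
for line in apache_redirects(pairs) + nginx_redirects(pairs):
    print(line)
```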

nlisgo commented 9 years ago

Perhaps we could create a task for that in Jira and schedule it for the next sprint?

jhroot commented 9 years ago

It should indeed be able to, and it makes sense that it does, as those paths are then available in the other places fed by the process.

Looking at the code I can't see any versions in paths being produced, either for the article or other assets (e.g. images) so we'll need to add this.

We may have been waiting for that confirmation from Scholar?

IanMulvany commented 9 years ago

Google Scholar have indicated that our suggested URL paths are OK. We are aiming to give them preview access to the site about a month before we go live, and they have said that they will try to crawl it to make sure there are no problems that they can identify.

nlisgo commented 9 years ago

@jhroot the versions in paths may still need to be added. The correct stubs for fragments have been done, but they are not supplied to the eif-format JSON yet. Graham did some work on this front.

jhroot commented 9 years ago

@nlisgo I'm not sure what you mean, sorry. ('correct stubs to fragments'). Maybe a terminology issue

nlisgo commented 9 years ago

@jhroot I just mean the prefix and ordinal output in the url that is associated with the fragment.

Old style: F6 (6th figure or figure supplement regardless of hierarchy)

New style: figure2
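
To illustrate the difference in ordinals, here is a small sketch that numbers fragments both ways from their order in the document. The function name and input format are assumptions, and the new-style name used for supplements (`figure1-supplement1`) is a guess; only `F6` and `figure2` come from this thread.

```python
from typing import List, Tuple


def fragment_ids(kinds: List[str]) -> List[Tuple[str, str]]:
    """Return (old style, new style) ids for fragments in document order.

    kinds holds "figure" or "supplement" for each fragment; a supplement
    belongs to the most recent figure before it.
    """
    ids = []
    figure_no = 0
    supplement_no = 0
    for position, kind in enumerate(kinds, start=1):
        old = f"F{position}"  # flat count, ignores hierarchy
        if kind == "figure":
            figure_no += 1
            supplement_no = 0
            new = f"figure{figure_no}"
        elif kind == "supplement":
            supplement_no += 1
            # Hypothetical naming for supplements
            new = f"figure{figure_no}-supplement{supplement_no}"
        else:
            raise ValueError(f"unknown fragment kind: {kind}")
        ids.append((old, new))
    return ids


# Figure 1 with four supplements, then figure 2
print(fragment_ids(["figure"] + ["supplement"] * 4 + ["figure"])[-1])
```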

nlisgo commented 9 years ago

"Stubs" is probably the wrong word.