elifesciences / elife-crossref-feed

code to support uploading info to crossref on PAW articles

Providing eLocation-id to Crossref #134

Closed Melissa37 closed 6 years ago

Melissa37 commented 6 years ago

After 3 years of requesting the elocation-id element in the Crossref schema, I've found out how we can provide our elocation id in the meantime, see below:

<publisher_item>
   <item_number item_number_type="article_number">e12345</item_number>
</publisher_item>

Melissa37 commented 6 years ago

Graham, how easy would it be to implement this?

gnott commented 6 years ago

It looks like it is the elocation-id of the article deposited itself, and not an elocation-id for citations, correct?

The logic already supports it (https://github.com/elifesciences/elife-crossref-xml-generation/blob/develop/elifecrossref/generate.py#L209).

Probably for some reason at the time we deployed the enhanced deposits, we had this feature turned off for eLife articles: in the crossref.cfg file, the elife section has elocation_id: false (https://github.com/elifesciences/elife-crossref-xml-generation/blob/develop/crossref.cfg#L37). I believe the .cfg file in the project repository is very close to what we deploy to the production environment.

If you would like to deposit the elocation-id for eLife articles, we need to change false to true in the appropriate crossref.cfg file in our builder file repository and deploy it; it should then start sending the value.
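
To illustrate how a flag like this can gate the output, here is a minimal sketch, not the actual code in generate.py. It assumes the elife section and elocation_id option described above; the function names and the inline config text are hypothetical, made up for the example.

```python
import configparser
import xml.etree.ElementTree as ET

# Hypothetical config text, modelled on the elife section described above.
CONFIG_TEXT = """
[elife]
elocation_id: true
"""


def load_elocation_flag(config_text, section="elife"):
    """Return the elocation_id setting, defaulting to False if it is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.getboolean(section, "elocation_id", fallback=False)


def publisher_item(elocation_id):
    """Build a publisher_item element carrying the article number."""
    parent = ET.Element("publisher_item")
    item = ET.SubElement(parent, "item_number")
    item.set("item_number_type", "article_number")
    item.text = elocation_id
    return parent


def build(config_text, elocation_id):
    """Return the serialised element, or None when the flag is turned off."""
    if not load_elocation_flag(config_text):
        return None
    return ET.tostring(publisher_item(elocation_id), encoding="unicode")


if __name__ == "__main__":
    print(build(CONFIG_TEXT, "e12345"))
```

With the flag set to false, build returns None and nothing is added to the deposit, which matches the behaviour described for eLife articles before the change.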

Do we also want to resupply all the articles again for all elocation-id values to be deposited with Crossref?

Melissa37 commented 6 years ago

Oh wow, how cool! So, we could provide:

<publisher_item>
   <item_number item_number_type="article_number">e12345</item_number>
</publisher_item>

just by changing false to true in the .cfg file?

Yes please, can we switch it? I don't think we need to redeposit the archive, but that depends on how much of your time it would take.

Fab!

Thanks M

gnott commented 6 years ago

Basically, changing false to true will accomplish the result. While testing it, I saw some improvements to make in how the configuration is loaded, so I will do those at the same time; otherwise it is more difficult to test.

Redepositing the archive to Crossref would not be too difficult, but maybe it's not necessary for the inclusion of this one article identifier. What is the impact of including the elocation_id for eLife articles, for example, does it make it more accurate for others to reference an eLife article when relying on Crossref data? Depending on your thoughts, we could omit redepositing this time and wait for some other reason to redeposit all.

Melissa37 commented 6 years ago

This comes off the back of a request from Dryad, and I think it might be helpful for others too, so if not too much hassle it might be good to resupply the archive. WDYT?

Question for you (or whomever the appropriate party is) about eLife's metadata available in Crossref. Not sure if you saw the recent blog post that explains our "Automated Publication Updater" feature (https://blog.datadryad.org/2017/12/18/improvements-in-data-article-linking/), but querying Crossref is now the primary way we figure out when articles associated with Dryad data packages have been published.

When a good match is found, the APU automatically updates article-related fields in the Dryad record, including the article citation. But, we have a small issue with the eLife metadata.

eLife's recommended citation format includes an article number, e.g. "e32373", but that piece of information is not specifically included in your Crossref metadata -- so we have to copy it in manually. Looking at other journals that are online-only, they seem to include this kind of identifier in Crossref's "article-number" field.

Would it be possible to make an adjustment to include this field? I'm thinking there would be benefits to eLife beyond just interoperability with Dryad (i.e., it makes your Crossref metadata higher-quality and more useful to any other services that may seek to use it).

I hope this makes sense! If you have questions or I should direct this inquiry elsewhere, please let me know.

gnott commented 6 years ago

It's great to have a specific use case for the value, and it looks like it is the value we plan to populate. Maybe in Dryad's case they've already indexed all the previous eLife articles, or maybe some are ignored because of the missing values.

eLife's current process for resupply should be fairly easy: manual, but not incredibly time-intensive. I would just need to copy the most recent version of each article XML to our S3 outbox, and then run the DepositCrossref workflow. That should result in all of them being added to the Crossref ingest queue. We should see no errors, or hopefully fewer than the last time we resupplied all the articles. I would do it in a few batches to be sure the workflow doesn't time out; it might take a couple of hours.
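
The batching idea can be sketched as follows. This is only an illustration under assumptions: the file names and batch size are invented, and the copy-to-outbox and workflow steps are left as a comment because their real interfaces are not described here.

```python
def batches(items, size):
    """Yield consecutive slices of items, each at most size long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    # Hypothetical article XML file names, for illustration only.
    article_files = ["article-%d.xml" % n for n in range(1, 8)]
    for number, batch in enumerate(batches(article_files, 3), start=1):
        # In a real resupply, each batch would be copied to the outbox and
        # the deposit workflow run before moving on to the next batch.
        print("batch", number, batch)
```

Keeping each batch small bounds how long a single workflow run takes, at the cost of more manual runs.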