gnott closed this issue 2 years ago.
Looking at this a little more closely today, the Crossref example of the `<update type="correction" ...>` tag is straightforward; it looks just like the tag we now add when depositing a correction article. The difference is the target DOI: in a correction article's deposit, the update points to the DOI of the article being corrected, whereas an in-situ correction would point to the article's own DOI (the article correcting itself).
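A minimal sketch of that point, assuming only the `type` and `date` attributes seen in Crossref's Crossmark update examples. `build_update` and the DOIs below are hypothetical, for illustration only; it shows that the formal and in-situ `<update>` tags come out identical, and only the record they are deposited with differs.

```python
import xml.etree.ElementTree as ET


def build_update(record_doi, update_date, corrected_doi=None, update_type="correction"):
    """Return a Crossmark <update> element for the deposit of record_doi.

    A formal correction notice passes corrected_doi (the article it corrects);
    an in-situ correction omits it, so the update points back at record_doi.
    """
    target_doi = corrected_doi if corrected_doi else record_doi
    update = ET.Element("update", {"type": update_type, "date": update_date})
    update.text = target_doi
    return update


# Hypothetical DOIs and date, for illustration only
notice_update = build_update(
    "10.7554/eLife.00002", "2018-03-01", corrected_doi="10.7554/eLife.00001"
)
in_situ_update = build_update("10.7554/eLife.00001", "2018-03-01")

# Both tags have the same form; only the deposit they belong to differs.
print(ET.tostring(notice_update, encoding="unicode"))
print(ET.tostring(in_situ_update, encoding="unicode"))
```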
The simple Crossref generation logic starts with an article XML file. This XML may or may not include the article's version history, i.e. the dates of its previous versions. At least older (and probably current) eLife article XML does not list the previous article versions. It may in future, but it is not always there. @Melissa37, did you intend that, in eLife's case, in-situ version corrections would only be deposited once the article history is present in the article XML?
Another possible source of version data, again in eLife's case, is the Lax datastore. The Crossref deposit generation code can also be used flexibly as a step-wise process: the article data is first populated from an article XML file, and then data on that `Article` object can be altered or amended. When a `DepositCrossref` workflow runs for eLife, we could gather the previous versions, and the date of each version, from Lax and add them to the `Article` object before generating the Crossref deposit XML as the final step.
The `Article` object does not yet have a property to store article history (I think), but we can add one for visibility and completeness.
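The step-wise order could look roughly like this sketch. The `Article` class here is a minimal stand-in, not the real `elife-article` class; `version_history`, `fetch_versions_from_lax` and `add_version_history` are hypothetical names, and the Lax lookup is stubbed with made-up dates.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Article:
    """Minimal stand-in for the elife-article Article object.

    version_history is the hypothetical new property: (version, ISO date) pairs.
    """
    doi: str
    version: int
    version_history: List[Tuple[int, str]] = field(default_factory=list)


def fetch_versions_from_lax(doi):
    """Hypothetical stub standing in for a Lax lookup of (version, date) pairs."""
    return [(2, "2018-03-01"), (1, "2018-01-10"), (3, "2018-06-15")]


def add_version_history(article, versions):
    """Step two: attach version data to an Article already parsed from XML,
    keeping versions up to and including the one being deposited."""
    article.version_history = sorted(v for v in versions if v[0] <= article.version)
    return article


# Step one: populate the Article from article XML (stubbed as a constructor call)
article = Article(doi="10.7554/eLife.00001", version=3)
# Step two: add version data from Lax
add_version_history(article, fetch_versions_from_lax(article.doi))
# Step three (not shown): generate the Crossref deposit XML from the Article
print(article.version_history)
```

Keeping the Lax step separate from XML parsing means the same generation code still works unchanged if the version history later arrives in the article XML itself.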
@Melissa37, if what I'm describing here is clear enough, am I on the right track in understanding how eLife might deposit in-situ corrections?
Each time a new article version is deposited to Crossref, we'd populate the full history of previous versions. Currently we seem to only need the date of each version for Crossref's purposes.
One interesting situation may arise from a formal correction, although it fits within these rules. Say an article is published, then corrected. The correction article would result in a Crossref deposit including a correction that points back to the article it is correcting. The corrected article (presumably version 2 of that article) would result in a Crossref deposit that includes a correction to itself, dated the day version 2 was published. In this way, a formal correction would result in two Crossref correction deposits: one formal and one in-situ.
> @Melissa37, did you intend that, in eLife's case, in-situ version corrections would only be deposited once the article history is present in the article XML?

No, I was planning to base it on Lax or observer data. Ultimately I would like articles to contain their historical version info, but I don't think this will happen for another year.
> @Melissa37, if what I'm describing here is clear enough, am I on the right track in understanding how eLife might deposit in-situ corrections?

This sounds perfect, thank you.
> Each time a new article version is deposited to Crossref, we'd populate the full history of previous versions. Currently we seem to only need the date of each version for Crossref's purposes.

So when we start publishing history event dates we would have the potential to update the archive if we ever thought it was worthwhile doing it. Nice :-)
> One interesting situation may arise from a formal correction, although it fits within these rules. Say an article is published, then corrected. The correction article would result in a Crossref deposit including a correction that points back to the article it is correcting. The corrected article (presumably version 2 of that article) would result in a Crossref deposit that includes a correction to itself, dated the day version 2 was published. In this way, a formal correction would result in two Crossref correction deposits: one formal and one in-situ.

This is a very good point. We version the article if it has an official correction/erratum notice attached to it. Can you think of a way to circumvent this so that it only gets the formal deposit and not the in-situ one as well? The two are generally published at the same time, so we could put a hold (say, a 24-hour window) on depositing in-situ corrections to Crossref, and then check whether an official correction was published within that 24-hour window or the previous 24 hours. WDYT?
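The check at the end of that hold could be as simple as the sketch below. `suppress_in_situ` is a hypothetical helper, the timestamps are made up, and the deferral itself (how the workflow waits out the window) is not shown. It treats "within 24 hours before or after the new version" as one symmetric window.

```python
from datetime import datetime, timedelta


def suppress_in_situ(version_published, correction_notice_times, window_hours=24):
    """Return True if a formal correction notice was published within
    window_hours before or after the new version, in which case only the
    formal correction would be deposited and the in-situ one skipped."""
    window = timedelta(hours=window_hours)
    return any(
        abs(notice - version_published) <= window
        for notice in correction_notice_times
    )


# Hypothetical timestamps, for illustration only
v2_published = datetime(2018, 3, 1, 9, 0)
print(suppress_in_situ(v2_published, [datetime(2018, 3, 1, 15, 30)]))  # notice same day
print(suppress_in_situ(v2_published, [datetime(2018, 3, 5, 9, 0)]))    # notice days later
```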
Thanks @Melissa37! These answers are enough for me to continue with making data structures for article history and populating them with data from Lax for a start.
I think the correction + new version resulting in two Crossref deposits is not easy to resolve at this point. What happens when a v3 of the article is published? Then the Crossref deposit will not be accompanied by the correction article. Does Crossref de-dupe Crossmark correction "updates-to" data in their systems if two correction records have the same date? One way to try it out may be to find a suitable test article, deposit both the correction and the in-situ data for it, and see what the result is at Crossref. We can probably reverse that if we don't like the result.
Today I tried a quick example for the in-situ updates data.
There are at least a couple of ways to structure the article's version history data, I think, to address a potential complication: I assume we would never include version 1 of an article as a Crossmark correction.
One possible way is to rely on eLife's versioning convention: all versions are integers, starting at 1. If that is safe to assume, we can simply leave version 1 out of Crossref Crossmark deposits.
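Under that convention the filter is a one-liner. This is a sketch; `crossmark_update_versions` is a hypothetical name, and the history is assumed to be (version, date) pairs with made-up dates.

```python
def crossmark_update_versions(history):
    """Way one: rely on eLife's convention that versions are integers starting
    at 1. history holds (version, date) pairs; version 1 is the original
    publication, so it is dropped rather than reported as an update."""
    return [(version, date) for version, date in sorted(history) if version > 1]


# Hypothetical history, for illustration only
history = [(3, "2018-06-15"), (1, "2018-01-10"), (2, "2018-03-01")]
print(crossmark_update_versions(history))
```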
A second possible way is to push the logic a little further upstream: while parsing the article XML, we add an attribute to each version in the article's version history indicating which type of version it is. Borrowing from issue https://github.com/elifesciences/issues/issues/3463 (if it is still applicable), we could have `VoR` and `CVoR` (Corrected Version of Record), for example. eLife could also continue to use `PoA`.
Describing this, I think I may have hit on something for eLife's situation, and another question for you, @Melissa37, for clarification. If v1 is a PoA, v2 is a VoR, and v3 is also a VoR, would we only want to report v3 to Crossref as an in-situ correction? In this situation, I think it would be safer and more straightforward to label each version in the version history so we can produce a good Crossref deposit.
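A sketch of the labelling idea for exactly that scenario, assuming the rule "every VoR after the first becomes a CVoR". That rule, `label_versions` and `in_situ_correction_versions` are all hypothetical and only meant to make the question concrete.

```python
def label_versions(history):
    """Way two: label each version. history holds (version, version_type) pairs
    with version_type "PoA" or "VoR"; every VoR after the first is relabelled
    "CVoR" (Corrected Version of Record). This rule is an assumption."""
    labelled = []
    seen_vor = False
    for version, version_type in sorted(history):
        if version_type == "VoR":
            label = "CVoR" if seen_vor else "VoR"
            seen_vor = True
        else:
            label = version_type
        labelled.append((version, label))
    return labelled


def in_situ_correction_versions(history):
    """Versions to report to Crossref as in-situ corrections: the CVoRs only."""
    return [version for version, label in label_versions(history) if label == "CVoR"]


# The scenario from the question: v1 PoA, v2 VoR, v3 VoR
history = [(1, "PoA"), (2, "VoR"), (3, "VoR")]
print(label_versions(history))
print(in_situ_correction_versions(history))
```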
> There are at least a couple of ways to structure the article's version history data, I think, to address a potential complication: I assume we would never include version 1 of an article as a Crossmark correction.

Can I just clarify what you mean by this? Are you saying Crossref does not consider the difference between a PoA and a VoR as a new version and we can ignore PoA 1 or PoA versions from this?
I think that is correct right now, but I personally would like to change Crossref's thinking on this! I think there is a LOT of difference between a PoA and a VoR, but I guess I would think that, as I deal with production and see all the value we add ;-)
> A second possible way is to push the logic a little further upstream: while parsing the article XML, we add an attribute to each version in the article's version history indicating which type of version it is. Borrowing from issue elifesciences/issues#3463 (if it is still applicable), we could have VoR and CVoR (Corrected Version of Record), for example. eLife could also continue to use PoA.

This is interesting and I like the sound of it. The only problem is that for each new version (if there are multiple versions), wouldn't you lose the added details, since the XML comes from Exeter and production again? I still want to introduce this into production, as we'd also give a reason for the change, which I feel should be in the XML (currently this is stored in Hypothesis comments). However, we could use your work to provide some validation, i.e. in future, if what you would add is missing from the XML or differs from it, could that result in a rejection?
> Describing this, I think I may have hit on something for eLife's situation, and another question for you, @Melissa37, for clarification. If v1 is a PoA, v2 is a VoR, and v3 is also a VoR, would we only want to report v3 to Crossref as an in-situ correction? In this situation, I think it would be safer and more straightforward to label each version in the version history so we can produce a good Crossref deposit.

I think I answered the question above. But I am curious about
> safer and more straightforward to label each article version in the version history with some labels so we can produce a good Crossref deposit.

What would be adding these labels, and where? In future they will be in the XML coming from production, but until then...
Regarding whether PoA to VoR is a correction or not, I may have misremembered discussions about a PoA being the same article in a different format, and therefore not considered a correction. I may have this wrong. If you consider PoA to VoR a correction, then we can certainly include it as a correction in the Crossref Crossmark data.
I consider it a new version that warrants an update in Crossref's CrossMark widget, but I don't consider it a correction. So, in essence, you can ignore me :-)
For the version history data to support Crossref Crossmark, it would be specified and stored in the `elife-article` Python objects, which hold the data that populates the Crossref, PubMed and PoA XML generation Python libraries. Since it only affects these parts, it would be separate from any other schemas.
Each version history event would probably have, at a minimum:

- `version`
- `version_type` (PoA, VoR, CVoR, etc.)
- `date`

These events are attached to an `Article` that already has a DOI.
Optionally, the version history event could store a "comment" or additional details, but there is no regular source for those details right now, nor does Crossref Crossmark need those.
The version event data would come from Lax and be slotted in place as part of a Crossref deposit workflow.
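One possible shape for such an event, as a sketch. `VersionEvent` and its field names are assumptions that simply mirror the list above; the events and dates are made up, standing in for data slotted in from Lax.

```python
import datetime
from dataclasses import dataclass
from typing import Optional


@dataclass
class VersionEvent:
    """One version history event attached to an Article that already has a DOI.

    Field names are assumptions; version_type would be e.g. "PoA", "VoR" or "CVoR".
    """
    version: int
    version_type: str
    date: datetime.date
    comment: Optional[str] = None  # optional: no regular source for this yet


# Hypothetical events, as they might arrive (unordered) from Lax
version_history = [
    VersionEvent(2, "VoR", datetime.date(2018, 3, 1)),
    VersionEvent(1, "PoA", datetime.date(2018, 1, 10)),
]
version_history.sort(key=lambda event: event.version)
print([(event.version, event.version_type) for event in version_history])
```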
You brought up a good point, thanks @Melissa37, because we are not limited to "correction". These are the values allowed by the Crossref schema (https://www.crossref.org/schemas/common4.4.1.xsd):
- `addendum`
- `clarification`
- `correction`
- `corrigendum`
- `erratum`
- `expression_of_concern`
- `new_edition`
- `new_version`
- `partial_retraction`
- `removal`
- `retraction`
- `withdrawal`
So we could report the VoR as a `new_version` and the CVoR as a `correction`, or maybe the CVoR should also be a `new_version`...?
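The tentative mapping can be written down and checked against the schema values listed above. This is a sketch only: the mapping itself is still under discussion, and `crossref_update_type`, `CROSSREF_UPDATE_TYPES` and `VERSION_TYPE_TO_UPDATE_TYPE` are hypothetical names.

```python
# Values allowed for the Crossmark update type, as listed above from common4.4.1.xsd
CROSSREF_UPDATE_TYPES = frozenset({
    "addendum", "clarification", "correction", "corrigendum", "erratum",
    "expression_of_concern", "new_edition", "new_version",
    "partial_retraction", "removal", "retraction", "withdrawal",
})

# Tentative mapping under discussion; PoA is absent, so it is not reported
VERSION_TYPE_TO_UPDATE_TYPE = {"VoR": "new_version", "CVoR": "correction"}


def crossref_update_type(version_type):
    """Return the Crossref update type for a version type, or None if the
    version should not be reported as an update."""
    update_type = VERSION_TYPE_TO_UPDATE_TYPE.get(version_type)
    if update_type is not None and update_type not in CROSSREF_UPDATE_TYPES:
        raise ValueError("not a Crossref update type: %s" % update_type)
    return update_type


print(crossref_update_type("VoR"), crossref_update_type("CVoR"), crossref_update_type("PoA"))
```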
aha!! New version all the way for PoA to VoR and for new VoRs that do not have an official correction associated with them?
Guess this changes things?!
It helps, although maybe we could discuss it on the next call?
I think the challenge is that sometimes a new version is a correction: if a formal correction was published, it would be associated with a particular version. If the dates of the correction and the article version match, we may be able to collate all the version history event types.
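A sketch of that date-matching idea, combined with the "new version all the way" suggestion above. `collate_update_types` is a hypothetical helper, the dates are made up, and it assumes the correction notice and the corrected version share the same publication date.

```python
import datetime


def collate_update_types(history, formal_correction_dates):
    """Sketch of the date-matching idea: history holds (version, date) pairs;
    a version published on the same date as a formal correction notice is
    reported as a "correction", any other later version as a "new_version".
    Version 1 is the original publication and is skipped."""
    correction_dates = set(formal_correction_dates)
    return [
        (version, published,
         "correction" if published in correction_dates else "new_version")
        for version, published in sorted(history)
        if version > 1
    ]


# Hypothetical dates: v2 is the VoR, v3 accompanies a formal correction notice
history = [
    (1, datetime.date(2018, 1, 10)),
    (2, datetime.date(2018, 2, 1)),
    (3, datetime.date(2018, 6, 15)),
]
print(collate_update_types(history, [datetime.date(2018, 6, 15)]))
```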
Fab, on the agenda! :-)
Pending JATS4R versioning history recommendation
I had a bit of code in development until it was blocked waiting for a source of version history data.
As a reminder of what I had done, which will mostly be reverted now: in `elifecrossref/crossmark.py`, I expanded the criteria for `do_updates()` to include whether the article object has `version_history`:
```python
def do_updates(poa_article):
    """decide if crossmark updates tag can be added"""
    return bool(
        (
            poa_article.article_type in UPDATES_ARTICLE_TYPES and
            poa_article.related_articles and
            poa_article.related_articles[0].xlink_href
        )
        or
        (
            hasattr(poa_article, 'version_history') and
            poa_article.version_history
        )
    )
```
Then in `set_updates()`:
```python
...
if hasattr(poa_article, 'version_history') and poa_article.version_history:
    for previous_version in poa_article.version_history:
        set_update(
            ...
```
I will merge in the new `set_update()` function that I split out to make things cleaner, since it is not blocked by version history data.
I will also remove the test scenario I created for in-situ corrections, because it is quite simple and not very elaborate yet.
I had a fresh look at this issue, ready to discuss next time @FAtherden-eLife.
I think perhaps it would be a good idea to create a new issue and add to it the questions and tasks going forward, and then we can close this older discussion.
There's currently support for depositing `correction` and `retraction` articles. Can all the other updates we want to deposit be of the `new_version` type?
I believe we wanted to get the version history from a non-article-XML source, either Lax or data hub.
A potential wrinkle to consider is whether the version history data is better deposited to Crossref in the post-publication tasks. Now that Pending Publication DOI logic is enabled, can we always deposit full Crossref metadata post-publication? It might be a good time to review at which point in the publication workflow Crossref deposits happen.
Thanks @gnott, sounds good - let's discuss in elifesciences/issues#7177.
Originating from issue https://github.com/elifesciences/elife-crossref-feed/issues/145, I think it will be clearer to split off details concerning the in-situ Crossmark data here, @Melissa37.
Repeating some of the details from the original issue here:
https://www.crossref.org/education/crossmark/crossmark-registering-updates/
New versions From Crossref:
A start on a Definition of Done list:
- `Article` object data property
- `crossmark: true`, and the article version data to be included, or should it be a separate config setting whether in-situ corrections are to be included in the Crossref deposits?
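To make the second option concrete, a sketch of a separate setting gating in-situ updates. `crossmark` stands for the `crossmark: true` setting mentioned above; `crossmark_in_situ` and `include_in_situ_updates` are hypothetical names invented for illustration, and settings are assumed to be a plain dict.

```python
from types import SimpleNamespace


def include_in_situ_updates(settings, article):
    """Decide whether in-situ Crossmark updates go into a deposit.

    "crossmark" mirrors the existing setting mentioned above; "crossmark_in_situ"
    is a hypothetical separate setting, shown only to illustrate that option.
    """
    return bool(
        settings.get("crossmark")
        and settings.get("crossmark_in_situ", False)
        and getattr(article, "version_history", None)
    )


# Hypothetical settings and article, for illustration only
article = SimpleNamespace(version_history=[(1, "2018-01-10"), (2, "2018-03-01")])
print(include_in_situ_updates({"crossmark": True, "crossmark_in_situ": True}, article))
print(include_in_situ_updates({"crossmark": True}, article))
```

The alternative, relying on `crossmark: true` plus the presence of version data alone, would simply drop the middle condition.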