Open mjpost opened 4 years ago
I wonder if we should get rid of <revision> and <erratum> in favor of generic <url> tags with a revision attribute.
No: We would then need a <url type="revision"> and a <url type="erratum"> because they behave differently. A revision is the new default for the PDF button, an erratum is not. IMO we would not gain anything by that change.
What part of the process leaves room for user error that couldn't be addressed by having a single script run through all these steps?
(NB: I don't know exactly what the script you currently use is doing.)
It's bin/add_revision.py, which
<revision> tag → revision 1, existing <revision> tag, then sum them and increment)

Today (#903) I made edits to make this work with new-style IDs, and ended up pulling the HTML page and saving it as "v1.pdf". This is fixed now (and I added some sanity checks), but in general it just seems like bad practice to be overwriting files at all. I would have lost them if I didn't have a backup.
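A sanity check of the kind mentioned above could look roughly like the following. This is a minimal sketch, not the actual contents of bin/add_revision.py: the function names looks_like_pdf and write_new_file are hypothetical, and the only fact it relies on is that a PDF file starts with the bytes %PDF-.

```python
from pathlib import Path


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload starts with the PDF magic bytes."""
    # PDF files begin with "%PDF-"; an HTML page saved by mistake will not.
    return data.startswith(b"%PDF-")


def write_new_file(path: Path, data: bytes) -> None:
    """Write data to path, refusing non-PDF payloads and existing files."""
    if not looks_like_pdf(data):
        raise ValueError(f"refusing to write {path}: payload is not a PDF")
    # Mode "xb" raises FileExistsError instead of silently overwriting.
    with open(path, "xb") as f:
        f.write(data)
```

With a guard like this, saving an HTML page as "v1.pdf" fails loudly, and an existing file is never replaced without an explicit extra step.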
Re: the <url> tag, it's a little unseemly to have both it and <revision> pointing to the PDF. Do you disagree? It'd be a sweeping change (a possible point against it), but it makes more sense to me to have a <version rev="1"> tag (or something of that sort), whose absence would mean no PDF is available. We would keep <erratum> as is.
I have a deja-vu here: https://github.com/acl-org/acl-anthology/issues/295#issuecomment-624684119
Ah, I thought this felt familiar. Okay, we stick with overwriting the PDF, and I will be more careful.
That leaves the tag discussion. Do you disagree that having both <url> and <revision> (once there is one) is redundant?
I wonder if changing <url> to an explicit <pdf revid="1"> link would be more clear. In this scheme, revisions would just add additional <pdf> tags. <erratum> would stay the same. This might also help clarify a minor point of confusion between the notion of URL and PDF.
I agree it's not super elegant at the moment, but not fully redundant either. Assume there's one revision; right now we then produce links to [id].pdf, [id]v1.pdf, and [id]v2.pdf. Now, [id].pdf and whatever the latest revision is will always be identical, but we do always want the [id].pdf link. So in that sense, if anything, it's the [id]v2.pdf that's redundant.
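To make the current naming concrete, here is a small sketch of the set of links described above. The function name pdf_links is hypothetical, and the case with zero revisions (only [id].pdf) is an assumption not spelled out in this thread.

```python
def pdf_links(anth_id: str, num_revisions: int) -> list[str]:
    """List the PDF file names linked for a paper with num_revisions revisions."""
    links = [f"{anth_id}.pdf"]  # canonical link, always the latest version
    if num_revisions > 0:
        # v1 is the original; each revision adds one more versioned file.
        links += [f"{anth_id}v{n}.pdf" for n in range(1, num_revisions + 2)]
    return links


print(pdf_links("2020.test-test.42", 1))
# ['2020.test-test.42.pdf', '2020.test-test.42v1.pdf', '2020.test-test.42v2.pdf']
```

The first and last entries always point at identical content, which is the redundancy being discussed.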
So what you could do is something like:
<pdf revid="1" file="2020.test-test.42v1" />
<pdf revid="2" file="2020.test-test.42">This is a revision because of xyz.</pdf>
and have the website always link the entry with the highest revid to the big PDF button. But note that there wouldn't be a v2 anymore in this version, and if we added another revision, we'd have to rename 2020.test-test.42 to 2020.test-test.42v2, so if that's really more intuitive ... I don't know.
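The "highest revid wins" rule could be sketched as follows. This is only an illustration of the proposed scheme: the <paper> wrapper element and the function name latest_pdf are assumptions, and nothing here reflects how the Anthology site is actually built.

```python
import xml.etree.ElementTree as ET

# Hypothetical <paper> wrapper around the two <pdf> entries proposed above.
SNIPPET = """
<paper>
  <pdf revid="1" file="2020.test-test.42v1" />
  <pdf revid="2" file="2020.test-test.42">This is a revision because of xyz.</pdf>
</paper>
"""


def latest_pdf(paper: ET.Element) -> str:
    """Return the file attribute of the <pdf> entry with the highest revid."""
    entries = paper.findall("pdf")
    if not entries:
        raise ValueError("no <pdf> entries: no PDF is available")
    # Compare revid numerically so that "10" sorts after "9".
    newest = max(entries, key=lambda e: int(e.get("revid")))
    return newest.get("file")


paper = ET.fromstring(SNIPPET)
print(latest_pdf(paper))  # 2020.test-test.42
```

Selecting the target this way is simple, but as noted above, adding a third revision would still require renaming the file that currently has no version suffix.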
There is a lot of complexity and consequent room for user error in the current revisioning process:
- {anthid}.pdf) is always replaced so as to work with our redirect rules
- {anthid}v1.pdf has to be created
- <revision> and <url> tags

I wonder if we should get rid of <revision> and <erratum> in favor of generic <url> tags with a revision attribute. It would also be nice to avoid renaming at all. This would break the ".pdf" URL shortcut, but maybe that doesn't matter; users would just have to visit the paper page to get the latest revision.