acl-org / acl-anthology

Data and software for building the ACL Anthology.
https://aclanthology.org
Apache License 2.0

Create an event tarball #2592

Open mjpost opened 1 year ago

mjpost commented 1 year ago

We currently produce a consolidated "book" or proceedings, but we are going to need to stop. For large conferences these books are approaching 10k pages and pushing the bounds of what the PDF format can handle. At the same time, the book is useful for offline browsing of a proceedings. It would be nice if we provided the option to download a tarball for an entire event: basically, everything linked from the event page (e.g., https://aclanthology.org/events/acl-2023), including PDFs.

mbollmann commented 1 year ago

The easiest way to do this would be by volume. Make an attachment in the volume's meta block, and it should show up on the same page as the full proceedings PDFs do currently.
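A rough sketch of what building a per-volume tarball offline might look like. The function name `build_volume_tarball`, the directory layout, and the assumption that a volume's PDFs are named `<volume_id>.<number>.pdf` are all hypothetical and not taken from the Anthology codebase.

```python
import tarfile
from pathlib import Path


def build_volume_tarball(pdf_dir: Path, volume_id: str, out_dir: Path) -> Path:
    """Bundle every PDF named `<volume_id>.*.pdf` into one gzipped tarball.

    Hypothetical helper: the naming scheme and paths are assumptions.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    tar_path = out_dir / f"{volume_id}.tar.gz"
    # Sort so the archive contents are in a stable (lexicographic) order.
    pdfs = sorted(pdf_dir.glob(f"{volume_id}.*.pdf"))
    with tarfile.open(tar_path, "w:gz") as tar:
        for pdf in pdfs:
            # Store files under a top-level folder named after the volume.
            tar.add(pdf, arcname=f"{volume_id}/{pdf.name}")
    return tar_path
```

The resulting file could then be referenced from the volume's meta block like any other attachment; how that reference is written depends on the schema and is not shown here.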


mjpost commented 1 year ago

Oh, that's interesting: you would just create it offline. I agree that would be the easiest way. Note that we can also link things from event pages now, so we can do this for the complete event, too/instead.

mbollmann commented 1 year ago

Generating a tarball "on demand" would require some dynamic server-side component, if I'm not mistaken, and the build chain is agnostic to the actual PDFs on the server. So doing it offline, analogous to the proceedings PDFs, seems like the most straightforward option.

Maybe it could be part of an upload script? E.g., whenever a PDF is replaced for whatever reason, the tarball for the full volume is regenerated and replaced as well?
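One way an upload script could decide whether the tarball needs rebuilding is to compare modification times. This is only a sketch; `tarball_is_stale` is a hypothetical helper, and the real upload scripts may track changes differently.

```python
from pathlib import Path


def tarball_is_stale(tar_path: Path, pdfs: list[Path]) -> bool:
    """Return True if the tarball is missing or older than any of the PDFs."""
    if not tar_path.exists():
        return True
    tar_mtime = tar_path.stat().st_mtime
    # Any PDF modified after the tarball was written makes it stale.
    return any(pdf.stat().st_mtime > tar_mtime for pdf in pdfs)
```

A modification-time check is cheap but will not notice a PDF that was removed from the volume; comparing stored checksums would be more robust.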

I'm not sure yet whether to attach this to events or volumes. Having it on volume pages would make it analogous to the full proceedings PDFs, and that might be where people are more likely to look. If we're doing it on event pages, we should modify the schema to allow "attachments" with checksums, similar to what papers have; currently events are only intended to have external links, AFAICT.
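For illustration, a checksum for such an attachment could be computed as below. The choice of CRC32 and the 8-hex-digit format are assumptions made for this sketch, and `file_checksum` is a hypothetical name; the thread only says the checksums should be similar to what papers have.

```python
import zlib
from pathlib import Path


def file_checksum(path: Path) -> str:
    """Return the CRC32 of a file as 8 lowercase hex digits (assumed format)."""
    crc = 0
    with path.open("rb") as f:
        # Read in 1 MiB chunks so large tarballs are not loaded into memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            crc = zlib.crc32(chunk, crc)
    return f"{crc & 0xFFFFFFFF:08x}"
```

Reading in chunks matters here because an event-level tarball could be several gigabytes.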
