acl-org / acl-anthology

Data and software for building the ACL Anthology.
https://aclanthology.org
Apache License 2.0

Add URLs from papers to the Web Archive #559

Open dmhowcroft opened 4 years ago

dmhowcroft commented 4 years ago

Many websites fall into disrepair and go offline even after being published in a conference paper. One way we can ensure that at least the pages describing a project (if not the corpora, etc. produced by the project) remain accessible is to systematically add each URL that appears in a paper in the ACL Anthology to the Internet Archive.

This would be an enhancement improving the archival state of our field going forward.
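A minimal sketch of what submitting one URL might look like, assuming the Wayback Machine's public "Save Page Now" endpoint (`https://web.archive.org/save/<url>`) and its availability API (`https://archive.org/wayback/available`). The function names, the User-Agent string, and the timeout are hypothetical, and nothing here reflects existing Anthology code. A real batch job would also need rate limiting and error handling.

```python
import urllib.request
from urllib.parse import urlencode

# Public Wayback Machine endpoints (assumed to be the right entry points here).
SAVE_ENDPOINT = "https://web.archive.org/save/"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def save_request_url(url: str) -> str:
    """Build the Save Page Now address for a URL found in a paper."""
    return SAVE_ENDPOINT + url.strip()


def availability_query_url(url: str) -> str:
    """Build the query that asks whether a snapshot already exists."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": url.strip()})


def request_snapshot(url: str, timeout: float = 30.0) -> int:
    """Ask the archive to capture `url` and return the HTTP status code.

    Hypothetical helper: it performs a network request and raises
    urllib.error.URLError (or a subclass) on failure.
    """
    request = urllib.request.Request(
        save_request_url(url),
        headers={"User-Agent": "anthology-link-archiver-sketch"},  # made-up name
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Checking `availability_query_url` first would let a script skip pages that already have a snapshot, which keeps the load on the archive low.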

mjpost commented 4 years ago

I like this idea. In general it’s good for us to limit what the Anthology does so as to avoid incurring technical debt that would deluge our volunteers, so this is a good solution, since it has us continuing to rely on external resources.

Another idea to add to this: if we extracted links in a clean manner, we could add them as links on the top level page, similar to what we do with attachments now.
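As a rough illustration of extracting links, here is a sketch that pulls `http(s)` URLs out of already-extracted paper text with a regular expression. The pattern, the `extract_urls` name, and the trailing-punctuation rule are assumptions for illustration. Text extracted from PDFs often splits URLs across line breaks, which this does not attempt to repair.

```python
import re

# Match http(s) URLs up to whitespace or a closing bracket/quote character.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]}]+")
# Sentence punctuation that commonly trails a URL in running text.
TRAILING_PUNCTUATION = ".,;:!?"


def extract_urls(text: str) -> list[str]:
    """Return the unique URLs in `text`, in order of first appearance."""
    seen: set[str] = set()
    urls: list[str] = []
    for match in URL_PATTERN.finditer(text):
        url = match.group(0).rstrip(TRAILING_PUNCTUATION)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


if __name__ == "__main__":
    sample = "Data at (http://example.org/data). See https://aclanthology.org."
    for url in extract_urls(sample):
        print(url)
```

Deduplicating while preserving order means the same list could feed both an archiving script and a per-paper list of links.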

Are you just filing this idea or do you also have some interest in heading this up in your spare time?

dmhowcroft commented 4 years ago

I wanted to file the idea while it was fresh; I've been skimming a lot of older SIGGEN papers and occasionally following their links, which is what made me think of it.

In principle, I'm open to heading this up, but I could not make any promises about a timeline.

If I find time to start on it, I'll post here; if I haven't made such a post and someone else is interested in starting on it, I hope they do :)