acl-org / acl-anthology

Data and software for building the ACL Anthology.
Apache License 2.0

Adding The Finite String newsletter #1695

Open evanmiltenburg opened 2 years ago

evanmiltenburg commented 2 years ago

Following a discussion on Twitter, it seems it would be very useful to archive The Finite String newsletter that ACL used to send out. The newsletter was a bit more prominent on the old Anthology, but on the current one it's hard to find. On top of that, the collection in the Anthology is incomplete.

I want to collect these newsletters and add them to the Anthology. I'd be happy to add all relevant metadata and produce an XML file that could be ingested to the Anthology.

I have three questions:

  1. Do you agree that missing items should be added?
  2. What should the XML look like? What metadata should be included?
  3. How to resolve the issue that the newsletter archive is hard to find?

mbollmann commented 2 years ago

Re 1: Absolutely, yes!

Re 2: We're currently actively working on how to represent non-paper entries in the Anthology (#298, #1164), which may be relevant for these if they were only published on a website, for example. In that case, we might have a more concrete idea on how to represent these in a few weeks or so.

evanmiltenburg commented 2 years ago

Great, then at least I can go ahead with the collection and we'll see how to represent everything. I think one way to go about this is to first just include all pdfs with year, volume, issue number, and then later update with other relevant info. Also considering the labor-intensive nature of including the rest of the metadata, this seems like a good first step to at least make the pdfs available.

mjpost commented 2 years ago

Agreed, this would be great!

Unfortunately there's no easy way to address (3), apart from creating a page (under hugo/content) which we could then manually link to in the front page table.

See also #285, where some of the Finite String reorg was done.

mjpost commented 2 years ago

@evanmiltenburg if you wanted to get started, just creating a Google sheet or something with an entry per row would make it pretty easy to ingest, once we have an XML format designed. It would be nice to have a comprehensive list including the current ones, too.
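
As a rough illustration of why one row per issue would be easy to ingest, here is a minimal sketch that turns CSV text exported from such a sheet into XML. Everything specific in it is an assumption: the column names (`year`, `volume`, `number`, `pdf`), the PDF file names, and the element names are placeholders, not the Anthology's schema, which has not been designed yet.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical column names; the real sheet layout has not been decided.
COLUMNS = ("year", "volume", "number", "pdf")


def rows_to_xml(csv_text: str) -> ET.Element:
    """Turn one-issue-per-row CSV text into a placeholder XML tree."""
    # Element names are illustrative only, not the Anthology's XML format.
    root = ET.Element("newsletter", {"title": "The Finite String"})
    for row in csv.DictReader(io.StringIO(csv_text)):
        issue = ET.SubElement(root, "issue")
        for column in COLUMNS:
            value = (row.get(column) or "").strip()
            if value:  # skip cells that have not been filled in yet
                ET.SubElement(issue, column).text = value
    return root


# Sample rows; the PDF file names are made up for the example.
sample = """year,volume,number,pdf
1978,15,1,fs-15-1.pdf
1992,,,fs-1992.pdf
"""
tree = rows_to_xml(sample)
print(ET.tostring(tree, encoding="unicode"))
```

Empty cells are simply left out, so a row can start with only a year and a PDF and be completed later.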

evanmiltenburg commented 2 years ago

Once I have the relevant copies, I will try to prepare a Google Sheet with some sensible columns.

Having a separate page would make sense, given that this is the kind of document that you'd use for historical research. Listing all editions separately from the CL entries would make it much easier to see what information is there, and look through the separate PDFs.

evanmiltenburg commented 2 years ago

Update: Graeme Hirst responded that his university library has copies that have not been digitised yet, and he is looking into options for scanning them. Just to be sure, if we have scanned copies, it's OK to publish them in the Anthology, right? I mean, the journal has been published already by ACL.

evanmiltenburg commented 2 years ago

I've received some copies from 1992 and 1993, and I'm in the process of cleaning them up so there's a nice version to upload to the Anthology. What are the next steps?

Also, it seems the old location now returns a 403 error. The page is still available through the Wayback Machine, though. What happened?

evanmiltenburg commented 2 years ago

> @evanmiltenburg if you wanted to get started, just creating a Google sheet or something with an entry per row would make it pretty easy to ingest, once we have an XML format designed. It would be nice to have a comprehensive list including the current ones, too.

@mjpost what should the sheet look like? For reference, one of the older issues is listed in the Anthology as:

J79-1073: The Finite String, Volume 15, Number 1 (February 1978)
What Some Semantic Theories Can't Do (Th R Hofmann), NL in Information Science (Donald E Walker; Hans Karlgren; Martin Kay), CAL in Science Education, New Journal Annuals Of the History of Computing (Bernard A Caller), New England Research Application Center, Linguafranca Document Search (LLBA), Demonstration Interactive Search of LLBA, NFAIS/ UNESCO Indexing Education Kit, Synmposium Computer Assisted Learning (J J Mathews), 1978 Linguistics Institute Conference And Symopsia, DATA Bases Usability and Responsiveness (Dr Allen Baiter), Conferences Internal Auditing (D Eugene Shaeffer), Conferences Breifly Noted (K preston Jr), NSF Awards in Computer Science for 1976, AJCL A Description, AJCL Page Format, AJCL Opaque Card format, AFIPS Washington Newsletter

We could keep this format, and per issue have comma-separated entries with titles and authors optionally listed in parentheses. Separating all entries doesn't make much sense if everything is in the same PDF anyway, but we could have a list of bullet points with one entry per bullet.
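
If the comma-separated format is kept, it can still be split into separate entries later. Below is a minimal sketch, assuming that entries are separated by commas outside parentheses, that authors sit in a trailing pair of parentheses, and that multiple authors are separated by semicolons, as in the listing above. The function names `split_entries` and `parse_entry` are hypothetical.

```python
def split_entries(listing: str) -> list[str]:
    """Split a listing on commas that are not inside parentheses."""
    entries, current, depth = [], [], 0
    for char in listing:
        if char == "(":
            depth += 1
        elif char == ")":
            depth = max(depth - 1, 0)
        if char == "," and depth == 0:
            entries.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    entries.append("".join(current).strip())
    return [entry for entry in entries if entry]


def parse_entry(entry: str) -> tuple[str, list[str]]:
    """Return (title, authors); authors is empty when none are listed."""
    title, authors = entry, []
    if entry.endswith(")") and "(" in entry:
        # Everything after the last "(" is treated as the author list.
        title, _, names = entry[:-1].rpartition("(")
        authors = [name.strip() for name in names.split(";") if name.strip()]
    return title.strip(), authors


# First three entries of the J79-1073 listing quoted above.
listing = (
    "What Some Semantic Theories Can't Do (Th R Hofmann), "
    "NL in Information Science (Donald E Walker; Hans Karlgren; Martin Kay), "
    "CAL in Science Education"
)
parsed = [parse_entry(entry) for entry in split_entries(listing)]
for title, authors in parsed:
    print(title, authors)
```

One caveat: a title that itself contains a comma would be split in two, so a bullet list with one entry per bullet would be more robust than comma separation.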