mbollmann closed this issue 1 year ago
I want to commend you and thank you for putting so much thought into a naming system for papers and their related information, like BibTeX entries. This is by no means self-evident - in fact, I have seen it only at ACL. It is incredible, but you are the only people on the planet who name related resources with the same basename! This puts you light years ahead of your time!
Let me explain:
Suppose I look at the aclweb.org site and say to myself: "WOW! What an incredible treasure. I would love to download it all and have it in my local paper library for my reading pleasure!". Well, that's easy. I remember looking at it 10 years ago - or even further back in the past. It has always been easy to "get them all". But that's only part of the story. Having PDFs named like "pennington-etal-2014-glove.pdf" does not help at all - you must rename them to some naming scheme amenable to searching, e.g.:
Venue Volume Issue Year DOI Authors Title
For example, for the above paper:
Proceedings 2014 Conference on Empirical Methods in Natural Language Processing EMNLP 2014 [doi 10.3115%2Fv1%2FD14-1162] Pennington, Jeffrey; Socher, Richard; Manning, Christopher -- Glove - Global Vectors for Word Representation.pdf
Notice that you can reconstruct a basic bibtex from the above name, knowing that semicolons delimit author names, the title comes after ' -- ', the DOI is the URL-encoded string XXX in '[doi XXX]', the year is the 4-digit string before the DOI part and "Venue" is before that.
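The reconstruction described above can be sketched in a few lines of Python. This is only an illustration under the stated rules (semicolon-separated authors, title after ' -- ', URL-encoded DOI in '[doi XXX]', 4-digit year right before it); the names `parse_paper_filename` and `to_bibtex`, the choice of `@inproceedings`, and the citation key passed in are assumptions made for the example, not anything the Anthology provides.

```python
import re
from urllib.parse import unquote

# Venue, 4-digit year, "[doi XXX]", authors, " -- ", title, ".pdf"
FILENAME_RE = re.compile(
    r"^(?P<venue>.*)\s(?P<year>\d{4})\s\[doi (?P<doi>[^\]]+)\]\s"
    r"(?P<authors>.*?) -- (?P<title>.*)\.pdf$"
)

EXAMPLE = (
    "Proceedings 2014 Conference on Empirical Methods in Natural Language "
    "Processing EMNLP 2014 [doi 10.3115%2Fv1%2FD14-1162] Pennington, Jeffrey; "
    "Socher, Richard; Manning, Christopher -- Glove - Global Vectors for Word "
    "Representation.pdf"
)


def parse_paper_filename(name):
    """Split a renamed PDF filename into its bibliographic fields."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError("filename does not follow the naming scheme: " + name)
    return {
        "venue": m["venue"],
        "year": m["year"],
        "doi": unquote(m["doi"]),  # undo the URL encoding (%2F -> /)
        "authors": [a.strip() for a in m["authors"].split(";")],
        "title": m["title"],
    }


def to_bibtex(key, fields):
    """Render a basic entry; the entry type and key are the caller's choice."""
    pairs = [
        ("author", " and ".join(fields["authors"])),
        ("title", fields["title"]),
        ("booktitle", fields["venue"]),
        ("year", fields["year"]),
        ("doi", fields["doi"]),
    ]
    body = "\n".join("  {} = {{{}}},".format(k, v) for k, v in pairs)
    return "@inproceedings{" + key + ",\n" + body + "\n}"


if __name__ == "__main__":
    print(to_bibtex("pennington-etal-2014-glove", parse_paper_filename(EXAMPLE)))
```

The greedy venue group means that an earlier year inside the venue string (the first "2014" here) stays part of the venue, and only the 4-digit string directly before '[doi ...]' is taken as the year.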
For this to work, you need bibliographic information for each paper, say in the form of a .bib file. You have that - everybody has that. But what you have - and everybody else is still missing - is this:
The paper and its associated bibliographic information have the same basename! That is, if the above paper has a URL
https://www.aclweb.org/anthology/D14-1162
then I know that the paper is at
https://www.aclweb.org/anthology/D14-1162.pdf
and its associated bibtex at
https://www.aclweb.org/anthology/D14-1162.bib
I can get those two just by looking at the 'url={...}' lines of the 'cumulative' bibtex at
https://www.aclweb.org/anthology/anthology.bib.gz
and as soon as I have two files, one PDF and one BIB, with the same basename
D14-1162.pdf D14-1162.bib
I know they are connected!
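As a rough Python sketch of the steps just described: pull the paper URLs out of the 'url' lines of a bibtex file, derive the PDF and BIB locations by appending an extension, and pair local files by basename. The function names and the one-field sample entry are made up for illustration; the regular expression assumes each 'url' field sits on its own line, in braces or double quotes.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Matches lines such as:  url = {https://...},   or   url = "https://...",
URL_LINE_RE = re.compile(
    r'^\s*url\s*=\s*[{"](?P<url>[^}"]+)[}"],?\s*$', re.MULTILINE
)

# Illustrative sample only; a real entry carries many more fields.
SAMPLE_BIB = """@inproceedings{pennington-etal-2014-glove,
    url = {https://www.aclweb.org/anthology/D14-1162},
}
"""


def paper_urls(bibtex_text):
    """Return every URL found in a 'url' field of the given bibtex text."""
    return [m["url"] for m in URL_LINE_RE.finditer(bibtex_text)]


def related_resources(paper_url):
    """Derive the PDF and BIB URLs that share the paper's basename."""
    base = paper_url.rstrip("/")
    return {"pdf": base + ".pdf", "bib": base + ".bib"}


def pair_by_basename(filenames):
    """Group local files by basename, e.g. D14-1162 -> {'.pdf', '.bib'}."""
    groups = defaultdict(set)
    for name in filenames:
        path = PurePosixPath(name)
        groups[path.stem].add(path.suffix)
    return dict(groups)


if __name__ == "__main__":
    for url in paper_urls(SAMPLE_BIB):
        print(related_resources(url))
    print(pair_by_basename(["D14-1162.pdf", "D14-1162.bib"]))
```

To run this against the cumulative file, one would first download and decompress anthology.bib.gz (for instance with Python's standard `gzip` module) and pass the resulting text to `paper_urls`.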
It's so simple, but its impact is immense. Imagine you had a PDF
D14-1162.pdf
but its bibtex had a different basename, say
pennington-etal-2014-glove.bib
How on earth would you know they belong together? You would have to resort to web scraping: parse each proceedings HTML page and, for each PDF link on it, find the '.bib' HTML link that is visually 'nearest' to it. This is programming hell.
Having a local paper collection, with papers renamed as above, makes searching (an issue that has been the subject of quite a few postings above) a dream: just list your local papers and pipe the list to a text file. Now use that text file as a "poor man's index" using, say, grep. You can grep it with any regular expression you like
grep -E 'your regexp' index.txt
If you rename your local papers as above, you will be amazed at what you can find by such a simple method!
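For anyone who prefers to stay inside one script, the same "poor man's index" can be approximated in Python. This is a minimal sketch, not a replacement for grep: `build_index` and `search_index` are hypothetical helper names, the index is simply one filename per line, and Python's `re` syntax differs in some details from the extended regular expressions that `grep -E` accepts.

```python
import re
from pathlib import Path


def build_index(library_dir, index_file):
    """Write the names of all PDFs under library_dir to index_file, one per line."""
    names = sorted(p.name for p in Path(library_dir).rglob("*.pdf"))
    Path(index_file).write_text(
        "".join(name + "\n" for name in names), encoding="utf-8"
    )
    return names


def search_index(index_file, pattern, ignore_case=True):
    """Return the index lines that match the regular expression, like grep."""
    rx = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    with open(index_file, encoding="utf-8") as fh:
        return [line.rstrip("\n") for line in fh if rx.search(line)]


if __name__ == "__main__":
    # Assumed layout: renamed PDFs live somewhere under ./papers
    build_index("papers", "index.txt")
    for hit in search_index("index.txt", r"2014.*glove"):
        print(hit)
```

Because the renamed files carry venue, year, DOI, authors and title in their names, a pattern such as `2014.*glove` already narrows the list by year and title at once.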
So thank you for making local collections possible with such ingenious ideas as providing a cumulative .bib file and using consistent names across the whole site for both papers and their bibliographic information. Forget OAI-PMH, federated repository aggregators and all that! All a truly open access paper repository needs is those two simple things!
It's possible my brain is pudding right now, but is there a way to navigate to EMNLP Findings papers from the homepage of the ACL anthology? I see they're posted here: https://www.aclweb.org/anthology/volumes/2020.findings-emnlp/, but ctrl-F for "Findings" on the main page or the EMNLP page doesn't lead to any results.
Hi @lucy3—it's not currently linked from the front page, but will be soon.
A paper I have not authored was wrongly assigned to my ACL Anthology page because I have the exact same name as the first author on that paper. How do I get it removed from my profile?
@Pranav-Goel Please open a new issue for that, and make sure to include the Anthology ID(s) of the paper(s) in question. We can disambiguate authors in the metadata then. If you have an academic website and/or an ORCID ID, feel free to include a link to it too, as it might help us with the disambiguation process.
When search results come up for a keyword search, it would be helpful to see the date of publication and the list of authors. Some of the PDFs don't have any dates in the footer. Also, would there be a way to subscribe to a certain search result and get email updates when new papers are posted that match?
The form that is linked to in the sidebar ("The Anthology can archive your poster or presentation! Please submit them in PDF format by filling out this form.") is not accessible anymore.
Not sure if this is the right place to ask these things; please redirect me if it is not:
ORCID: We do not have ORCID data for authors, so currently not. See e.g. https://github.com/acl-org/acl-anthology/pull/1179 for WIP.
Hi, first of all, thank you for your work on the ACL Anthology website --- it's amazing!
My question is about the (lack of) Scopus indexing of my paper (https://aclanthology.org/2021.clpsych-1.22/), accepted to the Seventh Workshop on Computational Linguistics and Clinical Psychology, co-located with NAACL 2021.
Right now, the paper is NOT indexed on Scopus, and upon further digging, I have found that neither the workshop itself (2021 occurrence) nor NAACL 2021 is Scopus-indexed, despite the fact that previous proceedings of both are indexed. I was wondering if you could help to make sure that my paper and the proceedings of the workshop and NAACL 2021 are indexed on Scopus? Without the indexing, my paper will not count towards my PhD degree.
I have tried to contact anthology@aclweb.org about this multiple times, but I have not heard from them.
Thank you!
Best regards,
Zixiu Wu
@zixiu-alex-wu Have you tried asking Scopus about this? I'm not aware of anything we do on our side related to indexing papers in other databases, and would be surprised if we had any control over this.
We have been manually submitting proceedings here and there, as we find motivated volunteers to do so. I’m in the process of working with a volunteer to be more systematic about this, but it’s currently something we are not handling well. It is on our 2022 roadmap, however!
Hi Marcel, thank you for your response! I have actually asked Scopus already, and they have an investigation underway, so I thought I'd ask the ACL anthology people about this as well.
In fact, the workshop's organiser referred me to the chairs of NAACL 2021, who in turn referred me to "the ACL Anthology folks", because, as he put it, "they are maybe the only ones who would know the most recent year of NAACL is not indexed yet in Scopus".
Once Scopus informs me of the results of their investigation, I will put an update here.
Thank you again for your response!
Hi Matt,
Thank you for your response!
So, if I am not mistaken, the proceedings of ACL conferences such as NAACL, as well as the proceedings of the co-located workshops, have mostly been submitted to Scopus manually, which has resulted in the indexing of the proceedings of the conferences in previous years.
In that case, I was wondering if you could perhaps arrange for the proceedings of the workshop in question (https://aclanthology.org/volumes/2021.clpsych-1/) as well as the proceedings of NAACL 2021 to be submitted to Scopus, so that both of them as well as my workshop paper (https://aclanthology.org/2021.clpsych-1.22/) would be indexed?
Thank you!
Best regards,
Zixiu Wu
Hi, does this effort to submit proceedings to Scopus apply only to the main ACL conferences, or also to non-ACL events such as COLING?
The ACL can only assume responsibility for ACL events, unless some arrangement is made.
Thank you! Since these other events are present in the ACL Anthology as well, I was not sure whether they were managed independently.
Closing this as the “new” website is now over 4 years old. Feedback is still welcome, but can go into more specific issues.
This thread is intended to collect all feedback, suggestions, bug reports, etc. for the new Anthology website in the static-rewrite branch. (Edit: live demo here at http://aclweb.org/anthology)
If you do not have a GitHub account, you're also welcome to send me feedback via e-mail (marcel@bollmann.me) or Twitter (@mmbollmann)!
Known Issues