acl-org / acl-anthology

Data and software for building the ACL Anthology.
https://aclanthology.org
Apache License 2.0

Feedback on the new Anthology website #170

Closed: mbollmann closed this issue 1 year ago

mbollmann commented 5 years ago

This thread is intended to collect all feedback, suggestions, bug reports, etc. for the new Anthology website in the static-rewrite branch.

(Edit: live demo here at http://aclweb.org/anthology)

If you do not have a GitHub account, you're also welcome to send me feedback via e-mail (marcel@bollmann.me) or Twitter (@mmbollmann)!

Known Issues

sedimentation-fault commented 4 years ago

I want to commend you and thank you for putting so much thought into a naming system for papers and their related information, such as BibTeX files. This is by no means self-evident - in fact, I have seen it only at ACL. It is incredible, but you are the only people on the planet who name related resources with the same basename! This puts you light years ahead of your time!

Let me explain:

Suppose I look at the aclweb.org site and say to myself: "WOW! What an incredible treasure. I would love to download it all and have it in my local paper library for my reading pleasure!". Well, that's easy. I remember looking at it 10 years ago - or even further back in the past. It has always been easy to "get them all". But that's only part of the story. Having PDFs named like "pennington-etal-2014-glove.pdf" does not help at all - you must rename them to some naming scheme amenable to searching, e.g.:

Venue Volume Issue Year DOI Authors Title

For example, for the above paper:

Proceedings 2014 Conference on Empirical Methods in Natural Language Processing EMNLP 2014 [doi 10.3115%2Fv1%2FD14-1162] Pennington, Jeffrey; Socher, Richard; Manning, Christopher -- Glove - Global Vectors for Word Representation.pdf

Notice that you can reconstruct a basic BibTeX entry from the above name, knowing that semicolons delimit author names, the title comes after ' -- ', the DOI is the URL-encoded string XXX in '[doi XXX]', the year is the 4-digit string before the DOI part, and "Venue" is everything before that.
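As a rough illustration of that reconstruction, here is a minimal Python sketch. The function name `parse_paper_filename` and the regular expression are my own assumptions, not part of any Anthology tooling, and the pattern only handles names that follow the scheme exactly as described above.

```python
import re
from urllib.parse import unquote

# Hypothetical pattern for names of the form:
#   <venue> <year> [doi <url-encoded DOI>] <author>; <author>; ... -- <title>.pdf
FILENAME_RE = re.compile(
    r"^(?P<venue>.*?)\s+(?P<year>\d{4})\s+"
    r"\[doi (?P<doi>[^\]]+)\]\s+"
    r"(?P<authors>.*?) -- (?P<title>.*)\.pdf$"
)


def parse_paper_filename(filename):
    """Split a renamed PDF filename into basic bibliographic fields."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"filename does not follow the naming scheme: {filename!r}")
    return {
        "venue": match["venue"],
        "year": int(match["year"]),
        # The DOI is stored URL-encoded so that '/' is safe in a filename.
        "doi": unquote(match["doi"]),
        # Semicolons delimit the individual "Last, First" author names.
        "authors": [name.strip() for name in match["authors"].split(";")],
        "title": match["title"],
    }


example = (
    "Proceedings 2014 Conference on Empirical Methods in Natural Language "
    "Processing EMNLP 2014 [doi 10.3115%2Fv1%2FD14-1162] "
    "Pennington, Jeffrey; Socher, Richard; Manning, Christopher -- "
    "Glove - Global Vectors for Word Representation.pdf"
)
fields = parse_paper_filename(example)
print(fields["doi"])      # 10.3115/v1/D14-1162
print(fields["authors"])
```

Because the venue itself may contain a year, the pattern takes the 4-digit string immediately before '[doi' as the publication year and treats everything before it as the venue.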

For this to work, you need bibliographic information for each paper, say in the form of a .bib file. You have that - everybody has that. But what you have - and everybody else is still missing - is this:

The paper and its associated bibliographic information have the same basename! That is, if the above paper has a URL

https://www.aclweb.org/anthology/D14-1162

then I know that the paper is at

https://www.aclweb.org/anthology/D14-1162.pdf

and its associated bibtex at

https://www.aclweb.org/anthology/D14-1162.bib

I can get those two just by looking at the 'url={...}' lines of the 'cumulative' bibtex at

https://www.aclweb.org/anthology/anthology.bib.gz

and as soon as I have two files, one PDF and one BIB, with the same basename

D14-1162.pdf D14-1162.bib

I know they are connected!
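To make this concrete, here is a small Python sketch of the idea. All function names are hypothetical, the sample BibTeX entry is a shortened stand-in rather than a copy of the real file, and I am assuming the URL field may be written either as url={...} or url="...".

```python
import re
from pathlib import PurePosixPath

# Matches url = {...} or url = "..." fields in BibTeX text (assumed layouts).
URL_FIELD_RE = re.compile(r'^\s*url\s*=\s*[{"]([^}"]+)[}"]', re.MULTILINE)


def paper_urls_from_bibtex(bibtex_text):
    """Return the value of every url field found in the BibTeX text."""
    return URL_FIELD_RE.findall(bibtex_text)


def resource_urls(paper_url):
    """Derive the PDF and BibTeX URLs that share the paper URL's basename."""
    base = paper_url.rstrip("/")
    return {"pdf": base + ".pdf", "bib": base + ".bib"}


def pair_by_basename(filenames):
    """Map each basename to its (pdf, bib) pair; skip incomplete pairs."""
    groups = {}
    for name in filenames:
        path = PurePosixPath(name)
        groups.setdefault(path.stem, {})[path.suffix] = name
    return {
        stem: (found[".pdf"], found[".bib"])
        for stem, found in groups.items()
        if ".pdf" in found and ".bib" in found
    }


# Shortened, illustrative entry; the real cumulative file has many more fields.
SAMPLE_BIBTEX = """@inproceedings{pennington-etal-2014-glove,
    url = "https://www.aclweb.org/anthology/D14-1162",
}
"""

for url in paper_urls_from_bibtex(SAMPLE_BIBTEX):
    print(resource_urls(url))

print(pair_by_basename(["D14-1162.pdf", "D14-1162.bib", "P19-1001.pdf"]))
```

The pairing step needs no metadata lookup at all: the shared basename is the only key.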

It's so simple, but its impact is immense. Imagine you had a PDF

D14-1162.pdf

but its BibTeX file had a different basename, say

pennington-etal-2014-glove.bib

How on earth would you know they belong together? You would have to resort to web scraping: parse each proceedings HTML page and, for each PDF link on it, find the '.bib' HTML link that is visually 'nearest' to it. This is programming hell.

Having a local paper collection, with papers renamed as above, makes searching (an issue that has been the subject of quite a few postings above) a dream: just list your local papers and pipe the list to a text file. Now use that text file as a "poor man's index" using, say, grep. You can grep it with any regular expression you like

grep -E 'your regexp' index.txt

If you rename your local papers as above, you will be amazed at what you can find by such a simple method!
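For anyone who prefers a script to the shell, the same "poor man's index" can be sketched in Python. The function names and the index filename are assumptions for illustration, and note that Python's `re` syntax differs in a few details from the extended regular expressions used by grep -E.

```python
import re
from pathlib import Path


def build_index(paper_dir, index_path):
    """Write the sorted names of all PDFs in paper_dir to index_path, one per line."""
    names = sorted(
        entry.name
        for entry in Path(paper_dir).iterdir()
        if entry.suffix.lower() == ".pdf"
    )
    Path(index_path).write_text(
        "".join(name + "\n" for name in names), encoding="utf-8"
    )
    return names


def search_index(index_path, pattern, ignore_case=False):
    """Return every index line matching the regular expression, like grep."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    with open(index_path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle if regex.search(line)]
```

A typical session would call `build_index("papers", "index.txt")` once after renaming, then something like `search_index("index.txt", r"EMNLP 2014 .*Manning")` to list every matching paper; the directory name and the pattern here are placeholders.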

So thank you for making local collections possible with such ingenious ideas as providing a cumulative .bib file and using consistent names across the whole site for both papers and their bibliographic information. Forget OAI-PMH, federated repository aggregators and all that! All a truly open access paper repository needs is those two simple things!

lucy3 commented 4 years ago

It's possible my brain is pudding right now, but is there a way to navigate to EMNLP Findings papers from the homepage of the ACL anthology? I see they're posted here: https://www.aclweb.org/anthology/volumes/2020.findings-emnlp/, but ctrl-F for "Findings" on the main page or the EMNLP page doesn't lead to any results.

mjpost commented 4 years ago

Hi @lucy3—it's not currently linked from the front page, but will be soon.

Pranav-Goel commented 3 years ago

A paper I have not authored was wrongly assigned to my ACL Anthology page because I have the exact same name as the first author on that paper. How do I get it removed from my profile?

mbollmann commented 3 years ago

@Pranav-Goel Please open a new issue for that, and make sure to include the Anthology ID(s) of the paper(s) in question. We can disambiguate authors in the metadata then. If you have an academic website and/or an ORCID ID, feel free to include a link to it too, as it might help us with the disambiguation process.

AmyOlex commented 3 years ago

When search results come up for a keyword search, it would be helpful to see the date of publication and the list of authors. Some of the PDFs don't have any dates in the footer. Also, would there be a way to subscribe to a certain search result and get email updates when new papers are posted that match?

BramVanroy commented 3 years ago

The form linked from the sidebar ("The Anthology can archive your poster or presentation! Please submit them in PDF format by filling out this form.") is no longer accessible.

AGalassi commented 3 years ago

Not sure if this is the right place to ask these things; please redirect me if it is not:

akoehn commented 3 years ago

ORCID: We do not have ORCID data for authors, so currently not. See e.g. https://github.com/acl-org/acl-anthology/pull/1179 for WIP.

zixiu-alex-wu commented 2 years ago

Hi, first of all, thank you for your work on the ACL Anthology website --- it's amazing!

My question is about the (lack of) Scopus indexing of my paper (https://aclanthology.org/2021.clpsych-1.22/), accepted to the Seventh Workshop on Computational Linguistics and Clinical Psychology, co-located with NAACL 2021.

Right now, the paper is NOT indexed on Scopus, and upon further digging, I have found that neither the workshop itself (2021 occurrence) nor NAACL 2021 is Scopus-indexed, despite the fact that previous proceedings of both are indexed. I was wondering if you could help to make sure that my paper and the proceedings of the workshop and NAACL 2021 are indexed on Scopus? Without the indexing, my paper will not count towards my PhD degree.

I have tried to contact anthology@aclweb.org about this multiple times, but I have not heard from them.

Thank you!

Best regards,

Zixiu Wu

mbollmann commented 2 years ago

@zixiu-alex-wu Have you tried asking Scopus about this? I'm not aware of anything we do on our side related to indexing papers in other databases, and would be surprised if we had any control over this.

mjpost commented 2 years ago

We have been manually submitting proceedings here and there, as we find motivated volunteers to do so. I’m in the process of working with a volunteer to be more systematic about this, but it’s currently something we are not handling well. It is on our 2022 roadmap, however!

zixiu-alex-wu commented 2 years ago

> @zixiu-alex-wu Have you tried asking Scopus about this? I'm not aware of anything we do on our side related to indexing papers in other databases, and would be surprised if we had any control over this.

Hi Marcel, thank you for your response! I have actually asked Scopus already, and they have an investigation underway, so I thought I'd ask the ACL anthology people about this as well.

In fact, the workshop's organiser referred me to the chairs of NAACL 2021, who in turn referred me to "the ACL Anthology folks", because, as he put it, "they are maybe the only ones who would know the most recent year of NAACL is not indexed yet in Scopus".

Once Scopus informs me of the results of their investigation, I will put an update here.

Thank you again for your response!

zixiu-alex-wu commented 2 years ago

> We have been manually submitting proceedings here and there, as we find motivated volunteers to do so. I’m in the process of working with a volunteer to be more systematic about this, but it’s currently something we are not handling well. It is on our 2022 roadmap, however!

Hi Matt,

Thank you for your response!

So, if I am not mistaken, the proceedings of ACL conferences such as NAACL, as well as the proceedings of the co-located workshops, have mostly been submitted to Scopus manually, which has resulted in the indexing of the proceedings of the conferences in previous years.

In that case, I was wondering if you could perhaps arrange for the proceedings of the workshop in question (https://aclanthology.org/volumes/2021.clpsych-1/) as well as the proceedings of NAACL 2021 to be submitted to Scopus, so that both of them as well as my workshop paper (https://aclanthology.org/2021.clpsych-1.22/) would be indexed?

Thank you!

Best regards,

Zixiu Wu

AGalassi commented 2 years ago

> We have been manually submitting proceedings here and there, as we find motivated volunteers to do so. I’m in the process of working with a volunteer to be more systematic about this, but it’s currently something we are not handling well. It is on our 2022 roadmap, however!

Hi, does this apply only to the main ACL conferences or also to non-ACL events, such as COLING?

mjpost commented 2 years ago

The ACL can only assume responsibility for ACL events, unless some arrangement is made.

AGalassi commented 2 years ago

> The ACL can only assume responsibility for ACL events, unless some arrangement is made.

Thank you! Since these other events are present in the ACL anthology as well I was not sure if they were managed independently or not.

mbollmann commented 1 year ago

Closing this as the “new” website is now over 4 years old. Feedback is still welcome, but can go into more specific issues.