acl-org / acl-anthology

Data and software for building the ACL Anthology.
https://aclanthology.org
Apache License 2.0
445 stars 300 forks

Feedback on the new Anthology website #170

Closed mbollmann closed 1 year ago

mbollmann commented 5 years ago

This thread is intended to collect all feedback, suggestions, bug reports, etc. for the new Anthology website in the static-rewrite branch.

(Edit: live demo here at http://aclweb.org/anthology)

If you do not have a GitHub account, you're also welcome to send me feedback via e-mail (marcel@bollmann.me) or Twitter (@mmbollmann)!

Known Issues

mbollmann commented 5 years ago

The main page of the anthology is slightly wider than all other pages. Moving to one of those other pages means the top bar contents jump closer together.

Yes, that's because the overview tables are so wide and should use all the space to prevent scrolling. I don't think there's a good way to address this without a complete overhaul of how we present all the conferences on the main page.

jwtaki commented 5 years ago

The search does not work?

akoehn commented 5 years ago

@jwtaki are you in a country that blocks Google by any chance? The search functionality is provided by them, so search does not work in China, for example.

allanj commented 5 years ago

New suggestion: it would be better to (have an option to) divide the papers in each year into different areas. We have far more papers now than 10 years ago, when we could still read all of them.

I try to read most of the papers, but I have to be selective. So the area of interest is probably the key factor for me and other readers in deciding what to read.

akoehn commented 5 years ago

@allanj the information is not stored in the metadata, so someone would have to classify all papers before any frontend logic could be written.

That is a lot of work, and it is often not clear at all what categories should be used and, once that is decided, which category a paper belongs to. Look at the high rates of papers moved between areas after the authors have selected a category.

TobiasLee commented 5 years ago

It would be better to support searching for papers by institute name.

aryamccarthy commented 5 years ago

@TobiasLee Like categories, this isn't stored in the metadata. We'd have to either extract it from the PDFs (which is noisy) or do it manually (which is so large as to be infeasible).

And on a personal note, I worry about the consequences of providing search by institution. Our community has become large enough that we can't read every paper, and search by institution may lead to a rich-get-richer bias in which papers get read. That's not a problem in itself; the problem is the other side of the scale: high-quality papers will be overlooked because of the institution they come from. (I believe that authors write papers, not institutions. The same could happen with author search, which is supported, but it's harder to make that systemic and entrenched.)

TobiasLee commented 5 years ago

@aryamccarthy Thanks for your considerate reply.

Evpok commented 5 years ago

@TobiasLee Like categories, this isn't stored in the metadata. We'd have to either extract it from the PDFs (which is noisy) or do it manually (which is so large as to be infeasible).

Consequences aside, GROBID is quite efficient at extracting such metadata from pdfs. HAL uses it to prefill metadata for new deposits.

mikhovr commented 5 years ago

It seems that search results aren't displayed in my browser, e.g. https://aclweb.org/anthology/search/?q=cross-lingual in Firefox Quantum 60.3.0esr (32-bit). Everything seems to be okay in Chrome.

image

mayhewsw commented 5 years ago

This is a tiny tiny thing, but it always bothers me that the yellow banner on top ("You're viewing the latest version...") is too close to the header.

If you have the time and patience for such trivia, you can fix this by adding padding to the parent div (container).

<div class="container" style="padding-top:15px">
  <aside class="alert alert-warning text-center py-1 mt-n3 mt-md-n4 mt-xl-n5" role="alert">You're viewing the latest version of the ACL Anthology.
    <a class="btn btn-warning mx-2" href="https://github.com/acl-org/acl-anthology/issues/170">Give feedback</a>
  </aside>
</div>

mjpost commented 5 years ago

Fixed live and I agree it looks better. Want to submit a PR? hugo/_default/baseof.html I believe.

mbollmann commented 5 years ago

Fixed live and I agree it looks better. Want to submit a PR? hugo/_default/baseof.html I believe.

Please use Bootstrap classes instead of custom style attributes, though. (Or play around with removing/changing the explicit negative margin classes of the banner, which is probably the more correct way of doing it.)

iyuge2 commented 5 years ago

awesome, thanks

enueno commented 5 years ago

I'd like to know the exact date on which the Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing were published. Also, the actual papers do not display the CC-BY license; are they bound by that CC-BY license as stated on the webpages? The paper I am interested in states "© 2018 Association for Computational Linguistics", which seems like an 'All rights reserved' license. If this is not the forum to find out this information, where / who could I contact? Perhaps it would be useful to have a contact email on the website for queries. Many thanks. Best wishes, Eva

akoehn commented 5 years ago

@enueno The conferences currently do not have an exact publication date in the Anthology. You could use the starting date of the conference as the publication date, which is, according to https://emnlp2018.org/, 2018-11-02.

Yes, ACL holds the copyrights for the papers. However, ACL licenses the papers under a CC-BY license, so you can use them as long as you provide attribution.

drvenabili commented 5 years ago

Hi @mbollmann ,

The latest update (*SEM proceedings 2019) has some mistakes in parsing people's names, e.g. https://aclweb.org/anthology/volumes/proceedings-of-the-eighth-joint-conference-on-lexical-and-computational-semantics-sem-2019/ has "University of Amsterdam Ekaterina Shutova", and there are other mistakes.

Thank you for your work!

omidrohanian commented 5 years ago

On the author page, e.g. https://www.aclweb.org/anthology/people/y/yan-song/, we can only see the count for the top 5 most frequent venues (it used to be possible to view the count for all venues):

image

I totally agree with this. I prefer to see all the venues which is more informative.

rueycheng commented 5 years ago

Abstracts can be quite useful for readers to quickly skim through the entire proceedings when made accessible via mouseover texts (e.g. ACL '18 accepted paper page). Do you plan to provide this sort of functionality? Not all bibtex records contain this info though.

mjpost commented 5 years ago

@rueycheng where do you envision this information being added? On the conference index pages? Post an example link if you like.

rueycheng commented 5 years ago

@mjpost Thanks for asking. Yeah I think having this info in conference index pages would be ideal, e.g. https://aclweb.org/anthology/events/acl-2018/. Or in a separate page ("view abstracts") that can be accessed from conference index in case this might slow down page loading.

saamc commented 5 years ago

The search result list contains direct links to PDF documents (sometimes only those) as well as links to document pages. Document pages contain metadata and are much more useful, and getting to a document page from a PDF is cumbersome:

https://www.aclweb.org/anthology/papers/P/P15/P15-2087/ (document metadata page) vs. https://www.aclweb.org/anthology/P15-2087 (PDF document)

Also, the URL on the document metadata page does not refer to itself but to the document.

If a search finds a hit inside a document and not in the metadata could the corresponding document page be returned instead of the document itself? It is obviously easier to use the bibliographic information provided on the document metadata page. E.g. Zotero('s ACL importer) doesn't manage to resolve from the PDF back to the metadata page, and after a direct document import it tries to fetch bibliographic information from a generic catalog that often doesn't contain the full authoritative information available on the ACL website.

mjpost commented 5 years ago

It's a bit of an undocumented feature, but you can get to the document page easily by appending a /:
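A minimal sketch of that trick in Python, assuming it works as described for the PDF URLs quoted earlier in this thread. The helper name `landing_page_url` is made up for illustration and is not part of the Anthology code.

```python
def landing_page_url(pdf_url: str) -> str:
    """Return the document (metadata) page URL for an Anthology PDF URL.

    Assumption: as described above, the document page is simply the
    PDF URL with a trailing slash appended.
    """
    # Avoid doubling the slash if a document page URL was passed in.
    return pdf_url if pdf_url.endswith("/") else pdf_url + "/"


print(landing_page_url("https://www.aclweb.org/anthology/P15-2087"))
# prints https://www.aclweb.org/anthology/P15-2087/
```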

saamc commented 5 years ago

Thanks, that's nifty indeed!

mbollmann commented 5 years ago

Abstracts can be quite useful for readers to quickly skim through the entire proceedings when made accessible via mouseover texts (e.g. ACL '18 accepted paper page). Do you plan to provide this sort of functionality?

@rueycheng That is a great idea IMO! I can look into that.

If a search finds a hit inside a document and not in the metadata could the corresponding document page be returned instead of the document itself?

@saamc We are using Google Custom Search at the moment, so we do not have the ability to implement this level of customization. The only option available is to restrict search to a specific subset of documents, which is why there's the "Paper Metadata" tab which explicitly searches in and returns only the document pages, not the PDFs. However, excluding the PDFs means no search results based on the PDFs either, so that's really only a compromise right now.

danielhers commented 5 years ago

Screenshot from 2019-06-14 10-10-52

The drop-down for the sort criterion does not expand vertically to fit the contents on my screen.

Aspie96 commented 5 years ago

The URL of the front page of volumes is crazy long. It should be possible to link to it (the metadata page, not just the PDF directly) without including such a long URL.

I understand having the title in the URL for SEO purposes, but a short URL should be provided.

mjpost commented 5 years ago

@Aspie96 Can you link to an example of what you mean?

Aspie96 commented 5 years ago

@mjpost https://aclweb.org/anthology/volumes/sem-2012-the-first-joint-conference-on-lexical-and-computational-semantics-volume-1-proceedings-of-the-main-conference-and-the-shared-task-and-volume-2-proceedings-of-the-sixth-international-workshop-on-semantic-evaluation-semeval-2012/

The URL of the PDF is: https://www.aclweb.org/anthology/S12-1

For a paper: https://aclweb.org/anthology/papers/S/S12/S12-1000/ (the title is the same, but the content is not).

The PDF: https://www.aclweb.org/anthology/S12-1000

It would be much easier to reference the first link if it was https://aclweb.org/anthology/papers/S/S12/S12-1/.
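For illustration, the short URL pattern visible in the paper links above can be sketched in Python. This is only inferred from the examples quoted in this thread (S12-1000, P15-2087, W19-3604), not taken from the site's routing code. The function names and the `BASE` constant are made up, and volume pages, which use the long title slug, are not covered.

```python
import re

# Assumption: paper IDs look like "S12-1000", i.e. one capital letter,
# a two-digit year, a hyphen, and a four-digit number.
_PAPER_ID = re.compile(r"^([A-Z])(\d{2})-(\d{4})$")

BASE = "https://www.aclweb.org/anthology"


def paper_page_url(paper_id: str) -> str:
    """Build the metadata page URL, e.g. .../papers/S/S12/S12-1000/."""
    match = _PAPER_ID.match(paper_id)
    if match is None:
        raise ValueError(f"unrecognized paper ID: {paper_id!r}")
    letter, year = match.group(1), match.group(2)
    return f"{BASE}/papers/{letter}/{letter}{year}/{paper_id}/"


def paper_pdf_url(paper_id: str) -> str:
    """Build the PDF URL, e.g. .../anthology/S12-1000."""
    if _PAPER_ID.match(paper_id) is None:
        raise ValueError(f"unrecognized paper ID: {paper_id!r}")
    return f"{BASE}/{paper_id}"


print(paper_page_url("S12-1000"))
print(paper_pdf_url("S12-1000"))
```

The request above amounts to wanting the same kind of short, ID-based path for volume front pages as well.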

reyha commented 5 years ago

All the earlier links like https://www.aclweb.org/anthology/W16-0404 are breaking.

akoehn commented 5 years ago

All the earlier links like https://www.aclweb.org/anthology/W16-0404 are breaking.

@reyha The URLs work just fine for me. Could you open a new issue detailing how exactly it breaks for you? (maybe try to reproduce it on a new browser / computer first)

nschneid commented 5 years ago

What about allowing people to list a personal website on their author pages? Would be another thing to maintain, but I guess it could be done with a Google form and a script.

mjpost commented 5 years ago

@nschneid I think maintaining dynamic data like this would be a huge pain. The form (like what I'm doing for attachments) would help but even that is time-consuming. Maybe if we get the Anthology tied to ACL Portal accounts we could do this.

knmnyn commented 5 years ago

+1 for getting such user information from the Portal or softconf instead.

Cheers,

Min

-- Min-Yen KAN (Dr) :: Associate Professor :: National University of Singapore :: NUS School of Computing, AS6 05-12, 13 Computing Drive Singapore 117417 :: +65 6516 1885(DID) :: +65 6779 4580 (Fax) :: kanmy@comp.nus.edu.sg (E) :: www.comp.nus.edu.sg/~kanmy (W)

akoehn commented 5 years ago

@nschneid, @mjpost there is already an ID system we could support instead: ORCID. Let's manage such dynamic data in one place "industry"-wide instead of adding new identifiers. People can add links on their ORCID profile, and that way changes only need to be managed in one place.

nschneid commented 5 years ago

ORCID sounds good to me. But would that require curation? Can it be linked from the ORCID end?

I just checked my START profile and, oddly enough, it doesn't have a website field.

mjpost commented 5 years ago

I don't think website is relevant to START. But the Portal (under a redesign planned by @desilinguist) could one day incorporate dynamic user information (website, START ID, anthology author page, ORCID).

knmnyn commented 5 years ago

Hi Arne, all:

I talked with Marti and the past TACL and CL leadership about ORCIDs and they were generally on board with this, but I understood that MIT Press was undergoing a big overhaul that included a discussion about canonical author IDs. You may want to check with both CL / TACL heads (Hwee Tou Ng and Ani Nenkova) on this. I think they were waiting for the Anthology to make a move first, as they had even more work on the operations side than I did (back then, in 2017-18).

Cheers,

Min

aryamccarthy commented 5 years ago

The Volume page Spoken Language Translation here does not exist. Same with the Tutorials volume. I couldn't concoct any other volume names to test out, but both pages appear on Google.

sudacn commented 5 years ago

good!

Runze-huang commented 5 years ago

I can't search for any papers?

akoehn commented 5 years ago

runze writes:

I can't search for any papers?

It uses Google Custom Search. If Google doesn't work for you, the search won't either. If it is something different, a more detailed report would be helpful.

LuckyJLin commented 5 years ago

I can't read this PDF https://www.aclweb.org/anthology/W19-3604 There is nothing in the page.

akoehn commented 5 years ago

I can't read this PDF https://www.aclweb.org/anthology/W19-3604 There is nothing in the page.

There is a PDF, but the PDF is empty. Maybe an error in the ingestion process?

LuckyJLin commented 5 years ago

I can't read this PDF https://www.aclweb.org/anthology/W19-3604 There is nothing in the page. There is a PDF, but the PDF is empty. Maybe an error in the ingestion process?

I opened this PDF from the paper page https://www.aclweb.org/anthology/papers/W/W19/W19-3604/ but I still get an empty PDF.

mjpost commented 5 years ago

That is the PDF that the Widening NLP workshop gave us. There are many other empty (W19-3601 W19-3604 W19-3607 W19-3618 W19-3629 W19-3644 W19-3648) and improperly-formatted papers.

alphadl commented 5 years ago

Thanks. I am trying to find the WMT19 papers in the set, but I cannot find them.

mjpost commented 5 years ago

They have not been ingested yet. Please see statmt.org/wmt19 where they are available.

jihunchoi commented 5 years ago

It seems that the ACL 2019 tutorial abstracts have ACL 2017 publication information in their footer (see https://www.aclweb.org/anthology/P19-4#page=11 for example); is this intended?

mjpost commented 5 years ago

@jihunchoi Forgotten rsync, fixed, thank you!