acl-org / acl-anthology

Data and software for building the ACL Anthology.
https://aclanthology.org
Apache License 2.0

Adding CatalyzeX code finder integration to increase code coverage #3011

Open himanshuragtah1 opened 6 months ago

himanshuragtah1 commented 6 months ago

Added this issue here as instructed by @akoehn

I propose an integration with CatalyzeX that finds and links to code implementations for papers. This would extend ACL Anthology's current coverage of code.

We can open a pull request against your repo shortly for your review.

Here's what it would look like:

image

In case other sources have code, it can be shown in the dropdown as well.

mbollmann commented 6 months ago

Thanks for the suggestion! I'm still on vacation, so will check this out later. Can you add an example URL that this would link to? And maybe a brief explanation for someone unfamiliar with CatalyzeX what this provides that isn't covered by our existing Papers with Code integration yet?

himanshuragtah1 commented 5 months ago

Hope you had a wonderful vacation, and a great start to the new year! :)

Here is an example CatalyzeX URL

image

that corresponds to this ACL paper: image

Regarding Papers with Code: the functionality is similar (surfacing open-source implementations available for a paper), but CatalyzeX has a larger, fast-growing collection of code implementations (approaching a million) that can complement what's currently surfaced for papers on ACL Anthology.

We already do this through live integrations on arXiv and OpenReview.

We continually crawl GitHub, Bitbucket, GitLab, SourceForge, and various personal, academic, and professional webpages, and we receive a steady stream of code submissions via our website and browser extensions.

Hope this helps clarify, and please let us know if you have any questions. Looking forward to discussing next steps.

himanshuragtah1 commented 5 months ago

@mbollmann @akoehn — just following up here. Any next steps here or anything we can help with to move this forward? :)

mbollmann commented 4 months ago

@mjpost Do you have an opinion on this feature? I didn’t get around to taking a closer look at this yet, but @himanshuragtah1 says (via e-mail) that they can have a PR ready very quickly if we wanted to integrate this.

akoehn commented 3 months ago

There is one question I have: our Papers with Code (PWC) integration only shows a code link when we actually have code. I think this is good practice, and we also use this information in publication lists: image (see the [|||] symbol).

I am not sure how we should handle two data sources here.

Regarding the type of the integration: do you plan to use the same kind of integration (i.e., sending pull requests to add the links), or do you want to add a general JavaScript widget to the pages?

[chatgpt please insert sorry for late reply boilerplate]

mjpost commented 3 months ago

Hi @himanshuragtah1—thanks for submitting the request, and I'm sorry that I've only now been able to look at this.

First, a few questions:

I'm open to this, but it would largely depend on how easy you can make the integration, since we are volunteer-run. This includes:

himanshuragtah1 commented 1 month ago

Hi @mjpost — Sorry for the late reply. Thanks for taking the time to have a look at this code integration proposal :)

As suggested, we would actually prefer to have a JS widget that is capable of performing real-time requests to our own server for checking code availability, and then modifying the DOM accordingly from there. With this, we see a couple of advantages:

And of course, we will compactly handle both CX and PWC buttons, by introducing a dropdown like the one we shared in the screenshot above.
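
To illustrate the button-versus-dropdown behavior described above, here is a minimal sketch and not the actual CatalyzeX widget. The type and function names (`CodeLink`, `buildCodeMenu`, `chooseControl`) are hypothetical. It assumes each source contributes at most one URL per paper, and it keeps the existing practice of showing a code link only when code actually exists.

```typescript
// One entry in the code menu. Field names are illustrative assumptions.
interface CodeLink {
  source: "Papers with Code" | "CatalyzeX";
  url: string;
}

// How the links would be presented on a paper page.
type CodeControl =
  | { kind: "none" }
  | { kind: "button"; link: CodeLink }
  | { kind: "dropdown"; links: CodeLink[] };

// Collect only the links that exist; a source without code contributes nothing.
function buildCodeMenu(pwcUrl: string | null, cxUrl: string | null): CodeLink[] {
  const links: CodeLink[] = [];
  if (pwcUrl) links.push({ source: "Papers with Code", url: pwcUrl });
  if (cxUrl) links.push({ source: "CatalyzeX", url: cxUrl });
  return links;
}

// No links: render nothing. One link: a plain button. Two links: a dropdown.
function chooseControl(links: CodeLink[]): CodeControl {
  const [first, ...rest] = links;
  if (first === undefined) return { kind: "none" };
  if (rest.length === 0) return { kind: "button", link: first };
  return { kind: "dropdown", links };
}
```

Keeping this decision in a pure function means the rendering code only has to switch on `kind`, and a paper with no known code leaves the page unchanged.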


@akoehn, regarding handling two data sources in the publication list: to keep it simple, we'll just add another icon there. Where only one source has code, there will be just one code icon. Either way, the end user benefits from having some code to work with and build upon.

image

Regarding the type of the integration: if possible, we would like to make as few changes as possible to your XML files and codebase in general. In our integrations with other providers, such as arXiv, we use our own JavaScript widget that fetches code information on the client side. This simplifies the integration and lets us always show up-to-date results.
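
A rough sketch of what such a client-side lookup could look like follows. This thread does not describe the real CatalyzeX API, so everything here is an assumption: `LOOKUP_ENDPOINT` is a placeholder URL, and the `{ found, url }` response shape and the function names are hypothetical. The fetch function is passed in as a parameter so the logic can be exercised without a network.

```typescript
// Hypothetical response shape; the real API may differ.
interface CodeLookup {
  found: boolean;
  url?: string;
}

// Minimal fetch-like signature so the lookup can be stubbed.
type Fetcher = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// Placeholder endpoint, NOT a real CatalyzeX URL.
const LOOKUP_ENDPOINT = "https://code-lookup.example/api/code";

// Validate an untrusted JSON payload before using it.
function parseLookup(data: unknown): CodeLookup {
  if (typeof data !== "object" || data === null) return { found: false };
  const record = data as { found?: unknown; url?: unknown };
  if (record.found === true && typeof record.url === "string") {
    return { found: true, url: record.url };
  }
  return { found: false };
}

// Ask the server whether code exists for a paper. Any HTTP or network
// failure degrades to "no code", so the page simply renders unchanged.
async function lookupCode(paperId: string, fetcher: Fetcher): Promise<CodeLookup> {
  try {
    const res = await fetcher(`${LOOKUP_ENDPOINT}?paper=${encodeURIComponent(paperId)}`);
    if (!res.ok) return { found: false };
    return parseLookup(await res.json());
  } catch (_err) {
    return { found: false };
  }
}
```

In a browser, the widget would call `lookupCode(paperId, fetch)` once the page has loaded and add a code button only when `found` is true. Because the lookup happens at view time, no links need to be stored in the XML files.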

Let me know if all this sounds good, and we can open a PR shortly for your review. :)