cernopendata / opendata.cern.ch

Source code for the CERN Open Data portal
http://opendata.cern.ch/
GNU General Public License v2.0

Automatic citation inspire link for open data records #3659

Closed zlmarshall closed 1 month ago

zlmarshall commented 4 months ago

Hi there!

I wonder if it might be possible to add to all records an automatically generated link to the inspire search for papers referencing them? I remember seeing this in one of the CMS Open Data talks:

https://inspirehep.net/literature?sort=mostrecent&size=25&page=1&q=references.reference.dois%3A10.7483%2FOPENDATA.CMS%2A

And thinking it's a fantastic thing. My notion is that for example on the DOI line of a record could be a link to "Citations"; e.g. on

https://opendata.cern.ch/record/15012

a link to:

https://inspirehep.net/literature?sort=mostrecent&size=25&page=1&q=references.reference.dois%3A10.7483%2FOPENDATA.ATLAS.UXKX.TXBN

Thanks, Zach
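For illustration, the static link proposed above can be generated from a record's DOI. This is only a sketch: the function name `inspire_citation_link` is hypothetical, and the query parameters are copied from the example URLs in this comment rather than from the portal code.

```python
from urllib.parse import urlencode

# Base of the INSPIRE literature search page, as used in the links above.
INSPIRE_SEARCH = "https://inspirehep.net/literature"


def inspire_citation_link(doi: str) -> str:
    """Return an INSPIRE search URL for papers whose references contain `doi`."""
    params = {
        "sort": "mostrecent",
        "size": 25,
        "page": 1,
        # Same search field as in the example links in this thread.
        "q": f"references.reference.dois:{doi}",
    }
    # urlencode percent-encodes ':' and '/' (and '*' for wildcard searches).
    return f"{INSPIRE_SEARCH}?{urlencode(params)}"


link = inspire_citation_link("10.7483/OPENDATA.ATLAS.UXKX.TXBN")
print(link)
```

Passing a wildcard such as `10.7483/OPENDATA.CMS*` gives the experiment-wide search from the first link.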

tiborsimko commented 3 months ago

Thanks for the great suggestion!

One concern that comes to mind is that many datasets may not get cited directly, so offering a "static" link may lead users to many "zero citation" pages, which would be a frustrating experience. For example, we have 57k simulated datasets overall, and I think most of them will have zero direct citations, since they may be of niche interest only.

It may therefore be advantageous to offer the citation link only when we know for sure that the data was cited by a paper. We could do this if we periodically synchronise the citation counts from INSPIRE, for example by a background daily sync process, and store this information also on the Open Data portal side. We would then be able to offer "dynamic" links targeted to each record, such as "Cited by 4 papers", "Cited by 17 papers", etc. Additionally, knowing the citation counts could also be useful in order to offer some additional search interface capabilities, such as "sort by most cited". (Or to display some aggregated citation information by experiment, for example in the "About ATLAS" pages.)

This dynamic way of linking would probably be better for the users than offering static citation links to INSPIRE, but it would come at the price of some technical development, syncing complexity and maintenance. We'll try to think it over in the team and come back, perhaps with a few mock-ups?
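A rough sketch of what such a sync step could look like, under stated assumptions: `sync_citation_counts` and `records_with_citations` are hypothetical names, the actual INSPIRE lookup is injected as a callable (here replaced by a fake in-memory table), and the two `OPENDATA.EXAMPLE` DOIs are made up for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def sync_citation_counts(
    dois: Iterable[str], fetch_count: Callable[[str], int]
) -> Dict[str, int]:
    """Collect one citation count per DOI using the supplied lookup function."""
    counts: Dict[str, int] = {}
    for doi in dois:
        try:
            counts[doi] = fetch_count(doi)
        except Exception:
            # A failed lookup leaves the DOI out, so a later run can retry it.
            continue
    return counts


def records_with_citations(counts: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return only the cited records, most cited first ("sort by most cited")."""
    cited = [(doi, n) for doi, n in counts.items() if n > 0]
    return sorted(cited, key=lambda item: item[1], reverse=True)


# Fake lookup standing in for a real INSPIRE query (assumption for this sketch).
fake_inspire = {
    "10.7483/OPENDATA.ATLAS.UXKX.TXBN": 4,
    "10.7483/OPENDATA.EXAMPLE.AAAA": 17,  # made-up DOI
    "10.7483/OPENDATA.EXAMPLE.BBBB": 0,   # made-up DOI, never cited
}

counts = sync_citation_counts(fake_inspire, fake_inspire.__getitem__)
for doi, n in records_with_citations(counts):
    print(f"{doi}: Cited by {n} papers")
```

Records with a zero count are filtered out, so they would get no link at all.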

psaiz commented 2 months ago

Hi Zach, Thanks for this suggestion. I'm linking a pull request that will add the link only for those datasets that have something pointing to them.

Let me know what you think about this approach.

Best, pablo

zlmarshall commented 2 months ago

Hi @psaiz ,

Thanks! You'll notice one addition in my version above: the search is on both the DOI and the page URL. We've found that some folks directly cite the webpage instead of using the DOI (even with the DOI clearly visible on the page). Would that be easy for you to add?

An irresistible part of me wants to point out that it should be "referring to these data" :)

But generally: this is certainly better than nothing, so please feel free to go ahead even without those ideas implemented. I like forward progress!

Cheers, Zach
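A combined search on DOI and page URL might be built along these lines. This is a sketch only: the DOI field is the one used in the links earlier in this thread, but the URL field name `references.reference.urls.value` and the `or` syntax are assumptions about the INSPIRE search, and `inspire_doi_or_url_link` is a hypothetical helper.

```python
from urllib.parse import urlencode

INSPIRE_SEARCH = "https://inspirehep.net/literature"


def inspire_doi_or_url_link(doi: str, record_url: str) -> str:
    """Build a search link matching references by DOI or by record page URL."""
    query = (
        f'references.reference.dois:"{doi}"'
        # Assumed field name for URLs appearing in reference lists.
        f' or references.reference.urls.value:"{record_url}"'
    )
    return f"{INSPIRE_SEARCH}?{urlencode({'sort': 'mostrecent', 'q': query})}"


link = inspire_doi_or_url_link(
    "10.7483/OPENDATA.ATLAS.UXKX.TXBN",
    "https://opendata.cern.ch/record/15012",
)
print(link)
```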

psaiz commented 2 months ago

Hi again, Thanks for the quick reply @zlmarshall. I've changed both things ('these data' and the search based also on the webpage).

I could put this on the opendata-qa to see how it looks there.

Cheers, pablo

zlmarshall commented 2 months ago

That sounds great to me. Thanks again!

tiborsimko commented 2 months ago

Hi Pablo,

Seeing the issue closed, some comments on your implementation:

Additionally, knowing the citation counts could also be useful in order to offer some additional search interface capabilities, such as "sort by most cited". (Or to display some aggregated citation information by experiment, for example in the "About ATLAS" pages.)

psaiz commented 2 months ago

Hi Tibor and Zach,

Sorry that the ticket got closed automatically when the pull request got merged (even if it was not deployed on any of our instances). I'll reopen the ticket until it gets deployed in production.

Tibor, I would like to focus here on how this is presented to the user. The implementation (either storing on our side, or querying live) can be discussed in a different place.

After double-checking with our inspire colleagues, I've deployed a prototype on opendata-qa. We can see this on entries like https://opendata-qa.cern.ch/record/80030 or https://opendata-qa.cern.ch/record/15012

[Screenshot of the prototype on opendata-qa, 2024-09-02]

The first one already showed a small issue (in case there is a single reference, the message should be slightly different). I'll fix that.

Please let me know if there are any other comments.
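The single-reference case mentioned above is a simple pluralisation check. A minimal sketch, assuming a hypothetical `citation_message` helper and made-up wording (the actual text shown on the portal is not quoted in this thread):

```python
from typing import Optional


def citation_message(count: int) -> Optional[str]:
    """Return the link text for a record, or None when nothing cites it."""
    if count <= 0:
        # No link is shown for records without references pointing to them.
        return None
    if count == 1:
        return "1 publication referring to these data"
    return f"{count} publications referring to these data"


for n in (0, 1, 6):
    print(n, citation_message(n))
```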

zlmarshall commented 2 months ago

Very nice! I think this is what I was after. One request: could you remove the "size=1" from the parameters in the auto-generated link? For the 6-publication entry, you'll see that it then generates an inspire search with one record per page, which is a bit awkward. Otherwise this looks great, thanks!
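For illustration, dropping one parameter from a generated link can be done with the standard library. This is a sketch with a hypothetical helper name, `drop_query_param`, and the input URL below is a made-up example of such a link, not the one the portal actually produced.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_query_param(url: str, name: str) -> str:
    """Return `url` with every occurrence of query parameter `name` removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != name
    ]
    # Re-encode the remaining parameters and rebuild the URL.
    return urlunsplit(parts._replace(query=urlencode(kept)))


generated = (
    "https://inspirehep.net/literature"
    "?q=references.reference.dois%3A10.7483%2FOPENDATA.ATLAS.UXKX.TXBN&size=1"
)
cleaned = drop_query_param(generated, "size")
print(cleaned)
```

Without `size`, INSPIRE falls back to its own default page size for the result list.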

psaiz commented 1 month ago

Thanks for the comments, @zlmarshall. This has been deployed in production, removing the size=1 from the link.

I'm going to be optimistic and close this ticket. Please, do reopen it (or create a new one) if there are any issues.