ExPaNDS-eu / ExPaNDS

The main repository for the ExPaNDS project sponsored by the EU

Linking datasets to publication #45

Closed: paulmillar closed this issue 1 year ago

paulmillar commented 2 years ago

We have identified a use-case in which a user has used the pan-search-api to identify candidate datasets. Having identified these datasets, the user would then like to see the list of publications that reference (or are otherwise based on) each of those datasets.
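A minimal sketch of the lookup this use-case needs, assuming the dataset-to-publication links have already been harvested from somewhere. The link list, the identifiers and the function `publications_by_dataset` are all hypothetical; nothing here is part of the pan-search-api. The idea is simply to invert (publication, dataset) pairs into an index keyed by dataset PID, then look up each candidate dataset.

```python
from collections import defaultdict

# Hypothetical citation links: (publication PID, dataset PID) pairs, e.g. as
# harvested from publication reference lists. All identifiers are made up.
CITATION_LINKS = [
    ("doi:10.1234/paper.1", "doi:10.1234/data.A"),
    ("doi:10.1234/paper.2", "doi:10.1234/data.A"),
    ("doi:10.1234/paper.2", "doi:10.1234/data.B"),
]


def publications_by_dataset(links, datasets):
    """Map each requested dataset PID to the sorted PIDs of publications citing it."""
    index = defaultdict(set)  # dataset PID -> set of citing publication PIDs
    for publication, dataset in links:
        index[dataset].add(publication)
    # Datasets with no known citations map to an empty list.
    return {dataset: sorted(index.get(dataset, ())) for dataset in datasets}


# Candidate datasets, standing in for the result of a search (hypothetical).
candidates = ["doi:10.1234/data.A", "doi:10.1234/data.C"]
print(publications_by_dataset(CITATION_LINKS, candidates))
```

The hard part, as the discussion below points out, is not this lookup but obtaining trustworthy links in the first place.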

We believe this use-case is not part of the ExPaNDS DoW and is not supported by the ExPaNDS architecture.

However, we believe this use-case has merit, and we would like to find out whether others (the PaNOSC people, in particular) would be interested in some kind of future collaboration to address it.

RKrahl commented 2 years ago

Note that it is far from trivial to find the publications that reference or use a dataset. If a publication properly cites the dataset in its references, there is a chance to track these citations. Unfortunately, most of the time this is not the case: often the data is only mentioned somewhere in the text or in the data availability statement, without a proper citation.
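As a rough illustration of why in-text mentions are harder to track, here is a sketch that scans free text (such as a data availability statement) for DOIs of known datasets. It assumes the dataset is at least mentioned by its DOI; a mention without any identifier cannot be found this way. The regular expression is a simplified assumption about DOI syntax, and the function name and example text are hypothetical.

```python
import re

# Simplified DOI pattern (an assumption): "10.", a 4-9 digit registrant code,
# "/", then a suffix. Real-world DOIs can be messier than this.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_dataset_dois(text, known_dataset_dois):
    """Return the known dataset DOIs that are mentioned anywhere in free text."""
    known = {doi.lower() for doi in known_dataset_dois}  # DOIs are case-insensitive
    found = set()
    for match in DOI_PATTERN.finditer(text):
        # Strip trailing sentence punctuation that the pattern may have swallowed.
        doi = match.group(0).rstrip(".,;:)").lower()
        if doi in known:
            found.add(doi)
    return found


# Hypothetical data availability statement and dataset DOIs.
statement = (
    "Data availability: the raw data are available at "
    "https://doi.org/10.1234/HZB.DATA.42 (see also 10.1234/hzb.data.7)."
)
print(find_dataset_dois(statement, {"10.1234/hzb.data.42", "10.1234/hzb.data.99"}))
```

Stripping trailing punctuation is a trade-off: it fixes the common case of a DOI at the end of a sentence or inside parentheses, but would truncate the rare DOI that genuinely ends in one of those characters.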

Another issue is that raw data is often not used directly in a publication. In most cases, another dataset is derived from the raw data, and the publication is based on that derived dataset.
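One way to picture the derived-data problem: if provenance records (which derived dataset came from which source) and citation records were both available, the publications "based on" a raw dataset would be those reachable by walking downstream through the derivation chain. This is a minimal sketch under that assumption; the record formats, PIDs and the function `publications_based_on` are hypothetical.

```python
from collections import deque


def publications_based_on(raw_dataset, derived_from, cites):
    """Publications citing `raw_dataset` directly or via any chain of derived datasets.

    derived_from: derived dataset PID -> list of source dataset PIDs
    cites:        publication PID -> list of dataset PIDs it cites
    """
    # Invert both mappings so the graph can be walked "downstream" from the raw data.
    derivatives = {}
    for derived, sources in derived_from.items():
        for source in sources:
            derivatives.setdefault(source, []).append(derived)
    cited_by = {}
    for paper, datasets in cites.items():
        for dataset in datasets:
            cited_by.setdefault(dataset, []).append(paper)

    # Breadth-first search over the derivation graph, starting at the raw dataset.
    seen = {raw_dataset}
    queue = deque([raw_dataset])
    papers = set()
    while queue:
        dataset = queue.popleft()
        papers.update(cited_by.get(dataset, []))
        for child in derivatives.get(dataset, []):
            if child not in seen:  # guards against cycles in messy provenance data
                seen.add(child)
                queue.append(child)
    return sorted(papers)


# Hypothetical provenance and citation records.
DERIVED_FROM = {"derived-1": ["raw-1"], "derived-2": ["derived-1"]}
CITES = {"paper-X": ["derived-2"], "paper-Y": ["raw-1"], "paper-Z": ["raw-2"]}
print(publications_based_on("raw-1", DERIVED_FROM, CITES))  # ['paper-X', 'paper-Y']
```

Here `paper-X` only cites `derived-2`, yet it is found for `raw-1` because `derived-2` was derived from `derived-1`, which was derived from `raw-1`. In practice the provenance records are often the missing piece.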

We have a project here at HZB that is trying to tackle this issue. I'll try to point the colleague involved to this issue.

paulmillar commented 2 years ago

You're right, @RKrahl: this is non-trivial for several reasons.

Brian mentioned some existing work (Puma, if I remember the name correctly) that uses natural language processing on the text of the proposal description to identify papers that are likely based on the dataset.
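How that existing tool works is not described here, so the following is only a rough illustration of the general idea, using a much simpler technique swapped in for real NLP: rank candidate papers by TF-IDF cosine similarity between their abstracts and the proposal description. The texts and the functions `tokenize`, `tfidf_vectors`, `cosine` and `rank_papers` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphanumeric tokens; a crude stand-in for real NLP preprocessing."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (term -> weight dict) per document."""
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF, so terms present in every document keep a small weight.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_papers(proposal_text, abstracts):
    """Indices of `abstracts`, most similar to the proposal text first."""
    vectors = tfidf_vectors([proposal_text] + abstracts)
    query, candidates = vectors[0], vectors[1:]
    scores = [cosine(query, vec) for vec in candidates]
    return sorted(range(len(abstracts)), key=lambda i: scores[i], reverse=True)


# Hypothetical proposal description and candidate paper abstracts.
proposal = "Neutron diffraction study of magnetic order in a frustrated pyrochlore oxide"
abstracts = [
    "A new catalyst for water splitting based on nickel nanoparticles.",
    "We report the magnetic order of a frustrated pyrochlore oxide from neutron diffraction.",
]
print(rank_papers(proposal, abstracts))  # [1, 0]
```

A ranking like this only produces candidates ("likely based on"); a human or a stronger model would still need to confirm that a paper really used the data.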

mkubin commented 2 years ago

Hi @paulmillar, as @RKrahl pointed out, your use-case overlaps in interesting ways with a project we are currently working on in HMC Hub Matter, where we identify datasets published by Helmholtz researchers. I think it could be fruitful to meet and exchange some ideas on the topic. Please feel free to contact us. All best!

twdragon commented 2 years ago

It would be useful here to create an automated, multilayered graph database connecting the PIDs of datasets and publications. @RKrahl @zjttoefs @agbeltran, do we have existing tools to build it?
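To make the "multilayered" idea concrete, here is a small in-memory sketch in which each relation type forms its own edge layer between PIDs, and queries can be restricted to chosen layers and followed in either direction. It is a toy stand-in for a real graph database, not a proposal for a specific tool; the class `LayeredPidGraph`, its methods and the PIDs are hypothetical, and the relation labels are merely modelled on DataCite-style relation types.

```python
from collections import defaultdict


class LayeredPidGraph:
    """Toy multilayer graph: one directed edge layer per relation type."""

    def __init__(self):
        # layer -> source PID -> set of target PIDs (outgoing links)
        self._out = defaultdict(lambda: defaultdict(set))
        # layer -> target PID -> set of source PIDs (reverse index for incoming links)
        self._in = defaultdict(lambda: defaultdict(set))

    def add_link(self, source, relation, target):
        """Record a directed link such as (publication, "Cites", dataset)."""
        self._out[relation][source].add(target)
        self._in[relation][target].add(source)

    def linked(self, pid, layers, incoming=False):
        """PIDs linked to `pid` in the given layers, following edges forwards or backwards."""
        index = self._in if incoming else self._out
        result = set()
        for layer in layers:
            result |= index[layer].get(pid, set())
        return sorted(result)


graph = LayeredPidGraph()
graph.add_link("doi:10.1234/paper.1", "Cites", "doi:10.1234/data.derived")
graph.add_link("doi:10.1234/data.derived", "IsDerivedFrom", "doi:10.1234/data.raw")

# Which publications cite the derived dataset? (follow "Cites" edges backwards)
print(graph.linked("doi:10.1234/data.derived", ["Cites"], incoming=True))
# Which dataset was it derived from? (follow "IsDerivedFrom" edges forwards)
print(graph.linked("doi:10.1234/data.derived", ["IsDerivedFrom"]))
```

Keeping layers separate lets a query ask, for example, only about formal citations, or about citations plus derivations, without mixing link types of different reliability.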

servansod commented 1 year ago

Noting the idea and the contacts for a possible OSCARS open call project or HMC open call. Closing this issue from the ExPaNDS perspective, though.