gbif / portal16

GBIF.org website
https://www.gbif.org
Apache License 2.0
Add Altmetric donut to pages with DOIs #245

Open rdmpage opened 7 years ago

rdmpage commented 7 years ago

https://www.altmetric.com provides a nice tool to display the online attention an object with a DOI is receiving, e.g. on Twitter, Facebook, blogs, Mendeley, Wikipedia, etc. There's a simple API to add a badge to a page (https://www.altmetric.com/products/altmetric-badges/). This would be a simple way to demonstrate to users that datasets on GBIF are getting attention, if only from GBIF's own Twitter stream ;) It would also help link GBIF data with the outreach activities of @kcopas and @dnoesgaard (i.e., it would automatically pick up tweets, blogs, etc. that cite the DOI).
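A minimal sketch of what adding the badge could look like, written as a small Python helper that renders the embed markup for a dataset DOI. The embed script URL and the `altmetric-embed` / `data-badge-type` / `data-doi` attributes are assumptions based on Altmetric's published badge snippet and should be checked against their current documentation; `altmetric_donut` is a hypothetical helper name, and the DOI is just the example used later in this thread.

```python
from html import escape

# Assumption: this is the embed script URL from Altmetric's badge docs.
EMBED_SCRIPT_URL = "https://d1bxh8uas1mnw7.cloudfront.net/assets/embed.js"


def altmetric_donut(doi, script_url=EMBED_SCRIPT_URL):
    """Return HTML for a donut badge placeholder plus the embed script tag."""
    safe_doi = escape(doi, quote=True)
    safe_src = escape(script_url, quote=True)
    return (
        f'<div class="altmetric-embed" data-badge-type="donut" '
        f'data-doi="{safe_doi}"></div>\n'
        f'<script type="text/javascript" src="{safe_src}"></script>'
    )


if __name__ == "__main__":
    print(altmetric_donut("10.3406/linly.1957.7927"))
```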

Eventually we could also do this for any publication listed on a GBIF page, but initially focussing on GBIF-hosted data seems the most sensible.

kbraak commented 7 years ago

Related to this issue, please also refer to the following Jira that I created last year: Provide publishers with list of papers citing data from their datasets. This Jira provides background information about which data usage statistics users value most and lists several other tools we could use for gathering them.

MortenHofft commented 7 years ago

Sounds reasonable to add the meta tags so they can track it. We can then also better evaluate if and how prominently we want to show it.

It isn't entirely clear to me how those meta tags are used. I have written and asked them. Would it be an issue to use the same identifiers on, for example, the UAT environment? Or on tabs? I would guess we want mentions of any of these sources to count in the same bucket. The DOI would not point to those URLs, however, so they might prefer an additional tag for those pages.

I wonder if there is a way to aggregate the result with download DOIs - they should ideally contribute to the dataset metric.

rdmpage commented 7 years ago

@MortenHofft Not sure what you mean by "meta tags"? Altmetric tracks DOIs (as well as PMIDs, Handles, Wikipedia and a few other things). So long as someone mentions a DOI, that will contribute to the Altmetric score. Adding badges is trivial, but there is also an API, so one could aggregate the scores they provide together with GBIF download stats as part of an overall measure of attention/impact.
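A rough sketch of that aggregation idea, assuming Altmetric's public details endpoint lives at `https://api.altmetric.com/v1/doi/<doi>`, returns JSON with a `score` field, and answers 404 for DOIs it has no data on. Those three points are assumptions to verify against their API documentation. `fetch_altmetric_score` and `attention_record` are hypothetical names, and the download count would come from GBIF's own statistics, which are not modelled here.

```python
import json
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: base URL of Altmetric's public DOI lookup.
ALTMETRIC_DOI_ENDPOINT = "https://api.altmetric.com/v1/doi/"


def altmetric_url(doi):
    """Build the lookup URL for a DOI, keeping the '/' in the DOI intact."""
    return ALTMETRIC_DOI_ENDPOINT + quote(doi, safe="/")


def fetch_altmetric_score(doi, timeout=10):
    """Return the Altmetric score for a DOI, or 0.0 if none is recorded."""
    try:
        with urlopen(altmetric_url(doi), timeout=timeout) as resp:
            return float(json.load(resp).get("score", 0.0))
    except HTTPError as err:
        if err.code == 404:  # assumed to mean "no attention data yet"
            return 0.0
        raise


def attention_record(doi, altmetric_score, download_count):
    """Put both signals side by side; how to weight them is left open."""
    return {
        "doi": doi,
        "altmetric_score": altmetric_score,
        "downloads": download_count,
    }
```

The record deliberately keeps the two numbers separate rather than inventing a combined formula, since the thread does not say how they should be weighted.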

dnoesgaard commented 7 years ago

I had a chat with Altmetric about this a while back. Maybe this can be helpful:

https://twitter.com/dnnyboy/status/782859445835538432

rdmpage commented 7 years ago

@dnoesgaard Ah OK, so I guess they look for a tag like

<meta name="DC.identifier" content="10.3406/linly.1957.7927" scheme="DOI">

which is what you see on most scientific article webpages and/or tags based on Google Scholar's vocabulary, e.g.:

<meta name="citation_doi" content="10.3406/linly.1957.7927">

This would seem a simple fix...
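As a sketch of how small that fix could be, the helper below renders both tag styles quoted above for an arbitrary DOI. `doi_meta_tags` is a hypothetical name, and whether Altmetric needs both tags or only one is not settled in this thread.

```python
from html import escape


def doi_meta_tags(doi):
    """Render the Dublin Core and Google Scholar style DOI meta tags."""
    safe = escape(doi, quote=True)  # guard against quotes or '<' in the value
    return "\n".join([
        f'<meta name="DC.identifier" content="{safe}" scheme="DOI">',
        f'<meta name="citation_doi" content="{safe}">',
    ])


if __name__ == "__main__":
    print(doi_meta_tags("10.3406/linly.1957.7927"))
```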

MortenHofft commented 7 years ago

@rdmpage They might be able to track in some instances without them, but they recommend applying meta tags.

But it isn't clear how they then handle duplicated pages (say UAT). Normally you should designate a canonical URL when you have duplicate content, but it isn't clear if they need that. Similarly, it isn't clear to me what to do when an item is described by multiple URLs (say tabs).

Example: Twitter feeds about dataset/123 and dataset/123/more-info should both count towards the same metrics if mentioned somewhere. (I'm assuming they attempt to identify items by things other than DOIs, since they ask for tags with title, author, etc.)

But the DOI resolves to dataset/123. Should we then add the same meta tags to dataset/123/more-info, or is that considered rank manipulation? Will it count, or be discarded since the DOI and the URL are different? Count for less? Should it be marked with some tag, unknown to me, that describes the relationship?

It is trivial to add all of this, but I want to know how to get the best tracking results.
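One possible shape for this, sketched under the assumption that every sub-page of a dataset carries the same DOI meta tag plus a `rel="canonical"` link back to the page the DOI resolves to. Whether Altmetric pays any attention to the canonical link is exactly the open question here, so this only illustrates the markup. `subpage_head_tags` is a hypothetical helper, and the paths are the examples from this comment.

```python
from html import escape

SITE = "https://www.gbif.org"


def subpage_head_tags(canonical_path, doi):
    """Head tags shared by a dataset page and all of its sub-pages."""
    canonical = f"{SITE}/{canonical_path.strip('/')}"
    return "\n".join([
        f'<link rel="canonical" href="{escape(canonical, quote=True)}">',
        f'<meta name="citation_doi" content="{escape(doi, quote=True)}">',
    ])


if __name__ == "__main__":
    # dataset/123 and dataset/123/more-info would both emit these same tags.
    print(subpage_head_tags("dataset/123", "10.3406/linly.1957.7927"))
```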

The API aggregation is interesting!

rdmpage commented 7 years ago

@MortenHofft I think two different things are being conflated here. My use case is Altmetric tracking mentions of GBIF DOIs. Altmetric tracks mentions of DOIs, so if I tweet about a GBIF dataset and use the DOI, then that counts as one mention. Based on @dnoesgaard's discussion, it appears that Altmetric will resolve the DOI and check that there's a meta tag with a DOI on that page; if so, the tweet gets counted. You could have the DOI in the metadata of multiple pages (e.g., hash ids or "/xxx") that relate to the main dataset page on GBIF (i.e., treat the DOI as the canonical identifier), but that's fine, as Altmetric isn't counting how many times GBIF has that DOI on its pages, simply that the DOI points to a valid resource. This is also a great reason why we should encourage people to cite DOIs on Twitter, on blogs, etc.: they avoid all the hassle of multiple URLs for the same thing.
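To make that check concrete, here is a sketch of the second half of it: given the HTML of the page a DOI resolves to, confirm that a meta tag on the page declares the same DOI. This is a guess at the behaviour described, not Altmetric's actual implementation. The set of accepted tag names and the case-insensitive comparison are assumptions, and `page_declares_doi` is a hypothetical name.

```python
from html.parser import HTMLParser

# Assumption: these are the meta tag names a tracker would look for.
DOI_META_NAMES = {"dc.identifier", "citation_doi"}


class DoiMetaParser(HTMLParser):
    """Collect DOI values declared in <meta> tags."""

    def __init__(self):
        super().__init__()
        self.dois = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").strip()
        if name in DOI_META_NAMES and content:
            self.dois.add(content.lower())


def page_declares_doi(html_text, doi):
    """True if the page has a DOI meta tag matching the given DOI."""
    parser = DoiMetaParser()
    parser.feed(html_text)
    return doi.strip().lower() in parser.dois


if __name__ == "__main__":
    page = '<head><meta name="citation_doi" content="10.3406/linly.1957.7927"></head>'
    print(page_declares_doi(page, "10.3406/linly.1957.7927"))
```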

MortenHofft commented 7 years ago

I won't rule out that I'm completely misunderstanding how this works; both of you have spent way more time on this than I have.

As I have understood things it works like this:

They track using:

via a direct link or unique identifier such as a DOI

Our scraper evaluates the text in the news story or policy document and determines if it has the appropriate data to make a positive match to the research output. The data we are looking for is author names, journal title and a timeframe.

For this to work, we need to have meta tags with title etc. in addition to the DOI meta tag.


Sounds wonderful. But what happens when the same item exists in multiple places (preprints, my own site, sub-URLs)?

Ideally, mentions of preprint/paper/123, paper/123/main and paper/123/more-info all count towards the same DOI, and they might if the pages have the same DOI in the meta tag. But if different URLs can count towards the same DOI, that opens the door for fraud. I could add the meta tags to my popular blog about aliens building pyramids, and the mentions would count towards my scholarly work on that DOI.

Alternatively, only one URL (the one the DOI resolves to) is counted, but then having multiple URLs for a resource is a bad idea that we should avoid, as there is a risk of diluting counts. Or perhaps they have a way to handle all of that. I just haven't seen any guidelines addressing it.

Have I completely misunderstood how it works?

Either way, they have answered that multiple URLs are fine by them. They didn't address fraud, or whether the other URLs would count or simply be discarded. Since I'm the only one who sees potential issues here, I will simply add it to all dataset subpages.

Thanks

rdmpage commented 7 years ago

@MortenHofft It's not clear to me what you mean by "fraud". If tweets, etc. mention the DOI, then that DOI will resolve to the definitive version of that resource (paper, dataset, etc.). That's pretty much the point of DOIs: they enable you to be assured that you are looking at the genuine content (yet another reason why GBIF publicity by @kcopas and @dnoesgaard about datasets should always use the DOI, not a GBIF link). My understanding is that Altmetric counts DOIs and other identifiers that they can be confident point to genuine content (e.g., Handles, PubMed ids). There's likely little incentive to create "fake" articles or data, because if people use the DOI they will automatically avoid those sites.

The only incentive I can see to game the system is to increase attention counts, e.g. tweets or blog posts linking to the content. In a sense we already do this: many publishers have Twitter streams that advertise the latest articles, and these will count towards the score. But again, what's the incentive to game the system? And if people did, I suspect that, like Google, Altmetric would develop filters to weed out the spammers; otherwise their reputation suffers and their business model (selling attention metric data to publishers and universities) is dead.

MortenHofft commented 7 years ago

@rdmpage It seems I have reached the limits of what I'm capable of explaining in English. No matter, it is a detail anyhow that we can deal with should problems arise.

dnoesgaard commented 7 years ago

I feel like we covered a lot in our talk, @MortenHofft, (sorry @rdmpage) and even though I agree with some of your concerns on a theoretical level, I also think we can be pragmatic about this. So far less than 20 GBIF-minted DOIs have Altmetric scores >0. But I'm still intrigued as to how implementing the metadata will affect this.

MortenHofft commented 7 years ago

I have added meta tags (https://github.com/gbif/portal16/commit/4bd6f85923474162368c83a1d39e8b95d8dbf169) to the main page only. These should be aligned with the citation once it becomes part of the new dataset API response. I followed the format for publisher, title and DOI described on their site, which also matched the example they sent me by email.

MortenHofft commented 6 years ago

@rod and @dnoesgaard you used to argue strongly that we should integrate tightly with Altmetric. Is this still the case? If so, then I will see what I can easily do (probably by asking you, Daniel). If not, I will close the issues related to it.

dnoesgaard commented 6 years ago

The total number of datasets with Altmetric scores is still very low (53) - but I guess it has grown since last time I checked.

I'll continue to monitor how this develops, but for now I'm ok with parking this.

dnoesgaard commented 6 years ago

Fwiw, Altmetric does pick up on Twitter mentions now, e.g. https://www.altmetric.com/details/18924739/twitter