Add the metadata schema on dataset page to allow Hypothesis to parse citation DOI

only1chunts commented 7 years ago

User Story

As a researcher
I want the Hypothesis annotation tool to recognise a GigaDB dataset page url as an alias of its DOI
So that I can correctly cite it irrespective of which website it appears on or I annotate from

Acceptance Criteria

Given I am not logged in to Gigadb web site
When I go to the primary dataset url "/dataset/100002"
And I view the page source
Then I can see a meta-tag for a citation DOI predicate with value "10.5524/100002"

Given I am not logged in to Gigadb web site
When I go to the url "/dataset/view/id/100002/File_sort/date_stamp"
And I view the page source
Then I can see a meta-tag for a citation DOI predicate with value "10.5524/100002"

Additional Info

This could be done in conjunction with #73, as it's a related concept.

Following an email discussion with Jon Udell at Hypothes.is, we should include an additional flag in the HTML code of dataset pages to inform machines about the connection between the webpage and the DOI.

Hi Chris, Here is the short answer: I think pages like http://gigadb.org/dataset/100132 should include statements like:

How we use that metadata to alias documents is a mechanism that has changed, and will likely change again. But informing Hypothesis about the DOI of each GigaDB page is a good way to futureproof your namespace. Regards, Jon

For example see the source code for this article: view-source:http://www.jneurosci.org/content/34/17/6112 among all that extra metadata included in it, is this line:

Product Backlog Item Ready Checklist

[ ] Business value is clearly articulated
[ ] Item is understood enough by the IT team so it can make an informed decision as to whether it can complete this item
[ ] Dependencies are identified and no external dependencies would block this item from being completed
[ ] At the time of the scheduled sprint, the IT team has the appropriate composition to complete this item
[ ] This item is estimated and small enough to comfortably be completed in one sprint
[ ] Acceptance criteria are clear and testable
[ ] Performance criteria, if any, are defined and testable
[ ] The Scrum team understands how to demonstrate this item at the sprint review

Product Backlog Item Done Checklist

[ ] Code is complete
[ ] Automated tests related to the changes are implemented and passing
[ ] All automated test suites are passing locally
[ ] Code is refactored to best practices and coding standards
[ ] Documentation is updated as needed
[ ] A Pull Request has been created and review requested
[ ] Pull Request is reviewed and approved
[ ] The item has been merged to the develop branch
[ ] All automated test suites are passing on continuous Integration pipeline and item is ready to release

only1chunts commented 4 years ago

I can't find the email from Jon Udell, but there is a Hypothesis help page describing the issue: https://web.hypothes.is/help/how-hypothesis-interacts-with-document-metadata/

only1chunts commented 4 years ago

I just found this comment from @jessesiu in my emails:

The example from Jon Udell, http://www.jneurosci.org/content/34/17/6112: if you view the source code in the <head>, it seems they used the Dublin Core schema and the OGP schema (which enables any web page to become a rich object in a social graph), and added some extra information, e.g. citations.

The Hypothesis suggestion page https://web.hypothes.is/help/how-hypothesis-interacts-with-document-metadata/ mentions the rel attribute, for how the page interacts with other pages, and the dc.identifier DOI. We can discuss which kind of information we need to add: do we use the DC schema or another one? Can we also add OGP, Twitter Cards and schema.org (https://developers.google.com/web/fundamentals/discovery/social-discovery) to allow social discovery?

Thanks, Jesse

pli888 commented 4 years ago

@ScottBGI says:

I think the social media discovery (OGP and Twitter) is a nice optional add-on if it's easy to integrate, but the priority is the structured schemas for discovery. Talking to Carole last week, it seems Bioschemas is progressing, and Chris Gorgolewski from OpenNeuro/Stanford has just joined Google to work on it and should push it forward very fast. So my preference would be to prioritise their schema for datasets and data repositories:

https://bioschemas.org/specifications/

You guys should decide what you think is the best way to go though.

ScottBGI commented 4 years ago

Google have upgraded their Schema.org structured data test tool to a broader rich results test:

https://search.google.com/test/rich-results

Looking at GigaDB pages, it doesn't recognise the front page, and for the individual entries it recognises that these are datasets but has picked up a few errors, e.g. for the machado entry:

https://search.google.com/test/rich-results?id=_CCQ16KZhg63sg6AeNOTQQ

It says 'Not all markup is eligible for rich results' and flags one error and two warnings:

Invalid object type for field 'license'
Invalid value type for field 'license' (optional)

And under license/distribution it says:

Missing field 'encodingFormat' (optional)

I don't know if this is an easy thing to fix/improve, but thought I'd flag it. They've got a nice guide on datasets here:

https://developers.google.com/search/docs/data-types/dataset

Cheers,

Scott
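For discussion, here is a minimal Python sketch of the kind of schema.org Dataset markup the Google dataset guide describes, with 'license' given as a plain URL and 'encodingFormat' set on the distribution. It does not reflect what GigaDB currently emits, and I have not confirmed it would clear the flagged items. The function name, the sample title, the example.org file URL and the CC0 licence URL are all illustrative assumptions.

```python
import json


def dataset_jsonld(doi, name, description, license_url, file_url, encoding_format):
    """Build a schema.org Dataset description as a plain dict.

    Hypothetical helper for illustration only; not GigaDB code.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": f"https://doi.org/{doi}",
        # 'license' as a URL string rather than an untyped object
        "license": license_url,
        "distribution": [
            {
                "@type": "DataDownload",
                # the optional field the test tool reported as missing
                "encodingFormat": encoding_format,
                "contentUrl": file_url,
            }
        ],
    }


# Placeholder values, assumed for the example
example = dataset_jsonld(
    doi="10.5524/100002",
    name="Example dataset title",
    description="Example dataset description.",
    license_url="https://creativecommons.org/publicdomain/zero/1.0/",
    file_url="https://example.org/files/readme.txt",
    encoding_format="text/plain",
)

# The JSON would go inside a <script type="application/ld+json"> element
print(json.dumps(example, indent=2))
```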

only1chunts commented 3 years ago

The change of ticket title is in response to the suggestion from @rija to tighten the scope of this ticket to just the addition of metadata tags that allow Hypothesis to correctly track comments within pages and sub-pages.

rija commented 3 years ago

User story

Here's a user story:

As a researcher
I want the Hypothesis annotation tool to recognise a GigaDB dataset page url as an alias of its DOI
So that I can correctly cite it irrespective of which website it appears on or I annotate from

Approach

@kencho51,

There are two ways to implement this: using Highwire Press Tags or using Dublin Core metadata. The former, Highwire Press Tags, is preferred for this particular task for two reasons:

The metadata tag's predicate is more precise and always represents a DOI (Dublin Core uses its generic identifier tag, whose value is not always a DOI)
Google Scholar indexing engine automatically recognises Highwire Press Tags

The Hypothesis help page linked in a previous comment details the syntax to use for Highwire Press Tags and has this example:

<meta name="citation_doi" content="10.1016/j.ajhg.2017.02.007">
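As a rough illustration of how that tag could be derived for a dataset page, here is a minimal Python sketch. It is not the GigaDB implementation (the real change would live in the site's own view code); the function name is hypothetical, and the "10.5524" prefix is taken from the acceptance criteria above.

```python
from html import escape

# DOI prefix as it appears in the acceptance criteria ("10.5524/100002")
DOI_PREFIX = "10.5524"


def citation_doi_meta(dataset_id: str, prefix: str = DOI_PREFIX) -> str:
    """Return a Highwire Press citation_doi meta tag for a dataset identifier.

    Hypothetical helper for illustration only.
    """
    doi = f"{prefix}/{dataset_id}"
    # Escape the value so it is safe inside a double-quoted HTML attribute
    return f'<meta name="citation_doi" content="{escape(doi, quote=True)}">'


print(citation_doi_meta("100002"))
# prints: <meta name="citation_doi" content="10.5524/100002">
```

The same tag would need to be emitted on every URL variant of the dataset page (for example the sorted file listing URL) so that they all resolve to one DOI.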

How to test, how do we know we're done:

features/dataset-metadata-citation-doi.feature:

Feature: Add the metadata schema on dataset page to allow Hypothesis to parse citation DOI
As a researcher
I want the Hypothesis annotation tool to recognise a GigaDB dataset page url as an alias of its DOI
So that I can correctly cite it irrespective of which website it appears on or I annotate from

Given I am not logged in to Gigadb web site
When I go to "/dataset/100002"
And I view the page source
Then I can see a meta-tag for a citation DOI predicate with value "10.5524/100002"

(please check you're happy with this acceptance test @only1chunts)

(it's provisional, it may vary slightly to cater for implementation constraints or formatting)
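To make the "Then I can see a meta-tag" step concrete, here is a small Python sketch of the check it implies, using only the standard library. It is an assumption about what the step should verify, not the actual step definition; the class and function names and the sample page source are made up for the example.

```python
from html.parser import HTMLParser


class CitationDoiParser(HTMLParser):
    """Collect the content of every <meta name="citation_doi"> tag."""

    def __init__(self):
        super().__init__()
        self.dois = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute values can be None when an attribute has no value
        if (attributes.get("name") or "").lower() == "citation_doi":
            self.dois.append(attributes.get("content"))


def find_citation_dois(page_source: str) -> list:
    """Return all citation_doi values found in the given HTML source."""
    parser = CitationDoiParser()
    parser.feed(page_source)
    return parser.dois


# Sample page source, assumed for the example
sample = (
    "<html><head><title>Dataset</title>"
    '<meta name="citation_doi" content="10.5524/100002">'
    "</head><body></body></html>"
)
assert find_citation_dois(sample) == ["10.5524/100002"]
```

A real acceptance test would fetch the page source for "/dataset/100002" first and then run the same kind of check on it.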

rija commented 2 years ago

Closing as it's working on staging.gigadb.org

only1chunts commented 1 year ago

This work does not appear to have been deployed to beta.gigadb.org. Please check.
