A data consumer looking for datasets wants to limit their search to reputable scientists whose work they trust.
A data consumer with a question about a dataset wants to know who to contact.
A data consumer looking for a specific dataset wants to be able to find it by its 'colloquial name', following the pattern (first author, year), e.g. 'Azizi 2018'.
1 requires a full author list. In practice, scientists usually limit their search to first/co-first and senior/co-senior authors.
2 is either the corresponding author(s) or the "contributor", who could be a curator. Creating clear UX around who should be contacted while respecting data ownership could be challenging.
There is a contact name in the collection for this purpose.
3 requires that first authors, last authors, and publication year be searchable.
Note, for sites that allow readers to download a citation such as Stress-induced RNA–chromatin interactions promote endothelial dysfunction, the RIS format defines an ID tag documented as the Reference ID for the publication which is the colloquial name described above; however, there is consensus to use a summary citation format instead:
Last name of first author (Publication Year) Journal abbreviation, such as Ren et al. (2021) Cell.
For framework developers querying the cellxgene Portal API for a Collection DOI to pass as a parameter to services which return publication metadata including authors and publication date, there are currently some minor issues with the DOI values that require a bit more parsing:
'https://www.doi.org/10.1126/science.aba7721' # www.doi.org instead of doi.org
' https://doi.org/10.1101/2020.03.31.016972' # leading space not stripped
It's also not helpful that the portal requires and stores the full URL because both scheme and domain (https://doi.org) must be stripped before the DOI can be passed to such services:
Feb 22 2022: There was agreement to continue with the current modeling of the DOI as a URL for consistency with the other links. We can revisit whether we want to return a DOI curie (in a separate section of the response) in a future API update. The new code also guarantees that the scheme and domain are https://doi.org.
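The extra parsing described above can be sketched as follows. This is a minimal illustration, not the portal's code: the helper name `doi_from_portal_link` is hypothetical, and it assumes the only variations to tolerate are surrounding whitespace and the `www.doi.org` host shown in the examples.

```python
import re

# Matches a DOI link as stored by the portal; "www." is optional (assumption
# based on the two example values above).
_DOI_URL = re.compile(r"^https?://(?:www\.)?doi\.org/(?P<doi>.+)$", re.IGNORECASE)


def doi_from_portal_link(link_url: str) -> str:
    """Return the bare DOI from a DOI URL (hypothetical helper)."""
    match = _DOI_URL.match(link_url.strip())  # strip() removes the leading space
    if match is None:
        raise ValueError(f"not a DOI URL: {link_url!r}")
    return match.group("doi")


print(doi_from_portal_link("https://www.doi.org/10.1126/science.aba7721"))
print(doi_from_portal_link(" https://doi.org/10.1101/2020.03.31.016972"))
```

A regular expression is used rather than a URL parser so that characters such as `;` or `#`, which can appear in older DOIs, stay part of the returned value.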
Changes to the Create Collection UX
The Create Collection UX must be updated to:
Replace the DOI link with Publication DOI to clarify our intentions. See the related thread on single-cell-data-wrangling.
Only allow one Publication DOI link to be added to the collection
curie := [ [ prefix ] ':' ] reference
The UX prompts with a read-only 'doi:' prefix and separator. The curator adds the reference. For example, '10.1016/j.cell.2021.01.053'.
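One way the curator-supplied reference could be turned into the stored link is sketched below. The function name `publication_doi_url` and the sanity-check pattern are assumptions for illustration only; the pattern is a loose check that the reference looks like a DOI, not a full validation.

```python
import re

# Assumed sanity check: "10.", a registrant code, "/", then a non-empty suffix.
_REFERENCE = re.compile(r"^10\.\d{4,9}/\S+$")


def publication_doi_url(reference: str, prefix: str = "doi") -> str:
    """Build the https://doi.org link for a curie reference (hypothetical helper)."""
    reference = reference.strip()
    # Tolerate a curator pasting the whole curie, e.g. "doi:10.1016/...".
    if reference.lower().startswith(prefix + ":"):
        reference = reference[len(prefix) + 1:]
    if not _REFERENCE.match(reference):
        raise ValueError(f"not a DOI reference: {reference!r}")
    return "https://doi.org/" + reference


print(publication_doi_url("10.1016/j.cell.2021.01.053"))
```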
When a curator adds a Publication DOI link to a collection, publication metadata is acquired by issuing a Crossref query for the DOI and then parsing the successful JSON response (or see XML response):
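A minimal sketch of the Crossref lookup, using only the Python standard library and the public `https://api.crossref.org/works/{doi}` endpoint. The function names and the choice to return `None` for an unknown DOI are assumptions made for illustration, not the portal's implementation.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works/"


def crossref_url(doi: str) -> str:
    """URL of the Crossref works record for a bare DOI."""
    # Percent-encode unusual characters but keep "/" as the path separator.
    return CROSSREF_WORKS + urllib.parse.quote(doi, safe="/")


def fetch_publication_metadata(doi: str, timeout: float = 10.0):
    """Return the 'message' dict of the Crossref response, or None on a 404."""
    try:
        with urllib.request.urlopen(crossref_url(doi), timeout=timeout) as response:
            return json.load(response)["message"]
    except urllib.error.HTTPError as error:
        if error.code == 404:  # Crossref does not know this DOI (yet)
            return None
        raise
```

Calling `fetch_publication_metadata("10.1016/j.cell.2021.01.053")` would yield the `message` object that the snippets in the metadata sections read from. How the portal should react to a `None` result is a policy decision that this sketch leaves open.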
Changes to Edit Details UX for both private collections and private revisions of public collections
Edit Details UX MUST be updated to:
Add a Publication DOI using the requirements (only one DOI per collection) and process described in Changes to the Create Collection UX
Update a Publication DOI using the process described in Changes to the Create Collection UX
Delete a Publication DOI and its related publication metadata
Note: The portal needs a policy for Crossref failures which may be due to pending publications.
Required publication metadata
The following metadata is REQUIRED when a DOI is available:
Authors [in order]
Publication month, day, and year
Publication journal [abbreviated is preferred]
author
The ordered list of authors must be stored in the database. It should be simple for a portal query to subsequently extract the primary author's last name for use in a citation format.
In most cases, authors are individual scientists modeled as given, family in Crossref, but sometimes authors also include consortia modeled as name:
{
"name": "the HPAP Consortium",
"sequence": "additional",
"affiliation": []
}
If name is the first author, ensure that it’s captured for use in the summary citation.
name(s) will not be included in the author filter, only individual scientists.
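# assuming a successful HTTPS request, response is the parsed JSON body (e.g. request.json())
message = response['message']
# the ordered list of authors
authors = message['author']
# primary author's last name; a consortium has 'name' instead of 'family'
first_author = authors[0]
display(first_author.get('family') or first_author.get('name'))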
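The author filter rule above (individual scientists only, no consortia) could be sketched as follows. The helper name `filterable_authors`, the "family, given" label format, and the sample individual author are illustrative assumptions; only the consortium entry is taken from the example above.

```python
def filterable_authors(authors):
    """Labels for the author filter: entries modeled as 'name' (consortia) are skipped."""
    labels = []
    for author in authors:
        family = author.get("family")
        if not family:
            continue  # consortium or malformed entry, not an individual scientist
        given = author.get("given")
        labels.append(f"{family}, {given}" if given else family)
    return labels


authors = [
    # illustrative individual author, not taken from a real response
    {"given": "Ada", "family": "Example", "sequence": "first", "affiliation": []},
    {"name": "the HPAP Consortium", "sequence": "additional", "affiliation": []},
]
print(filterable_authors(authors))
```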
preprint
There is some conditional behavior that is dependent on whether the DOI is a preprint or a journal publication.
# is this a preprint? ('subtype' is absent for journal articles, so avoid a KeyError)
is_preprint = message.get('subtype') == "preprint"
published
The publication month, day, and year must be stored.
In order of preference when multiple are included in the response:
published-print
published
published-online
Each field contains an ordered array of year, month, day of month. Note that the field contains a nested array, e.g. [ [ 2006, 5, 19 ] ], to conform to citeproc JSON dates.
If publication month is unavailable, default to "1".
If publication day is unavailable, default to "1".
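# publication dates, in order of preference
published_date = {}
if 'published-print' in message:
    published_date = message['published-print']
elif 'published' in message:
    published_date = message['published']
elif 'published-online' in message:
    published_date = message['published-online']
# date-parts is a nested array: [[year, month, day]]; month and day may be missing
date_parts = published_date['date-parts'][0]
# year of publication
year = date_parts[0]
# month of publication, defaulting to 1 when unavailable
month = date_parts[1] if len(date_parts) > 1 else 1
# day of publication, defaulting to 1 when unavailable
day = date_parts[2] if len(date_parts) > 2 else 1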
container
In order of preference when multiple non-empty values are included in the response:
short-container-title
container-title
institution [we should validate whether preprints only have this value set]
journal = ""
# some responses contain empty values for containers, so only use non-empty lists
if message.get('short-container-title'):
    journal = message['short-container-title'][0]
elif message.get('container-title'):
    journal = message['container-title'][0]
elif message.get('institution'):
    journal = message['institution'][0]['name']
display(journal)
Changes to the A Collection UX
The publication metadata is used to create a summary citation.
The Publication field in A Collection replaces the previous DOI field. Its value is an anchor element that is composed of the DOI href with the summary citation as the human-readable name:
<a href="https://doi.org/111.222">Ren et al. (2021) Cell</a>
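Composing the summary citation and the anchor element could look like the sketch below. The function names are hypothetical, and appending "et al." only when there is more than one author is an assumption; the source only gives the pattern and the "Ren et al. (2021) Cell" example.

```python
from html import escape


def summary_citation(authors, year, journal):
    """Summary citation in the form 'Ren et al. (2021) Cell'."""
    first = authors[0]
    # A consortium as first author has 'name' instead of 'family'.
    label = first.get("family") or first.get("name")
    # Assumption: "et al." is appended only when there are further authors.
    if len(authors) > 1:
        label += " et al."
    return f"{label} ({year}) {journal}"


def publication_anchor(doi_url, citation):
    """Anchor element with the DOI as href and the citation as link text."""
    return f'<a href="{escape(doi_url, quote=True)}">{escape(citation)}</a>'


citation = summary_citation([{"family": "Ren"}, {"family": "Example"}], 2021, "Cell")
print(publication_anchor("https://doi.org/111.222", citation))
```

Escaping both values keeps the markup well-formed if a journal name or DOI contains characters such as `&` or `<`.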
Changes to the Dataset Drawer UX
Similar to A Collection, the dataset drawer needs to be refreshed for DOI and Publication changes.
# for a preprint, Crossref lists the published version under relation.is-preprint-of
if is_preprint:
    try:
        published_doi = message['relation']['is-preprint-of']
        # the new DOI to query for ...
        if published_doi[0]['id-type'] == 'doi':
            display(published_doi[0]['id'])
    except KeyError:
        pass
This would allow the portal to refresh preprint DOI(s) with their published DOI(s) on a regular cadence.
September 2 Update: Stanford discovered a case where the publishers failed to update the relationship between a preprint DOI and its publication DOI.
Stories
Based on recent UX Research - Be Confident About Dataset Quality, Ambrose observed in a conversation on single-cell-data-wrangling:
UX Design
Create Collection Publication DOI link A Collection Publication Dataset Drawer
Product Design
author
The ordered list of authors must be stored in the database. It should be simple for a portal query to subsequently extract the primary author's last name for use in a citation format.
Feb 8 2022 Update, see #single-cell-filter-by-metadata
In most cases, authors are individual scientists modeled as given, family in Crossref, but sometimes authors also include consortia modeled as name.
If name is the first author, ensure that it’s captured for use in the summary citation.
name(s) will not be included in the author filter, only individual scientists.
preprint
There is some conditional behavior that is dependent on whether the DOI is a preprint or a journal publication.
published
The publication month, day, and year must be stored.
In order of preference when multiple are included in the response:
published-print
published
published-online
From Date: Contains an ordered array of year, month, day of month. Note that the field contains a nested array, e.g. [ [ 2006, 5, 19 ] ], to conform to citeproc JSON dates.
Feb 8 2022 Update, see #single-cell-filter-by-metadata
further metadata (placeholder)
Automatically updating a preprint DOI
Update the DOI in the A single-cell transcriptional roadmap of the mouse and human lymph node lymphatic vasculature collection is an example of an updated DOI that was discovered during prototyping. If an existing preprint DOI is queried again AND it has been published since the previous query, then Crossref returns the published DOI in the relation.is-preprint-of field of the response.