Closed Melissa37 closed 2 years ago
I believe you have the wrong graham :)
Sorry! @graham Putting the right name on here now :-)
@gnott
Actions:
[x] Update the PubMed code; @gnott is dealing with this on ticket https://github.com/elifesciences/elife-pubmed-feed/issues/78
[ ] Write ticket on Exeter Board asking them to change the UI of Kriya
  - no longer add assigning authority
[ ] Simplify Gitbook
  - no longer add assigning authority
  - remove archive as a pub-id-type so only DOI and accession remain; if there is no DOI, use accession
[ ] Update Schematron to reflect the changes
I'm looking at the PubMed deposit logic more closely now, and I see the assigning-authority value is actually used right now. I can change it so it is not required, though. It also only applies to the uri matching method of classifying the dataset, and the proposed logic of using the source value will avoid that too.
So that we are not stuck if you remove the assigning-authority attribute, I have changed the code in the PubMed generation library in PR https://github.com/elifesciences/elife-pubmed-xml-generation/pull/39 so that it looks at the uri value alone. The tests are passing and I think it will be safe to deploy this until matching on the <source> value is added in.
Fantastic, thank you!
@FAtherden-eLife can we implement the rules about source and schematron without removing the assigning authority from Kriya? M
@FAtherden-eLife based on https://github.com/libero/editor/issues/101
When we talk to Exeter about redesigning Dataset references we should have:
<element-citation>
  With attributes:
    @publication-type="data"
    @specific-use="analyzed" OR @specific-use="generated"
  <person-group>
    @person-group-type="author"
    <name><surname><given-names> AND/OR <collab>
  <data-title>
  <year iso-8601-date="XXXX">
  <source>Repository Name</source>
  <pub-id> OR <ext-link>
    @pub-id-type="doi" OR @pub-id-type="accession"
  <version designator="XXX">
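To make the outline concrete, here is a minimal Python sketch that builds one such dataset reference with the standard library. The function name `build_dataset_citation` and its parameters are hypothetical, it only covers the <pub-id> option (not <ext-link>), and the placement of <collab> inside <person-group> is an assumption, so treat it as an illustration rather than the agreed markup.

```python
import xml.etree.ElementTree as ET


def build_dataset_citation(specific_use, authors, collab, title, year,
                           source, pub_id, pub_id_type, version=None):
    """Build a dataset <element-citation> following the outline above.

    Hypothetical helper: names and argument order are illustrative only.
    """
    if specific_use not in ("analyzed", "generated"):
        raise ValueError("specific-use must be 'analyzed' or 'generated'")
    if pub_id_type not in ("doi", "accession"):
        raise ValueError("pub-id-type must be 'doi' or 'accession'")

    citation = ET.Element("element-citation", {
        "publication-type": "data",
        "specific-use": specific_use,
    })

    # Authors: individual names AND/OR a collaboration (placement assumed).
    group = ET.SubElement(citation, "person-group",
                          {"person-group-type": "author"})
    for surname, given_names in authors:
        name = ET.SubElement(group, "name")
        ET.SubElement(name, "surname").text = surname
        ET.SubElement(name, "given-names").text = given_names
    if collab:
        ET.SubElement(group, "collab").text = collab

    ET.SubElement(citation, "data-title").text = title
    ET.SubElement(citation, "year", {"iso-8601-date": year}).text = year
    ET.SubElement(citation, "source").text = source
    ET.SubElement(citation, "pub-id", {"pub-id-type": pub_id_type}).text = pub_id
    if version:
        ET.SubElement(citation, "version", {"designator": version}).text = version
    return citation
```

Calling `ET.tostring(citation, encoding="unicode")` on the result gives the serialized element for a quick visual check against the outline.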
Is this right?
@Melissa37, yup looks good.
@specific-use="analyzed" OR @specific-use="generated"
In the Editor ticket we have @specific-use="supporting" instead of @specific-use="analyzed".
I think analyzed is better though (going from the JATS4R recs), so I'll update it there.
Fab, thanks. Sorry, that's my fault. If you agree that I've used the right term we're all good :-)
On Kriya 2 Gitlab board: https://gitlab.com/ExeterPremedia/reqs/cust/elife-kriyadocs/-/issues/20
Not available as a field in Kriya 2. Need to confirm that it's not in the XML.
As you reminded me, James, we are using the @assigning-authority as a method for Graham to parse out the allowed Type for datasets sent to PubMed. According to the new JATS4R data citation recommendation this is wrong.
I was looking into this and was concerned that for GDB and PDB we would not be able to use the DOI prefix to identify the database, because they share the same prefix (otherwise we could create a DOI lookup).
Then I thought, we are still overcomplicating this!
If we converted the schematron messages that check the assigning authority so that they instead check the source is labelled correctly, could this fix the issue?
My second eureka thought was that they get our reference list now from PMC and publish that, so once we move our dataset references into the reference lists this problem goes away anyway.
What do you guys think?
I'd rather be JATS4R compliant and simplify our XML/process when we know they'll be getting the dataset references in due course anyway.
Hi Melissa,
In response to some discussion on our call today, I'll add in here what I know if it helps.
PubMed generation right now concerning datasets looks at the URI for a "hint" of which type to use. We only have four NCBI types configured (from https://github.com/elifesciences/elife-pubmed-xml-generation/blob/develop/elifepubmed/generate.py#L18-L25), and the code I hope is not too confusing to see:

    ASSIGNING_AUTHORITY_MAP = {
        'NCBI': [
            ('www.ncbi.nlm.nih.gov/geo', 'NCBI:geo'),
            ('www.ncbi.nlm.nih.gov/projects/gap', 'NCBI:dbgap'),
            ('www.ncbi.nlm.nih.gov/nuccore', 'NCBI:nucleotide'),
            ('www.ncbi.nlm.nih.gov/sra', 'NCBI:sra')
        ]
    }
The "hint" logic could probably be extended to look at the source value to determine a PubMed type value. So far, there's not a very rigidly defined set of values to use, and if we guess wrong, I think PubMed would just ignore the object tag if the type is not acceptable to them.
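A rough sketch of how that extension might look, assuming the URI hint is tried first and the <source> text is used as a fallback. The function `pubmed_type_hint` and the `SOURCE_TYPE_MAP` entries are hypothetical: the repository names are guesses at what might appear in <source>, not values taken from the eLife archive or from the generation library.

```python
# URI fragments and the PubMed types they hint at (as configured today).
ASSIGNING_AUTHORITY_MAP = {
    'NCBI': [
        ('www.ncbi.nlm.nih.gov/geo', 'NCBI:geo'),
        ('www.ncbi.nlm.nih.gov/projects/gap', 'NCBI:dbgap'),
        ('www.ncbi.nlm.nih.gov/nuccore', 'NCBI:nucleotide'),
        ('www.ncbi.nlm.nih.gov/sra', 'NCBI:sra'),
    ]
}

# Hypothetical mapping from lower-cased <source> text to a PubMed type.
# The keys are illustrative assumptions and would need agreeing with Production.
SOURCE_TYPE_MAP = {
    'dryad digital repository': 'Dryad',
    'figshare': 'figshare',
    'ncbi gene expression omnibus': 'NCBI:geo',
    'rcsb protein data bank': 'PDB',
}


def pubmed_type_hint(uri=None, source=None):
    """Return a PubMed databank type, or None if nothing matches."""
    # 1. Existing behaviour: look for a known fragment in the URI.
    if uri:
        for patterns in ASSIGNING_AUTHORITY_MAP.values():
            for fragment, pubmed_type in patterns:
                if fragment in uri:
                    return pubmed_type
    # 2. Proposed extension: fall back to the repository name in <source>.
    if source:
        return SOURCE_TYPE_MAP.get(source.strip().lower())
    return None
```

Returning None for an unknown repository means the type can simply be left out, instead of sending a guess that PubMed might ignore.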
For how PMC data gets copied over, I will leave that to your more expert opinion, I don't know what they do.
Thanks!
G
Thanks Graham
Here is the current list of Dataset values in PubMed's list:
Allowed Type attribute values for databanks (sample XML is available here):
BioProject Dryad figshare GDB Omim PDB PIR SwissProt UniMES UniParc UniProtKB UniRef NCBI:dbgap NCBI:dbvar NCBI:genbank NCBI:genome NCBI:gensat NCBI:geo NCBI:homologene NCBI:nucleotide NCBI:popset NCBI:protein NCBI:pubchem-bioassay NCBI:pubchem-compound NCBI:pubchem-substance NCBI:refseq NCBI:snp NCBI:sra NCBI:structure NCBI:taxonomy NCBI:unigene NCBI:unists
@Frederick Atherden would you be able to look in our archive to see whether we've used all of these databases in our datasets? It would be good to see what the URLs are, so Graham could potentially extend that logic from what he has if we can.
It looks like Graham is not using the @assigning-authority so we can remove it asap. @Elife Production can you confirm you are OK with this and I'll get the wheels moving on that with Exeter.
Regarding @pub-id-type values I think we can simplify this for our content to "if there is a DOI use that, if not use accession". Some may go slightly wrong, but the level of complexity and effort we're putting into this does not really have any impact so why are we making it so complex?!
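The "DOI if there is one, otherwise accession" rule could be sketched as below. The helper name `classify_pub_id` is hypothetical, and the regular expression is only a common heuristic for recognising DOIs (a `10.` prefix with a 4 to 9 digit registrant code), so a few identifiers may still be misclassified, as noted above.

```python
import re

# Heuristic DOI matcher: optional doi.org resolver prefix, then "10.NNNN/suffix".
DOI_PATTERN = re.compile(
    r'^(?:https?://(?:dx\.)?doi\.org/)?(10\.\d{4,9}/\S+)$',
    re.IGNORECASE,
)


def classify_pub_id(value):
    """Return (pub_id_type, cleaned_value) for a dataset identifier."""
    candidate = value.strip()
    match = DOI_PATTERN.match(candidate)
    if match:
        # Keep the bare DOI, without any resolver URL in front of it.
        return 'doi', match.group(1)
    # Anything that does not look like a DOI is treated as an accession.
    return 'accession', candidate
```

For instance, `classify_pub_id("GSE12345")` is classified as an accession, while a value starting with `https://doi.org/10.` is reduced to the bare DOI.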
Adding: The source for these options could be controlled via Schematron and then a mapping provided to Graham for PubMed conversions. See GitHub ticket: https://github.com/elifesciences/elife-pubmed-feed/issues/78
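As an illustration of the kind of check the Schematron rule would perform, here is a small Python stand-in (not Schematron itself). The `ALLOWED_SOURCES` values and the function name `check_dataset_sources` are hypothetical; the real controlled list would come from Production.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled list of repository names allowed in <source>.
ALLOWED_SOURCES = {
    "Dryad Digital Repository",
    "figshare",
    "NCBI Gene Expression Omnibus",
}


def check_dataset_sources(xml_text):
    """Return a list of warning messages for dataset references."""
    root = ET.fromstring(xml_text)
    messages = []
    for citation in root.iter("element-citation"):
        # Only dataset references are in scope for this check.
        if citation.get("publication-type") != "data":
            continue
        source = (citation.findtext("source") or "").strip()
        if not source:
            messages.append("dataset reference has no <source>")
        elif source not in ALLOWED_SOURCES:
            messages.append("unrecognised <source>: %s" % source)
    return messages
```

An empty list means every dataset reference uses a recognised repository name, so the same names can then be mapped to PubMed types.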
@Graham Nott PMC just use what we provide, no fancy conversions.