Remove @assigning-authority from Dataset references

Melissa37 commented 4 years ago

As you reminded me, James, we are using @assigning-authority as a way for Graham to parse out the allowed Type for datasets sent to PubMed. According to the new JATS4R data citation recommendation, this is wrong.

I was looking into this and was concerned that for GDB and PDB we would not be able to use the DOI prefix to identify the database, because they share the same prefix (otherwise we could create a DOI lookup).

Then I thought, we are still overcomplicating this! The <source> in our XML should be enough to create the values for Graham.

        <source>NCBI Gene Expression Omnibus</source>

If we converted the Schematron messages that check the assigning authority into checks that the source is labelled correctly, could this fix the issue?
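As a rough illustration of the check proposed above, here is a sketch written in Python rather than Schematron. The list of repository names is made up for the example; the real controlled list would need to be agreed. It flags any data citation whose `<source>` is missing or not on the list:

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled list of repository names (illustrative only).
KNOWN_SOURCES = {
    "NCBI Gene Expression Omnibus",
    "NCBI Sequence Read Archive",
    "Dryad Digital Repository",
}


def check_dataset_sources(xml_text: str) -> list[str]:
    """Return one warning per data citation whose <source> is missing or unrecognised."""
    root = ET.fromstring(xml_text)
    warnings = []
    for citation in root.iter("element-citation"):
        # Only dataset references are checked; journal refs etc. are skipped.
        if citation.get("publication-type") != "data":
            continue
        source = citation.findtext("source")
        if source is None or source.strip() == "":
            warnings.append("data citation has no <source>")
        elif source.strip() not in KNOWN_SOURCES:
            warnings.append(f"unrecognised <source>: {source.strip()!r}")
    return warnings
```

In Schematron the same idea would be an assert on `element-citation[@publication-type='data']/source`; the Python version is only meant to show the logic.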

My second eureka thought was that they now get our reference list from PMC and publish that, so once we move our dataset references into the reference list, this problem goes away anyway.

What do you guys think?

I'd rather be JATS4R compliant and simplify our XML/process when we know they'll be getting the dataset references in due course anyway.

Hi Melissa,

In response to some discussion on our call today, I'll add in here what I know if it helps.

Right now, PubMed generation for datasets looks at the URI for a "hint" of which type to use. We only have four NCBI types configured (from https://github.com/elifesciences/elife-pubmed-xml-generation/blob/develop/elifepubmed/generate.py#L18-L25), and the code is, I hope, not too confusing:

        ASSIGNING_AUTHORITY_MAP = {
            'NCBI': [
                ('www.ncbi.nlm.nih.gov/geo', 'NCBI:geo'),
                ('www.ncbi.nlm.nih.gov/projects/gap', 'NCBI:dbgap'),
                ('www.ncbi.nlm.nih.gov/nuccore', 'NCBI:nucleotide'),
                ('www.ncbi.nlm.nih.gov/sra', 'NCBI:sra')
            ]
        }
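To illustrate the "hint" idea, here is a minimal, hypothetical sketch of matching a dataset URI against that map. It is not the actual code in generate.py: it ignores the authority key and returns the first type whose URI fragment appears anywhere in the URI.

```python
from typing import Optional

# Map as quoted from elifepubmed/generate.py.
ASSIGNING_AUTHORITY_MAP = {
    "NCBI": [
        ("www.ncbi.nlm.nih.gov/geo", "NCBI:geo"),
        ("www.ncbi.nlm.nih.gov/projects/gap", "NCBI:dbgap"),
        ("www.ncbi.nlm.nih.gov/nuccore", "NCBI:nucleotide"),
        ("www.ncbi.nlm.nih.gov/sra", "NCBI:sra"),
    ]
}


def pubmed_type_from_uri(uri: str) -> Optional[str]:
    """Return the first PubMed Type whose URI fragment appears in the dataset URI."""
    for patterns in ASSIGNING_AUTHORITY_MAP.values():
        for fragment, pubmed_type in patterns:
            if fragment in uri:
                return pubmed_type
    # No hint found: the dataset would be sent without a Type.
    return None
```

Plain substring matching is deliberately simple; `www.ncbi.nlm.nih.gov/geo` would also match any other path that starts with `/geo`, so a stricter version might compare parsed URL paths instead.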

The "hint" logic could probably be extended to look at the source value to determine a PubMed type value. So far there isn't a rigidly defined set of values to use, and if we guess wrong, I think PubMed would just ignore the object tag if the type is not acceptable to them.
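A source-based hint could be as small as a lookup table. In this sketch the `SOURCE_TYPE_MAP` entries are hypothetical examples, and `ALLOWED_TYPES` is only a subset of PubMed's allowed databank values (the full list is quoted later in this thread). Anything unmatched returns `None`, so no Type is guessed:

```python
from typing import Optional

# Subset of PubMed's allowed databank Type values.
ALLOWED_TYPES = {"Dryad", "figshare", "PDB", "NCBI:geo", "NCBI:sra", "NCBI:dbgap", "NCBI:nucleotide"}

# Hypothetical mapping from normalised <source> text to a PubMed Type.
SOURCE_TYPE_MAP = {
    "ncbi gene expression omnibus": "NCBI:geo",
    "ncbi sequence read archive": "NCBI:sra",
    "dryad digital repository": "Dryad",
    "protein data bank": "PDB",
}


def pubmed_type_from_source(source: str) -> Optional[str]:
    """Map a <source> value to a PubMed Type, or None if there is no safe match."""
    # Normalise case and collapse runs of whitespace before looking up.
    key = " ".join(source.lower().split())
    pubmed_type = SOURCE_TYPE_MAP.get(key)
    # Only emit values PubMed accepts; otherwise leave the Type out.
    if pubmed_type in ALLOWED_TYPES:
        return pubmed_type
    return None
```

Returning `None` rather than a best guess matches the point above: a wrong Type gains nothing, since PubMed would drop it anyway.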

For how PMC data gets copied over, I will leave that to your more expert opinion, I don't know what they do.

Thanks!

G

Thanks Graham

Here is the current list of Dataset values in PubMed's list:

Allowed Type attribute values for databanks (sample XML is available here):

BioProject, Dryad, figshare, GDB, Omim, PDB, PIR, SwissProt, UniMES, UniParc, UniProtKB, UniRef, NCBI:dbgap, NCBI:dbvar, NCBI:genbank, NCBI:genome, NCBI:gensat, NCBI:geo, NCBI:homologene, NCBI:nucleotide, NCBI:popset, NCBI:protein, NCBI:pubchem-bioassay, NCBI:pubchem-compound, NCBI:pubchem-substance, NCBI:refseq, NCBI:snp, NCBI:sra, NCBI:structure, NCBI:taxonomy, NCBI:unigene, NCBI:unists

@Frederick Atherden would you be able to look in our archive to see whether we've used all of these databases in our datasets? It would be good to see what the URLs are, so Graham could potentially extend that logic from what he has if we can.

It looks like Graham is not using the @assigning-authority so we can remove it asap. @Elife Production can you confirm you are OK with this and I'll get the wheels moving on that with Exeter.

Regarding @pub-id-type values, I think we can simplify this for our content to "if there is a DOI, use that; if not, use accession". Some may go slightly wrong, but the complexity and effort we're putting into this doesn't really make any difference, so why are we making it so complex?!
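The "DOI if there is one, otherwise accession" rule is simple enough to sketch in a few lines. The DOI pattern and the list of doi.org prefixes below are assumptions (a heuristic, not a full DOI validator):

```python
import re

# Rough DOI shape: "10." + registrant code + "/" + suffix (heuristic only).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
DOI_URL_PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "http://dx.doi.org/")


def choose_pub_id(identifier: str) -> tuple[str, str]:
    """Return (pub-id-type, value): 'doi' if the identifier is a DOI, else 'accession'."""
    value = identifier.strip()
    # Accept DOIs written as resolver URLs by stripping the prefix.
    for prefix in DOI_URL_PREFIXES:
        if value.lower().startswith(prefix):
            value = value[len(prefix):]
            break
    if DOI_PATTERN.match(value):
        return ("doi", value)
    # Everything else is treated as an accession, as agreed.
    return ("accession", identifier.strip())
```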

Adding: The source for these options could be controlled via Schematron and then a mapping provided to Graham for PubMed conversions. See GitHub ticket: https://github.com/elifesciences/elife-pubmed-feed/issues/78

@Graham Nott PMC just use what we provide, no fancy conversions.

graham commented 4 years ago

I believe you have the wrong graham :)

Melissa37 commented 4 years ago

Sorry! @graham Putting the right name on here now :-)

@gnott

Melissa37 commented 4 years ago

Actions:

[x] Update the PubMed code (@gnott is dealing with this on ticket: https://github.com/elifesciences/elife-pubmed-feed/issues/78)
[ ] Write ticket on Exeter Board asking them to change the UI of Kriya
    no longer add assigning authority
[ ] Simplify Gitbook
    no longer add assigning authority
    remove archive as a pub-id-type, leaving only DOI and Accession (if there is no DOI, use Accession)
[ ] Update Schematron to reflect the changes

gnott commented 4 years ago

I'm looking at the PubMed deposit logic more closely now, and I see the assigning-authority value is actually used right now. I can change it so it is not required, though. It also only applies to the URI-matching method of classifying the dataset, and the proposed logic of using the source value would avoid that too.

gnott commented 4 years ago

So that we are not stuck if you remove the assigning-authority attribute, I have changed the code in the PubMed generation library (PR https://github.com/elifesciences/elife-pubmed-xml-generation/pull/39) so that it looks at the URI value alone. The tests are passing, and I think it will be safe to deploy this until matching on the <source> value is added.

Melissa37 commented 4 years ago

Fantastic, thank you!

@FAtherden-eLife can we implement the rules about source and schematron without removing the assigning authority from Kriya? M

Melissa37 commented 4 years ago

@FAtherden-eLife based on https://github.com/libero/editor/issues/101

When we talk to Exeter about redesigning Dataset references, we should have:

        <element-citation>
            With attributes:
                @publication-type="data"
                @specific-use="analyzed" OR @specific-use="generated"
            <person-group> @person-group-type="author"
                <name><surname><given-names> AND/OR <collab>
            <data-title>
            <year iso-8601-date="XXXX">
            <source>Repository Name</source>
            <pub-id> OR <ext-link>
                @pub-id-type="doi" OR @pub-id-type="accession"
            <version designator="XXX">

Is this right?

fred-atherden commented 4 years ago

@Melissa37, yup looks good.

@specific-use="analyzed" OR @specific-use="generated"

In the Editor ticket we have @specific-use="supporting" instead of @specific-use="analyzed".

I think analyzed is better though (going from the JATS4R recs), so I'll update it there.
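For illustration only, here is a hypothetical helper that builds the agreed structure with Python's ElementTree. It covers named authors and `<pub-id>` only (no `<collab>` or `<ext-link>`), and the function name and parameters are invented for the example:

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_data_citation(
    authors: list[tuple[str, str]],
    title: str,
    year: str,
    source: str,
    identifier: str,
    id_type: str,
    usage: str = "analyzed",
    version: Optional[str] = None,
) -> ET.Element:
    """Build a dataset <element-citation> following the structure discussed above."""
    if usage not in ("analyzed", "generated"):
        raise ValueError("usage must be 'analyzed' or 'generated'")
    if id_type not in ("doi", "accession"):
        raise ValueError("id_type must be 'doi' or 'accession'")
    citation = ET.Element("element-citation", {"publication-type": "data", "specific-use": usage})
    group = ET.SubElement(citation, "person-group", {"person-group-type": "author"})
    for surname, given in authors:
        name = ET.SubElement(group, "name")
        ET.SubElement(name, "surname").text = surname
        ET.SubElement(name, "given-names").text = given
    ET.SubElement(citation, "data-title").text = title
    ET.SubElement(citation, "year", {"iso-8601-date": year}).text = year
    ET.SubElement(citation, "source").text = source
    ET.SubElement(citation, "pub-id", {"pub-id-type": id_type}).text = identifier
    # <version> is optional; only add it when a designator is known.
    if version is not None:
        ET.SubElement(citation, "version", {"designator": version}).text = version
    return citation
```

Note that there is no @assigning-authority anywhere in the output.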

Melissa37 commented 4 years ago

Fab, thanks. Sorry, that's my fault. If you agree that I've used the right term, we're all good :-)

Melissa37 commented 4 years ago

On Kriya 2 Gitlab board: https://gitlab.com/ExeterPremedia/reqs/cust/elife-kriyadocs/-/issues/20

fred-atherden commented 3 years ago

Not available as a field in Kriya 2. Need to confirm that it's not in the XML.

elifesciences / schematron-wiki

Remove @assigning-authority from Dataset references #123