elifesciences / package-ejp-raw-output-zip

Transform article raw zip files from EJP to a more consistent output.
MIT License

EJP XML manifest items to convert: Data Statement #23

Open Melissa37 opened 5 years ago

Melissa37 commented 5 years ago

Problem / Motivation

WHO: Production/Production Vendor
WHEN: Between export from EJP/xPub and delivery to the production vendor
WHERE: In eLife bot processes
WHAT: Data Statement
WHY: We'd be giving the vendor clean metadata with fewer conversion requirements, and it gets us closer to the end goal of converting author Word files to JATS XML for publication

Proposed solution

Sample EJP output: <custom-meta-group><custom-meta><meta-name>Data Statement</meta-name><meta-name>Data Availability</meta-name><meta-value>A published ChIP dataset was used in this study: Webber JL, Zhang J, Cote L, Vivekanand P, Ni X, Zhou J, N&#x00E8;gre N, Carthew RW, White KP, Rebay I. Genetics. 2013 .The relationship between long-range chromatin occupancy and polymerization of the Drosophila ETS family transcriptional repressor Yan. Raw data for this published study are available as a GEO dataset (Series: GSE34038 and GSE34040). All other data analysed during this study are included in the manuscript and supporting files. Source data files have been provided for Figures 1 and 3.</meta-value><meta-name>The following previously published dataset/s was/were used</meta-name><meta-value>Webber JL, Zhang J, Cote L, Vivekanand P, Ni X, Zhou J, N&#x00E8;gre N, Carthew RW, White KP, Rebay I.</meta-value><meta-value>2013</meta-value><meta-value>The relationship between long-range chromatin occupancy and polymerization of the Drosophila ETS family transcriptional repressor Yan</meta-value><meta-value>https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE34038</meta-value><meta-value>NCBI Gene Expression Omnibus, GSE34038</meta-value><meta-value>Webber JL, Zhang J, Cote L, Vivekanand P, Ni X, Zhou J, N&#x00E8;gre N, Carthew RW, White KP, Rebay I.</meta-value><meta-value>2013</meta-value><meta-value>The relationship between long-range chromatin occupancy and polymerization of the Drosophila ETS family transcriptional repressor Yan</meta-value><meta-value>https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE34040</meta-value><meta-value>NCBI Gene Expression Omnibus, GSE34040</meta-value></custom-meta><custom-meta><meta-name>Corresponding author for proofs</meta-name><meta-value>author-35852</meta-value></custom-meta>

NOTE: the element citation content in the sample final XML does not match that of the EJP output. Please follow the principle, not the actual content!

Sample final XML required: <sec sec-type="data-availability" id="s7"><title>Data availability</title><p>A published ChIP dataset was used in this study: Webber JL, Zhang J, Cote L, Vivekanand P, Ni X, Zhou J, N&#x00E8;gre N, Carthew RW, White KP, Rebay I. Genetics. 2013 .The relationship between long-range chromatin occupancy and polymerization of the Drosophila ETS family transcriptional repressor Yan. Raw data for this published study are available as a GEO dataset (Series: GSE34038 and GSE34040). All other data analysed during this study are included in the manuscript and supporting files. Source data files have been provided for Figures 1 and 3.</p><p>The following dataset was generated:</p><p><element-citation publication-type="data" specific-use="isSupplementedBy" id="dataset1"><person-group person-group-type="author"><name><surname>Düsterwald</surname><given-names>KM</given-names></name><name><surname>Currin</surname><given-names>CB</given-names></name><name><surname>Burman</surname><given-names>RJ</given-names></name><name><surname>Akerman</surname><given-names>CJ</given-names></name><name><surname>Kay</surname><given-names>AR</given-names></name><name><surname>Raimondo</surname><given-names>JV</given-names></name></person-group><year iso-8601-date="2018">2018</year><data-title>Data from: Biophysical models reveal the relative importance of transporter proteins and impermeant anions in chloride homeostasis</data-title><source>Dryad Digital Repository</source><pub-id assigning-authority="Dryad" pub-id-type="doi">10.5061/dryad.kj1f3v4</pub-id></element-citation></p><p>The following previously published datasets were used:</p><p><element-citation publication-type="data" specific-use="references" id="dataset2"><person-group 
person-group-type="author"><name><surname>Rau</surname><given-names>CD</given-names></name><name><surname>Wang</surname><given-names>J</given-names></name><name><surname>Wang</surname><given-names>Y</given-names></name><name><surname>Lusis</surname><given-names>AJ</given-names></name></person-group><year iso-8601-date="2013">2013</year><data-title>Transcriptomes of the hybrid mouse diversity panel subjected to Isoproterenol challenge</data-title><source>NCBI Gene Expression Omnibus</source><pub-id assigning-authority="NCBI" pub-id-type="accession" xlink:href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE48760">GSE48760</pub-id></element-citation></p><p><element-citation publication-type="data" specific-use="references" id="dataset3"><person-group person-group-type="author"><name><surname>Garcia</surname><given-names>Miguel A</given-names></name></person-group><year iso-8601-date="2018">2018</year><data-title>Shear Manuscript</data-title><source>Open Science Framework</source><pub-id assigning-authority="other" pub-id-type="archive" xlink:href="https://osf.io/kvu5j/">kvu5j</pub-id></element-citation></p></sec>

Differences:

Convert from a custom meta group to proper tagging

Convert <meta-value> associated with <meta-name>Data Availability</meta-name> to <sec sec-type="data-availability" id="s7"><title>Data availability</title><p>XXX</p>

Convert <meta-name>The following dataset/s was/were generated</meta-name> to <p>The following dataset was generated:</p> if one OR <p>The following datasets were generated:</p> if 2 or more

Convert <meta-name>The following previously published dataset/s was/were used</meta-name> to <p>The following previously published dataset was used:</p> if one OR <p>The following previously published datasets were used:</p> if 2 or more
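The singular/plural rule for the two headings could be sketched in Python as below. This is only an illustration: the function name and the "generated" / "used" labels are hypothetical and are not values that EJP emits.

```python
def dataset_heading(kind, count):
    """Return the <p> text for a dataset list heading.

    kind is "generated" or "used" (hypothetical labels for this sketch);
    count is the number of datasets listed under the heading.
    """
    if count < 1:
        raise ValueError("a heading needs at least one dataset")
    plural = count > 1
    if kind == "generated":
        if plural:
            return "The following datasets were generated:"
        return "The following dataset was generated:"
    if kind == "used":
        if plural:
            return "The following previously published datasets were used:"
        return "The following previously published dataset was used:"
    raise ValueError("unknown kind: %r" % (kind,))
```

The count would come from the number of complete sets of <meta-value> fields found under the corresponding <meta-name>.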

Convert <meta-value> in sets of 5, eg: <meta-value>Webber JL, Zhang J, Cote L, Vivekanand P, Ni X, Zhou J, N&#x00E8;gre N, Carthew RW, White KP, Rebay I.</meta-value><meta-value>2013</meta-value><meta-value>The relationship between long-range chromatin occupancy and polymerization of the Drosophila ETS family transcriptional repressor Yan</meta-value><meta-value>https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE34038</meta-value><meta-value>NCBI Gene Expression Omnibus, GSE34038</meta-value>

Each dataset entry will have 5 <meta-value> fields.
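A minimal sketch of the grouping step, assuming each <custom-meta> is parsed on its own and that every <meta-value> belongs to the nearest preceding <meta-name>, as in the sample EJP output. The function names are hypothetical and this is not the bot's actual code.

```python
import xml.etree.ElementTree as ET

FIELDS_PER_DATASET = 5  # author list, date, title, URL, database and identifier


def group_meta_values(custom_meta_xml):
    """Map each <meta-name> text to the <meta-value> texts that follow it."""
    root = ET.fromstring(custom_meta_xml)
    groups = {}
    current_name = None
    for child in root:
        if child.tag == "meta-name":
            current_name = child.text or ""
            groups.setdefault(current_name, [])
        elif child.tag == "meta-value" and current_name is not None:
            groups[current_name].append(child.text or "")
    return groups


def chunk_datasets(values):
    """Split a flat list of values into consecutive sets of five."""
    if len(values) % FIELDS_PER_DATASET:
        raise ValueError(
            "expected a multiple of %d values, got %d"
            % (FIELDS_PER_DATASET, len(values))
        )
    return [
        values[start:start + FIELDS_PER_DATASET]
        for start in range(0, len(values), FIELDS_PER_DATASET)
    ]
```

With the sample above, the values under "The following previously published dataset/s was/were used" would come back as two sets of five, one per GEO series.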

For datasets generated add @specific-use="isSupplementedBy"; for those previously published add @specific-use="references"

Add dataset IDs to each element citation, numbered in the order cited: "dataset1", "dataset2", etc.

Add the element citations to the <sec> tagging
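As a rough sketch of the last three points, the skeleton of the section could be built with Python's standard ElementTree module. The function name, the "generated" / "used" labels and the default sec_id are assumptions for illustration. The heading paragraphs and the children of each <element-citation> are left out to keep the example short.

```python
import xml.etree.ElementTree as ET

# Hypothetical labels mapped to the @specific-use values named in this issue.
SPECIFIC_USE = {"generated": "isSupplementedBy", "used": "references"}


def build_data_availability_sec(statement, kinds, sec_id="s7"):
    """Build a <sec> with one empty <element-citation> per dataset.

    kinds lists "generated" or "used" for each dataset, in the order cited.
    """
    sec = ET.Element("sec", {"sec-type": "data-availability", "id": sec_id})
    ET.SubElement(sec, "title").text = "Data availability"
    ET.SubElement(sec, "p").text = statement
    for number, kind in enumerate(kinds, start=1):
        paragraph = ET.SubElement(sec, "p")
        ET.SubElement(paragraph, "element-citation", {
            "publication-type": "data",
            "specific-use": SPECIFIC_USE[kind],
            "id": "dataset%d" % number,
        })
    return sec
```

Calling ET.tostring(sec, encoding="unicode") on the result gives the serialized section for inspection.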

Clarification needed and assumptions


Tasks

@gnott

Technical notes

@gnott

User interface / Wireframes

NA but required for Texture

gnott commented 5 years ago

Is the set of five <meta-value> tags always listed in the same order for each dataset, do you know, or is it possible they might be listed in a different order?

Melissa37 commented 5 years ago

> Is the set of five <meta-value> tags always listed in the same order for each dataset, do you know, or is it possible they might be listed in a different order?

They are always listed in the same order. But one question I have is what happens if a field is not filled out: does EJP omit it from the export, or send it blank? The first option could cause a problem! I think the assumption is that every field will be filled out in the editorial process, so this would not happen. But we should test this assumption.

@JGilbert-eLife do you know the answer to this?
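Until that is confirmed, one cautious option might be to check the values before converting them. The sketch below is a hypothetical helper, not existing bot code. It reports a count that is not a multiple of five, which would suggest an omitted field, and any blank values, which would suggest a field sent empty.

```python
FIELDS_PER_DATASET = 5


def check_dataset_values(values):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if len(values) % FIELDS_PER_DATASET:
        problems.append(
            "%d values is not a multiple of %d; a field may have been omitted"
            % (len(values), FIELDS_PER_DATASET)
        )
    for position, value in enumerate(values, start=1):
        if value is None or not value.strip():
            problems.append("value %d is blank" % position)
    return problems
```

A non-empty result could be logged or raised for manual review rather than silently producing misaligned references.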

gnott commented 5 years ago

I'm looking more closely at a potential sample data file, as well as the XML above, and I'm seeing a set of five <meta-value> tags per dataset, not four. I am going to edit the above comments to change 4 / four to 5 / five so they are more precise.

Melissa37 commented 5 years ago

Thanks @gnott you are right, sorry! EJP outputs:
Author list: <meta-value>Webber JL, Zhang J, Cote L, Vivekanand P, Ni X, Zhou J, N&#x00E8;gre N, Carthew RW, White KP, Rebay I.</meta-value>
Date: <meta-value>2013</meta-value>
Title: <meta-value>The relationship between long-range chromatin occupancy and polymerization of the Drosophila ETS family transcriptional repressor Yan</meta-value>
URL: <meta-value>https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE34038</meta-value>
Database and Identifier: <meta-value>NCBI Gene Expression Omnibus, GSE34038</meta-value>

The final output is split out to generate a reference in the final XML.
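One way the splitting might work is sketched below. It rests on two unconfirmed assumptions: that each author is written as a surname followed by initials after the last space, and that the identifier is whatever follows the last comma in the "Database and Identifier" value. Both function names are hypothetical.

```python
def split_authors(author_list):
    """Split "Webber JL, Zhang J." into (surname, given-names) pairs."""
    names = []
    for part in author_list.rstrip(". ").split(","):
        part = part.strip()
        if not part:
            continue
        surname, _, given_names = part.rpartition(" ")
        if not surname:
            # Single token with no space: treat it all as the surname.
            surname, given_names = given_names, ""
        names.append((surname, given_names))
    return names


def split_database_and_identifier(value):
    """Split "Database name, ID" on the last comma into (source, identifier)."""
    source, separator, identifier = value.rpartition(",")
    if not separator:
        return value.strip(), ""
    return source.strip(), identifier.strip()
```

The pairs would map to <surname> and <given-names>, and the two parts of the last field to <source> and <pub-id>. Choosing @pub-id-type and @assigning-authority is not covered here, since the issue does not say how they are determined.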