Problem / Motivation
WHO: Production/Production Vendor
WHEN: Between export from EJP/xPub and delivery to the production vendor
WHERE: In eLife bot processes
WHAT: Information regarding assets
WHY: It's a lot of effort for production vendors to get the code right and to understand the requirement, and we've had situations where we thought they understood, but they ignored the requirement and built infrastructure (behind closed source code) based on the wrong understanding. If we could automate the process before it reaches them, we can trust the process more and manage changes in the Editorial process when switching out one tool for another.
Also, EJP exports some files that should not be used, so we would want to remove them before sending to the production vendor. An example of this is the file type <file file-type='author_pdf_for_review'>. This should not be passed on, but EJP cannot avoid exporting it.
We'd also like to validate the EJP output so that any expected tagging or content that has dropped off is identified before sending to the production vendor.
Proposed solution
At the end of the submission system XML metadata file there is a <files> section.
Ignore the file types not required for the production process and do not pass them on (<file file-type='author_pdf_for_review'>).
Each file output from the system is listed in this section and can be matched up via this section of the XML. All files require renaming as per the file naming requirements. See https://github.com/elifesciences/XML-mapping/blob/master/elife_file_naming_2016_08_25.md
Main figures:
Example EJP export XML:
<file file-type='figure' id='961146'><upload_file_nm>fig9_projection.pdf</upload_file_nm><order>11</order><size units='bytes'>1373502</size><custom-meta><meta-name>Figure number</meta-name><meta-value>Figure 9</meta-value></custom-meta></file>
Figures are output with a <file> @file-type attribute 'figure' and a file id is given.
The file ID can be ignored completely; it is an internal EJP id for the file on their system.
<upload_file_nm> refers to the current name of the file.
<order> can be ignored too.
<size> can be ignored but might be useful for transfer info?
Within <custom-meta>, the <meta-name> field indicates what the <meta-value> refers to.
Sometimes authors will upload a file called, for example, "Figure 1", but during the editorial process this has become "Figure 3" and they have not renamed the file. They will, however, have updated the <meta-value> to "Figure 3".
What the file actually is has to be taken from the <meta-value>. However, this is a freeform box, so there can be some variation in what is there (e.g. Fig 1, Figure 1, fig 1, figure 1).
Final XML example:
<fig-group><fig id="fig1" position="float"><label>Figure 1.</label>
...<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="elife00013f001"/></fig></fig-group>
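The filtering and figure-matching steps described above could be sketched roughly as follows. This is only an illustration, not the bot's implementation: the function names, the regular expression, and the extra sample entries (`review.pdf`, `Figure 1.tif`, ids `1` and `2`) are assumptions made for the example, and the ignore list contains only the one file type named in this ticket.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: only the file type named in this ticket is ignored so far.
IGNORED_FILE_TYPES = {"author_pdf_for_review"}

# Accepts freeform variants such as "Fig 1", "Figure 1", "fig. 1", "figure 1".
FIGURE_LABEL = re.compile(r"^\s*fig(?:ure)?\.?\s*(\d+)\s*$", re.IGNORECASE)


def wanted_files(files_el):
    """Return the <file> elements that should be passed on to production."""
    return [
        f for f in files_el.findall("file")
        if f.get("file-type") not in IGNORED_FILE_TYPES
    ]


def meta_value(file_el, name):
    """Return the <meta-value> paired with the given <meta-name>, or None."""
    for meta in file_el.findall("custom-meta"):
        if (meta.findtext("meta-name") or "").strip() == name:
            return (meta.findtext("meta-value") or "").strip()
    return None


def figure_number(file_el):
    """Read the figure number from <meta-value>, never from the file name."""
    match = FIGURE_LABEL.match(meta_value(file_el, "Figure number") or "")
    # None signals a label that a person needs to check.
    return int(match.group(1)) if match else None


# Hypothetical <files> section: the first and last entries are invented.
SAMPLE = """<files>
<file file-type='author_pdf_for_review' id='1'><upload_file_nm>review.pdf</upload_file_nm></file>
<file file-type='figure' id='961146'><upload_file_nm>fig9_projection.pdf</upload_file_nm><order>11</order><size units='bytes'>1373502</size><custom-meta><meta-name>Figure number</meta-name><meta-value>Figure 9</meta-value></custom-meta></file>
<file file-type='figure' id='2'><upload_file_nm>Figure 1.tif</upload_file_nm><custom-meta><meta-name>Figure number</meta-name><meta-value>fig 3</meta-value></custom-meta></file>
</files>"""

for file_el in wanted_files(ET.fromstring(SAMPLE)):
    if file_el.get("file-type") == "figure":
        print(figure_number(file_el), file_el.findtext("upload_file_nm"))
```

In the sample, the file uploaded as `Figure 1.tif` is reported as figure 3 because the number comes from the `<meta-value>`. Labels the pattern does not recognise return `None` instead of a guess, so they can be flagged for a manual check.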
Figure supplements
Example EJP export XML:
<file file-type='additional_figure_data' id='944710'><upload_file_nm>Figure5supplement figure 1.png</upload_file_nm><order>17</order><size units='bytes'>200850</size><custom-meta><meta-name>Figure number</meta-name><meta-value>Figure 5-figure supplement 1</meta-value></custom-meta></file>
Figure supplements are output with a <file> @file-type attribute 'additional_figure_data'.
Videos
Example EJP export XML:
<file file-type='video' id='532012'><upload_file_nm>Rho_mutations_3D.mov</upload_file_nm><order>4</order><size units='bytes'>3288301</size><custom-meta><meta-name>Title</meta-name><meta-value>Video 1</meta-value></custom-meta></file>
Videos are output with a <file> @file-type attribute 'video'.
Reporting standards
Example EJP export XML:
<file file-type='graphical_abstract' id='61844'><upload_file_nm>26082014RAeLife04525R1_Reporting_Standards_Document1.pdf</upload_file_nm><order>9</order><size units='bytes'>145174</size><custom-meta><meta-name>Title</meta-name><meta-value>Reporting Standards Document</meta-value></custom-meta></file>
Reporting standards are output with a <file> @file-type attribute 'graphical_abstract'.
Source code
Example EJP export XML:
<file file-type='aux_file' id='3242'><upload_file_nm>Figure_3_source_code_1.docx</upload_file_nm><order>1</order><size units='bytes'>19456</size><custom-meta><meta-name>Title</meta-name><meta-value>Blah blah blah</meta-value></custom-meta></file>
Source code is output with a <file> @file-type attribute 'aux_file'.
Source data
Example EJP export XML:
<file file-type='data_set' id='61846'><upload_file_nm>26082014RAeLife04525R1_Figure_3_source_data_1.docx</upload_file_nm><order>8</order><size units='bytes'>17908</size><custom-meta><meta-name>Title</meta-name><meta-value>Figure 3-source data 1</meta-value></custom-meta><custom-meta><meta-name>Legend</meta-name><meta-value>Source Data</meta-value></custom-meta></file>
Source data are output with a <file> @file-type attribute 'data_set'.
Supplementary files
Example EJP export XML:
<file file-type='supp' id='61843'><upload_file_nm>26082014RAeLife04525R1_Supplementary_file_1.docx</upload_file_nm><order>11</order><size units='bytes'>34892</size><custom-meta><meta-name>Title</meta-name><meta-value>Supplementary File 1</meta-value></custom-meta><custom-meta><meta-name>Legend</meta-name><meta-value>Supplementary File</meta-value></custom-meta></file>
Supplementary files are output with a <file> @file-type attribute 'supp'.
The main article file (Word file)
Example EJP export XML:
<file file-type='art_file' id='532004'><upload_file_nm>main_text_revised2_no_markup.docx</upload_file_nm><order>5</order><size units='bytes'>241810</size></file>
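As a rough summary, the @file-type values quoted in this ticket (including 'cover_art' for the striking image, covered in the next section) could be held in a lookup that the bot uses to classify each exported file and to flag anything unexpected. This is a sketch under assumptions: the names `ASSET_KINDS` and `classify` and the labels on the right-hand side are illustrative, not an agreed vocabulary.

```python
# EJP @file-type values quoted in this ticket, mapped to illustrative labels.
ASSET_KINDS = {
    "figure": "main figure",
    "additional_figure_data": "figure supplement",
    "video": "video",
    "graphical_abstract": "reporting standards",
    "aux_file": "source code",
    "data_set": "source data",
    "supp": "supplementary file",
    "art_file": "main article file",
    "cover_art": "striking image",
}

# Assumption: only the file type named in this ticket is ignored so far.
IGNORED_FILE_TYPES = {"author_pdf_for_review"}


def classify(file_type):
    """Return the asset label, "ignore", or "unknown" for a @file-type value."""
    if file_type in IGNORED_FILE_TYPES:
        return "ignore"
    return ASSET_KINDS.get(file_type, "unknown")


for file_type in ("art_file", "author_pdf_for_review", "something_new"):
    print(file_type, "->", classify(file_type))
```

Returning "unknown" instead of raising keeps unexpected file types visible, so they can be reported during validation before anything is sent to the production vendor.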
Striking image file
Example EJP export XML:
<file file-type='cover_art' id='532008'><upload_file_nm>Striking image.jpg</upload_file_nm><order>13</order><size units='bytes'>169497</size><custom-meta><meta-name>Title</meta-name><meta-value>Neighborhood-dependent amplifications appearing over time</meta-value></custom-meta><custom-meta><meta-name>Legend</meta-name><meta-value>Bacterial colonies with a fluorescent amplification reporter appearing on selective plates over three days (left to right). Bacteria can grow either due to pre-plating single-step mutations that activate a drug resistance gene (dark colonies), or due to amplifications of the gene (bright colonies), which appear over time. Amplifications, as well as some other mutation types, depend on genes in the chromosomal neighborhood of the resistance gene.</meta-value></custom-meta></file>
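A striking image is output with a <file> @file-type attribute 'cover_art'.
Clarification needed and assumptions
We should refer to MECA for the best way to list these files in the manifest.
The eLife bot processing should remove the ambiguity by renaming the files and recording each new file name, and what the file is, in the manifest file.
It is not yet clear how best to provide this new information in the manifest XML.
It might be good to group the figures and their sub-assets into one fig-group in the bot output XML.
Tasks
[ ] Review MECA guidance for how to express this in the manifest file
[ ] Provide LaTeX examples in this ticket
[ ] @eLifeProduction to confirm how/when there can be additional main article files sent
[ ] Striking image workflow: @eLifeProduction to advise how to handle these files
[ ] @eLifeProduction to confirm whether there are any other potential files output that should be ignored
[ ] @JGilbert-eLife to provide examples of what has dropped off from EJP exports in the past so we can validate the content for these types of things
[ ] @JGilbert-eLife or @Melissa37 to review this ticket content!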
Technical notes
@gnott
User interface / Wireframes
NA
@gnott @eLifeProduction