elifesciences / package-ejp-raw-output-zip

Transform article raw zip files from EJP to a more consistent output.
MIT License

EJP Manifest output: convert asset file names and remove those not used #11

Open Melissa37 opened 5 years ago

Melissa37 commented 5 years ago

Problem / Motivation

WHO: Production/Production Vendor
WHEN: Between export from EJP/xPub and delivery to the production vendor
WHERE: In eLife bot processes
WHAT: Information regarding assets
WHY: It takes production vendors a lot of effort to understand the requirement and get the code right. We've had situations where we thought they understood, but they ignored the requirement and built infrastructure (behind closed source code) based on the wrong understanding. If we could automate the process before it reaches them, we could trust the process more and manage changes in the Editorial process when switching out one tool for another.

Also, EJP exports some files that should not be used, so we want to remove them before sending to the production vendor. An example is the file type <file file-type='author_pdf_for_review'>: this should not be passed on, but EJP cannot avoid exporting it!

We'd also like to validate the EJP output, so that any expected tagging (which describes the content) that has dropped off is identified before sending to the production vendor.

Proposed solution

At the end of the submission system XML metadata file there is a <files> section. Each file output from the system is listed in this section and can be matched up through it.
Ignore the file types that are not required for the production process and do not pass them on (for example <file file-type='author_pdf_for_review'>). All files that are passed on require renaming as per the file naming requirements. See https://github.com/elifesciences/XML-mapping/blob/master/elife_file_naming_2016_08_25.md
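The filtering step could be sketched roughly as below. This is only an illustration, not the bot's actual code: the wrapper element name `<xml>`, the function name `production_files` and the sample file names are made up for the example, and the exclusion set contains only the one file type named in this issue.

```python
import xml.etree.ElementTree as ET

# Assumption: only 'author_pdf_for_review' is named in this issue as a type
# to drop; any other excluded types would need to be added here.
EXCLUDED_FILE_TYPES = {"author_pdf_for_review"}


def production_files(xml_text):
    """Return the <file> elements of the <files> section, minus excluded types."""
    root = ET.fromstring(xml_text)
    files_section = root if root.tag == "files" else root.find(".//files")
    if files_section is None:
        return []
    return [
        file_el
        for file_el in files_section.findall("file")
        if file_el.get("file-type") not in EXCLUDED_FILE_TYPES
    ]


# Hypothetical, cut-down export used only to exercise the function.
SAMPLE = """<xml><files>
<file file-type='author_pdf_for_review' id='1'><upload_file_nm>review.pdf</upload_file_nm></file>
<file file-type='figure' id='2'><upload_file_nm>fig1.pdf</upload_file_nm></file>
</files></xml>"""

kept = production_files(SAMPLE)
print([file_el.findtext("upload_file_nm") for file_el in kept])
```

Keeping the excluded types in one set means a change in what EJP exports only needs a change in one place.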

Main figures

Example EJP export XML:
<file file-type='figure' id='961146'><upload_file_nm>fig9_projection.pdf</upload_file_nm><order>11</order><size units='bytes'>1373502</size><custom-meta><meta-name>Figure number</meta-name><meta-value>Figure 9</meta-value></custom-meta></file>

Figures are output with a <file> @file-type attribute ‘figure’ and a file id. The file id can be ignored completely: it is an internal EJP id for the file on their system. <upload_file_nm> is the current name of the file. <order> can be ignored too. <size> can be ignored, but might be useful for transfer info? Within <custom-meta>, the <meta-name> field indicates what the <meta-value> refers to.
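As a rough sketch of reading just the fields that matter from one <file> element (the function name `describe_file` and the returned dictionary keys are invented for this example; the XML is the main figure example from this issue):

```python
import xml.etree.ElementTree as ET


def describe_file(file_el):
    """Collect the file type, uploaded name and custom-meta pairs of a <file>."""
    # Each <custom-meta> holds one <meta-name>/<meta-value> pair.
    meta = {
        custom_meta.findtext("meta-name"): custom_meta.findtext("meta-value")
        for custom_meta in file_el.findall("custom-meta")
    }
    return {
        "file_type": file_el.get("file-type"),
        "upload_file_nm": file_el.findtext("upload_file_nm"),
        "meta": meta,
        # id, <order> and <size> are deliberately not read, as described above.
    }


FIGURE_XML = (
    "<file file-type='figure' id='961146'>"
    "<upload_file_nm>fig9_projection.pdf</upload_file_nm>"
    "<order>11</order><size units='bytes'>1373502</size>"
    "<custom-meta><meta-name>Figure number</meta-name>"
    "<meta-value>Figure 9</meta-value></custom-meta></file>"
)

info = describe_file(ET.fromstring(FIGURE_XML))
print(info["file_type"], info["upload_file_nm"], info["meta"])
```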

Sometimes authors upload a file named, for example, "Figure 1", which becomes "Figure 3" during the editorial process, but they do not rename the file. They will, however, have updated the <meta-value> to "Figure 3".

What the file actually is has to be taken from the <meta-value>. However, this is a freeform box, so there can be some variation in what is there (i.e. Fig 1, Figure 1, fig 1, figure 1).
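One possible way to cope with the freeform variation is a tolerant pattern match on the <meta-value>. The sketch below only covers the variants listed above (plus an optional full stop after "Fig"); the function name `figure_number` is hypothetical, and real data may well contain other spellings that need adding.

```python
import re

# Matches "Fig 1", "Figure 1", "fig 1", "figure 1", "Fig. 1" and similar.
_FIGURE_LABEL = re.compile(r"^\s*fig(?:ure)?\.?\s*(\d+)\s*$", re.IGNORECASE)


def figure_number(meta_value):
    """Return the figure number from a freeform label, or None if not recognised."""
    match = _FIGURE_LABEL.match(meta_value or "")
    return int(match.group(1)) if match else None


for raw in ["Fig 1", "Figure 1", "fig 1", "figure 1", "Figure 3", "Blah blah blah"]:
    print(repr(raw), "->", figure_number(raw))
```

Returning None for anything unrecognised, rather than guessing, gives the validation step something to flag before the files reach the production vendor.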

Final XML example: <fig-group><fig id="fig1" position="float"><label>Figure 1.</label> ... <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="elife00013f001"/></fig></fig-group>

Figure supplements

Example EJP export XML:
<file file-type='additional_figure_data' id='944710'><upload_file_nm>Figure5supplement figure 1.png</upload_file_nm><order>17</order><size units='bytes'>200850</size><custom-meta><meta-name>Figure number</meta-name><meta-value>Figure 5-figure supplement 1</meta-value></custom-meta></file>

Figure supplements are output with a <file> @file-type attribute ‘additional_figure_data’
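For figure supplements the <meta-value> carries two numbers: the parent figure and the supplement. A minimal sketch of pulling both out, assuming labels follow the "Figure 5-figure supplement 1" shape shown in the example (the function name `figure_supplement` is made up, and other label shapes would simply return None):

```python
import re

# Accepts a hyphen or an en dash between the figure and the supplement part.
_SUPPLEMENT_LABEL = re.compile(
    r"^\s*fig(?:ure)?\.?\s*(\d+)\s*[-\u2013]\s*"
    r"fig(?:ure)?\.?\s*supplement\s*(\d+)\s*$",
    re.IGNORECASE,
)


def figure_supplement(meta_value):
    """Return (figure number, supplement number), or None if not recognised."""
    match = _SUPPLEMENT_LABEL.match(meta_value or "")
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(figure_supplement("Figure 5-figure supplement 1"))
print(figure_supplement("Figure 9"))
```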

Videos

Example EJP export XML:
<file file-type='video' id='532012'><upload_file_nm>Rho_mutations_3D.mov</upload_file_nm><order>4</order><size units='bytes'>3288301</size><custom-meta><meta-name>Title</meta-name><meta-value>Video 1</meta-value></custom-meta></file>

Videos are output with a <file> @file-type attribute ‘video’

Reporting standards

Example EJP export XML:
<file file-type='graphical_abstract' id='61844'><upload_file_nm>26082014RAeLife04525R1_Reporting_Standards_Document1.pdf</upload_file_nm><order>9</order><size units='bytes'>145174</size><custom-meta><meta-name>Title</meta-name><meta-value>Reporting Standards Document</meta-value></custom-meta></file>

Reporting standards are output with a <file> @file-type attribute ‘graphical_abstract’

Source code

Example EJP export XML:
<file file-type='aux_file' id='3242'><upload_file_nm>Figure_3_source_code_1.docx</upload_file_nm><order>1</order><size units='bytes'>19456</size><custom-meta><meta-name>Title</meta-name><meta-value>Blah blah blah</meta-value></custom-meta></file>

Source code is output with a <file> @file-type attribute ‘aux_file’

Source data

Example EJP export XML:
<file file-type='data_set' id='61846'><upload_file_nm>26082014RAeLife04525R1_Figure_3_source_data_1.docx</upload_file_nm><order>8</order><size units='bytes'>17908</size><custom-meta><meta-name>Title</meta-name><meta-value>Figure 3-source data 1</meta-value></custom-meta><custom-meta><meta-name>Legend</meta-name><meta-value>Source Data</meta-value></custom-meta></file>

Source data are output with a <file> @file-type attribute ‘data_set’

Supplementary files

Example EJP export XML:
<file file-type='supp' id='61843'><upload_file_nm>26082014RAeLife04525R1_Supplementary_file_1.docx</upload_file_nm><order>11</order><size units='bytes'>34892</size><custom-meta><meta-name>Title</meta-name><meta-value>Supplementary File 1</meta-value></custom-meta><custom-meta><meta-name>Legend</meta-name><meta-value>Supplementary File</meta-value></custom-meta></file>

Supplementary files are output with a <file> @file-type attribute ‘supp’

The main article file (Word file)

Example EJP export XML:
<file file-type='art_file' id='532004'><upload_file_nm>main_text_revised2_no_markup.docx</upload_file_nm><order>5</order><size units='bytes'>241810</size></file>

Striking image file

Example EJP export XML:
<file file-type='cover_art' id='532008'><upload_file_nm>Striking image.jpg</upload_file_nm><order>13</order><size units='bytes'>169497</size><custom-meta><meta-name>Title</meta-name><meta-value>Neighborhood-dependent amplifications appearing over time</meta-value></custom-meta><custom-meta><meta-name>Legend</meta-name><meta-value>Bacterial colonies with a fluorescent amplification reporter appearing on selective plates over three days (left to right). Bacteria can grow either due to pre-plating single-step mutations that activate a drug resistance gene (dark colonies), or due to amplifications of the gene (bright colonies), which appear over time. Amplifications, as well as some other mutation types, depend on genes in the chromosomal neighborhood of the resistance gene.</meta-value></custom-meta></file>

A striking image is output with a <file> @file-type attribute ‘cover_art’
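Pulling the file types above together, a lookup table could look something like this. It is only a sketch: the dictionary and function names and the human-readable labels are invented here, and the table only covers the file types mentioned in this issue, so anything else comes back as None and could be reported by a validation step.

```python
# EJP @file-type value -> what the asset is, as described in this issue.
FILE_TYPE_TO_ASSET = {
    "figure": "main figure",
    "additional_figure_data": "figure supplement",
    "video": "video",
    "graphical_abstract": "reporting standards",
    "aux_file": "source code",
    "data_set": "source data",
    "supp": "supplementary file",
    "art_file": "main article file",
    "cover_art": "striking image",
}


def asset_kind(file_type):
    """Return the asset kind for an EJP file type, or None if it is not listed."""
    return FILE_TYPE_TO_ASSET.get(file_type)


for file_type in ["figure", "graphical_abstract", "author_pdf_for_review"]:
    print(file_type, "->", asset_kind(file_type))
```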

Clarification needed and assumptions


Tasks

Technical notes

@gnott

User interface / Wireframes

NA

@gnott @eLifeProduction

Melissa37 commented 5 years ago

See kitchen sink example for better clarity of required tagging. This will need to be documented clearly though.