daisy / pipeline-scripts

!! NOTE: This project is now part of the pipeline-modules project !! | Script modules for the default DAISY Pipeline 2 distribution.
GNU Lesser General Public License v3.0

Use a global normalized format that can be converted to any other format #137

Open josteinaj opened 6 years ago

josteinaj commented 6 years ago

Based on @bertfrees's comment: https://github.com/daisy/pipeline-scripts/issues/101#issuecomment-329759018

dp2 transformation chaining

These new scripts could use this method:

bertfrees commented 6 years ago

For ODT to PEF, I think it makes more sense to go directly, because all *-to-pef scripts are based on the same "XML/CSS to OBFL to PEF" pipeline.

bertfrees commented 6 years ago

I've extended the matrix view of our scripts (x) with requested scripts (o):

Outputs (columns): DAISY 2.02, DAISY 3, DTBook, EPUB 3, HTML, ZedAI, PEF, RTF, ODT
Inputs (rows):
DAISY 2.02 x
DAISY 3 x x
DTBook x x x x x x x
EPUB 2 o
EPUB 3 x x x
HTML o x x
ZedAI x x x
RTF o
ODT o o o o
DOCX o o o
Markdown o

bertfrees commented 6 years ago

A candidate for the central format is something like "NLBPUB", an (exploded) EPUB 3 that contains a single HTML file.

In Leipzig we asked ourselves whether it really makes sense for all conversions to have the same intermediary format. Is it easier for some conversions to go directly from input to output, possibly making use of utility steps that are shared between conversions? Should the central format be clearly specified, or can the exact interpretation depend on the conversion?

We haven't come up with a clear answer to these questions.

Another important question is whether existing scripts should be refactored to make use of the central format. It is clear that in order to benefit optimally from the central format (to fill the whole matrix), every format needs a conversion to and from the central format.
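To make the "fill the whole matrix" point concrete, here is a minimal Python sketch of the hub-and-spoke idea. It is not Pipeline code: the names `to_central`, `from_central`, `convert`, and the string stand-in for a document are all hypothetical. The counting functions show the general trade-off: with n formats, direct scripts between every ordered pair of distinct formats number n * (n - 1), while a central format needs at most 2 * n.

```python
# Hypothetical sketch: every name below is illustrative, not part of DAISY Pipeline 2.
from typing import Callable, Dict

Doc = str  # stand-in for a real document model

# One converter per format INTO the central format...
to_central: Dict[str, Callable[[Doc], Doc]] = {
    "DTBook": lambda d: f"central({d})",
    "ODT": lambda d: f"central({d})",
}

# ...and one converter per format OUT OF the central format.
from_central: Dict[str, Callable[[Doc], Doc]] = {
    "EPUB 3": lambda d: f"EPUB 3<{d}>",
    "PEF": lambda d: f"PEF<{d}>",
}


def convert(doc: Doc, source: str, target: str) -> Doc:
    """Chain source -> central -> target; fail if either half is missing."""
    if source not in to_central:
        raise KeyError(f"no conversion from {source} to the central format")
    if target not in from_central:
        raise KeyError(f"no conversion from the central format to {target}")
    return from_central[target](to_central[source](doc))


def direct_script_count(n: int) -> int:
    """Scripts needed to connect every ordered pair of n distinct formats directly."""
    return n * (n - 1)


def hub_script_count(n: int) -> int:
    """Scripts needed when each of n formats converts to and from a central format."""
    return 2 * n
```

For example, `convert("book", "ODT", "PEF")` chains the two halves, and for ten formats the counts are 90 direct scripts versus 20 through the hub. A cell in the matrix becomes available as soon as both halves exist, which is why every format needs both directions.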

bertfrees commented 6 years ago

After the discussion in Leipzig, I think I'm still in favor of the central format. Initially it will require some more work, but in the long term I think we'll be happier. We'll have a truly modular system that is easier to maintain and easier to extend with new functionality.

A central format and reusable utility steps are not mutually exclusive. We can still make use of common steps in the conversions from and to the central format. Functionality common to all scripts should be implemented as much as possible in steps that process the central format.
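As a rough illustration of how shared steps and a central format can coexist, the following Python sketch composes a script from a list of steps. Everything in it is an assumption made for the example: the dict-based document, the step names, and `normalize_whitespace` as the "common functionality" are hypothetical and do not correspond to existing Pipeline modules.

```python
# Hypothetical sketch: step names and the dict-based document are illustrative only.
from functools import reduce
from typing import Callable, Dict, List

Doc = Dict[str, object]
Step = Callable[[Doc], Doc]


def run(steps: List[Step], doc: Doc) -> Doc:
    """Apply the steps left to right, feeding each result into the next step."""
    return reduce(lambda d, step: step(d), steps, doc)


def normalize_whitespace(doc: Doc) -> Doc:
    """Shared utility step that operates on the central format."""
    return {**doc, "text": " ".join(str(doc["text"]).split())}


def odt_to_central(doc: Doc) -> Doc:
    """Format-specific half: input format to central format."""
    return {**doc, "format": "central"}


def central_to_epub3(doc: Doc) -> Doc:
    """Format-specific half: central format to output format."""
    return {**doc, "format": "EPUB 3"}


# The common step sits between the two halves, so every script that passes
# through the central format can reuse it without duplicating the logic.
odt_to_epub3: List[Step] = [odt_to_central, normalize_whitespace, central_to_epub3]

result = run(odt_to_epub3, {"format": "ODT", "text": "  Hello   world "})
```

Here `result` is `{"format": "EPUB 3", "text": "Hello world"}`. Because the shared step is written once against the central format, only the two format-specific halves change from script to script.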