Use a global normalized format that can be converted to any other format #137

josteinaj commented 6 years ago

Based on @bertfrees's comment: https://github.com/daisy/pipeline-scripts/issues/101#issuecomment-329759018

[Figure: dp2 transformation chaining]

These new scripts could use this method:

EPUB 2 to PEF/BRF script: #136
MS Word to EPUB 3 script: #135
MS Word to DAISY 2.02 script: #134
ODT to EPUB 3 script: #133
ODT to DAISY 2.02 script: #132
Markdown to DTBook script: #131
RTF to DTBook script: #130
MS Word to DTBook script: #129
ODT to PEF script: #127
HTML to DTBook script: #122
ODT to DTBook script: #121

bertfrees commented 6 years ago

For ODT to PEF I think it makes more sense to go directly because all *-to-pef scripts are based on the same "XML/CSS to OBFL to PEF" pipeline.
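Here is a minimal sketch of what sharing that pipeline means. Only the input-specific front step differs per script, and the "XML/CSS to OBFL to PEF" back end is reused unchanged. The step names (`odt_to_styled_xml`, `styled_xml_to_obfl`, `obfl_to_pef`, and so on) are hypothetical and are not actual Pipeline 2 steps. Plain strings stand in for documents so the order of the steps stays visible.

```python
from typing import Callable

# A step maps a document to a new document; plain strings stand in for
# real documents so the order of steps is visible (assumption).
Step = Callable[[str], str]


def chain(*steps: Step) -> Step:
    """Compose steps left to right into one conversion."""
    def run(doc: str) -> str:
        for step in steps:
            doc = step(doc)
        return doc
    return run


# Hypothetical input-specific front ends: each yields XML styled with CSS.
def odt_to_styled_xml(doc: str) -> str:
    return f"styled-xml[odt:{doc}]"


def dtbook_to_styled_xml(doc: str) -> str:
    return f"styled-xml[dtbook:{doc}]"


# Hypothetical shared back end, identical for every *-to-pef script.
def styled_xml_to_obfl(doc: str) -> str:
    return f"obfl[{doc}]"


def obfl_to_pef(doc: str) -> str:
    return f"pef[{doc}]"


odt_to_pef = chain(odt_to_styled_xml, styled_xml_to_obfl, obfl_to_pef)
dtbook_to_pef = chain(dtbook_to_styled_xml, styled_xml_to_obfl, obfl_to_pef)

print(odt_to_pef("report.odt"))  # pef[obfl[styled-xml[odt:report.odt]]]
```

A direct ODT to PEF script therefore only has to contribute the ODT-specific front step.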

bertfrees commented 6 years ago

I've extended the matrix view of our scripts (x) with requested scripts (o):

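| Input / Output | DAISY 2.02 | DAISY 3 | DTBook | EPUB 3 | HTML | ZedAI | PEF | RTF | ODT |
|---|---|---|---|---|---|---|---|---|---|
| DAISY 2.02 | | | | x | | | | | |
| DAISY 3 | x | | | x | | | | | |
| DTBook | | x | | x | x | x | x | x | x |
| EPUB 2 | | | | | | | o | | |
| EPUB 3 | x | | | x | | | x | | |
| HTML | | | o | x | | | x | | |
| ZedAI | | | | x | x | | x | | |
| RTF | | | o | | | | | | |
| ODT | o | | o | o | | | o | | |
| DOCX | o | | o | o | | | | | |
| Markdown | | | o | | | | | | |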
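To make the matrix easier to query, here is a sketch that transcribes it as data. The dictionary below is copied by hand from the table ("x" = existing script, "o" = requested script). The `cells` helper is hypothetical. Together they confirm that the requested cells match the eleven scripts listed in the opening comment.

```python
# "x" = script exists, "o" = script requested (transcribed from the table above).
MATRIX = {
    "DAISY 2.02": {"EPUB 3": "x"},
    "DAISY 3": {"DAISY 2.02": "x", "EPUB 3": "x"},
    "DTBook": {"DAISY 3": "x", "EPUB 3": "x", "HTML": "x", "ZedAI": "x",
               "PEF": "x", "RTF": "x", "ODT": "x"},
    "EPUB 2": {"PEF": "o"},
    "EPUB 3": {"DAISY 2.02": "x", "EPUB 3": "x", "PEF": "x"},
    "HTML": {"DTBook": "o", "EPUB 3": "x", "PEF": "x"},
    "ZedAI": {"EPUB 3": "x", "HTML": "x", "PEF": "x"},
    "RTF": {"DTBook": "o"},
    "ODT": {"DAISY 2.02": "o", "DTBook": "o", "EPUB 3": "o", "PEF": "o"},
    "DOCX": {"DAISY 2.02": "o", "DTBook": "o", "EPUB 3": "o"},
    "Markdown": {"DTBook": "o"},
}


def cells(mark: str):
    """Return sorted (input, output) pairs whose cell carries the given mark."""
    return sorted(
        (src, dst)
        for src, row in MATRIX.items()
        for dst, m in row.items()
        if m == mark
    )


existing = cells("x")
requested = cells("o")
print(len(existing), len(requested))  # 18 11
```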

bertfrees commented 6 years ago

A candidate for the central format is something like "NLBPUB", an (exploded) EPUB 3 that contains a single HTML file.
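As an illustration, here is a sketch that checks whether a package document fits the "EPUB 3 with a single HTML file" shape. It rests on two assumptions. First, "exploded" means the EPUB is unzipped, so the OPF package document can be read directly. Second, "single HTML file" means exactly one XHTML content document in the manifest, not counting the EPUB 3 navigation document (the item with `properties="nav"`). The actual NLBPUB rules may be stricter or different. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

OPF_NS = "{http://www.idpf.org/2007/opf}"
XHTML = "application/xhtml+xml"


def content_documents(opf_xml: str) -> List[str]:
    """List hrefs of XHTML manifest items, excluding the nav document."""
    root = ET.fromstring(opf_xml)
    manifest = root.find(f"{OPF_NS}manifest")
    if manifest is None:
        return []
    hrefs = []
    for item in manifest.findall(f"{OPF_NS}item"):
        props = (item.get("properties") or "").split()
        if item.get("media-type") == XHTML and "nav" not in props:
            hrefs.append(item.get("href"))
    return hrefs


def is_single_html_epub(opf_xml: str) -> bool:
    """Assumed NLBPUB-like check: exactly one XHTML content document."""
    return len(content_documents(opf_xml)) == 1


SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="book" href="book.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="book.css" media-type="text/css"/>
  </manifest>
  <spine><itemref idref="book"/></spine>
</package>"""

print(content_documents(SAMPLE_OPF))  # ['book.xhtml']
print(is_single_html_epub(SAMPLE_OPF))  # True
```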

In Leipzig we asked ourselves whether it really makes sense for all conversions to use the same intermediary format. Is it easier for some conversions to go directly from input to output, possibly making use of utility steps that are shared between conversions? Should the central format be clearly specified, or can its exact interpretation depend on the conversion?

We haven't come up with a clear answer to these questions.

Another important question is whether existing scripts should be refactored to make use of the central format. It is clear that in order to benefit optimally from the central format (to fill the whole matrix), every format needs a conversion to and from the central format.
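A quick way to quantify "fill the whole matrix": with N formats, direct conversion needs up to N(N-1) scripts, one per ordered pair. A central format needs at most 2N, one conversion to and one from the central format for each format. The sketch below assumes hypothetical `TO_CENTRAL`/`FROM_CENTRAL` registries and uses string placeholders for documents. It is not how Pipeline 2 actually registers scripts.

```python
from typing import Callable, Dict, List, Tuple

Converter = Callable[[str], str]

# Hypothetical registries: one importer and one exporter per format.
TO_CENTRAL: Dict[str, Converter] = {}
FROM_CENTRAL: Dict[str, Converter] = {}


def register(fmt: str) -> None:
    """Register placeholder converters to and from the central format."""
    def importer(doc: str) -> str:
        return f"central({fmt}:{doc})"

    def exporter(doc: str) -> str:
        return f"{fmt}({doc})"

    TO_CENTRAL[fmt] = importer
    FROM_CENTRAL[fmt] = exporter


def convert(src: str, dst: str, doc: str) -> str:
    """Any-to-any conversion: always go src -> central format -> dst."""
    return FROM_CENTRAL[dst](TO_CENTRAL[src](doc))


for fmt in ["DAISY 2.02", "DAISY 3", "DTBook", "EPUB 3", "HTML", "ODT"]:
    register(fmt)

pairs: List[Tuple[str, str]] = [
    (s, d) for s in TO_CENTRAL for d in FROM_CENTRAL if s != d
]
n_converters = len(TO_CENTRAL) + len(FROM_CENTRAL)
print(n_converters, len(pairs))  # 12 30
print(convert("ODT", "EPUB 3", "x.odt"))  # EPUB 3(central(ODT:x.odt))
```

With six formats, twelve converters cover all thirty ordered pairs. Direct conversion would need thirty separate scripts.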

bertfrees commented 6 years ago

After the discussion in Leipzig, I think I'm still in favor of the central format. Initially it will require some more work, but in the long term I think we'll be happier. We'll have a truly modular system that is easier to maintain and easier to extend with new functionality.

A central format and reusable utility steps are not mutually exclusive. We can still make use of common steps in the conversions from and to the central format. Functionality common to all scripts should be implemented as much as possible in steps that process the central format.
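To illustrate the last point, here is a sketch in which every script is built as input, then central format, then shared steps, then output. The shared steps are written once against the central format and reused by every conversion. `pipeline`, `generate_ids` and `fix_headings` are hypothetical names chosen for illustration, and strings stand in for documents.

```python
from typing import Callable, List

Step = Callable[[str], str]


def pipeline(to_central: Step, common: List[Step], from_central: Step) -> Step:
    """Build a script as: input -> central format -> shared steps -> output."""
    def run(doc: str) -> str:
        doc = to_central(doc)
        for step in common:  # functionality common to all scripts
            doc = step(doc)
        return from_central(doc)
    return run


# Hypothetical shared steps that operate only on the central format.
def generate_ids(doc: str) -> str:
    return doc + "+ids"


def fix_headings(doc: str) -> str:
    return doc + "+headings"


COMMON: List[Step] = [generate_ids, fix_headings]

docx_to_epub3 = pipeline(lambda d: f"central({d})", COMMON, lambda d: f"epub3({d})")
odt_to_dtbook = pipeline(lambda d: f"central({d})", COMMON, lambda d: f"dtbook({d})")

print(docx_to_epub3("a.docx"))  # epub3(central(a.docx)+ids+headings)
```

Under this design, adding a new shared step means appending it to `COMMON` once rather than editing every script.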