This project contains the processing steps for generating the reading-text (i.e., less diplomatic) representations of the Faust edition, as well as most of the other generated or converted data, except for the diplomatic text representations.
This is work in progress.
This is mainly used as a submodule of https://github.com/faustedition/faust-gen – the easiest way to run it is to check out that repository including all submodules and to run `mvn -Pxproc` there.
Alternatively, you need:
You should then clone this repository and edit the configuration file, config.xml, as you see fit (e.g., enter the path to your copy of the Faust data). You can also leave the config file as it is and pass the relevant settings as parameters to the XProc processor.
To generate all data, run the pipeline `generate-all`, e.g., using

```
calabash generate-all.xpl
```

This will run all processing steps and, by default, generate the HTML data in subdirectories of `target`.
Basically, we need to perform three steps, in order:
`documents` folder.

All steps read config.xml, and all XSLT stylesheets have the parameters defined there available. All parameters from config.xml can also be passed by the usual means of passing parameters to pipelines (such as calabash's `-p` option).
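For illustration, a config.xml in XProc `c:param-set` form could look roughly like the sketch below. The entry names and values here are assumptions made up for this example (only `html` is mentioned elsewhere in this document); check the actual config.xml for the real ones.

```xml
<c:param-set xmlns:c="http://www.w3.org/ns/xproc-step">
  <!-- hypothetical entries; see the real config.xml for the actual names -->
  <c:param name="data" value="/path/to/faust-xml"/>
  <c:param name="html" value="target/html"/>
</c:param-set>
```

An individual parameter can then be overridden on the command line, e.g. `calabash -p html=/tmp/out generate-all.xpl`.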
The output is a list of `<textTranscript>` elements; here is an example:
```xml
<textTranscript xmlns="http://www.faustedition.net/ns"
                uri="faust://xml/transcript/gsa/391083/391083.xml"
                href="file:/home/vitt/Faust/transcript/gsa/391083/391083.xml"
                document="document/maximen_reflexionen/gsa_391083.xml"
                type="archivalDocument"
                f:sigil="H P160">
  <idno type="bohnenkamp" uri="faust://document/bohnenkamp/H_P160" rank="2">H P160</idno>
  <idno type="gsa_2" uri="faust://document/gsa_2/GSA_25/W_1783" rank="28">GSA 25/W 1783</idno>
  <idno type="gsa_1" uri="faust://document/gsa_1/GSA_25/XIX,2,9:2" rank="50">GSA 25/XIX,2,9:2</idno>
</textTranscript>
```
`href` is the local path to the actual transcript, `document` is the relative URL to the metadata document. `type` is either `archivalDocument` or `print`. The `<idno>` elements are ordered by an order of preference defined in the pipeline (depending on `type`) and recorded in the respective `rank` attribute.
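As a quick illustration (not part of the edition's toolchain), picking the preferred sigil from such a record amounts to taking the `<idno>` with the smallest `rank`:

```python
import xml.etree.ElementTree as ET

F = "{http://www.faustedition.net/ns}"

# A trimmed-down textTranscript record like the one shown above.
record = """<textTranscript xmlns="http://www.faustedition.net/ns" type="archivalDocument">
  <idno type="bohnenkamp" rank="2">H P160</idno>
  <idno type="gsa_2" rank="28">GSA 25/W 1783</idno>
  <idno type="gsa_1" rank="50">GSA 25/XIX,2,9:2</idno>
</textTranscript>"""

root = ET.fromstring(record)
# A lower rank means higher preference, so the best idno is the minimum.
preferred = min(root.findall(F + "idno"), key=lambda e: int(e.get("rank")))
print(preferred.text)  # H P160
```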
`variants` directory configured in config.xml.

This step performs three substeps that are controlled by additional files:
This removes the genetic markup from the textual transcripts by applying the edits indicated by the markup. Thus, the result represents the last state of the text in the input document.
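The idea — keep what was added, drop what was deleted — can be sketched in a few lines of Python. This is a toy stand-in for the actual XSLT, handling only `<del>` in a simplified TEI fragment:

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

def last_state(fragment: str) -> str:
    """Return the text of a TEI fragment after applying deletions.
    Toy version: only <del> is handled; the real pipeline covers
    many more cases (ge:transpose, corr, spans, ...)."""
    root = ET.fromstring(fragment)
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == TEI + "del":
                # Splice the deleted element out, preserving its tail text.
                tail = child.tail or ""
                siblings = list(parent)
                i = siblings.index(child)
                if i > 0:
                    prev = siblings[i - 1]
                    prev.tail = (prev.tail or "") + tail
                else:
                    parent.text = (parent.text or "") + tail
                parent.remove(child)
    return "".join(root.itertext())

line = '<l xmlns="http://www.tei-c.org/ns/1.0">Habe nun, <del>leider!</del>ach! Philosophie</l>'
print(last_state(line))  # Habe nun, ach! Philosophie
```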
The document is passed through the following steps:
- Applies the genetic markup (`ge:transpose`, `del`, `corr`, etc.) and performs character normalizations and a set of other normalizations. This also includes the rules from harmonize-antilabes.xsl, which transforms antilabe encodings from the join form to the part form, so we only have to deal with one form in the further processing.
- Resolves span-like markup (`spanTo` etc.). Attention: this step will remove text if the input includes `delSpan` elements that point to a non-existing anchor; the script will print a warning if it detects such a case.
- Converts the `<p>`-based markup in Trüber Tag. Feld. to `<lg>`/`<l>`-based markup as in the verse parts, to ease collation.

`basename` is the name used for the HTML files, relative to the output directory given by the `html` parameter. The all.html file for the all-in-one document is generated inside the folder specified using the `html` parameter.

Steps:
- `<pb>` elements with a normalized page number

When generating HTML from longer documents, these are split into multiple HTML files along TEI `<div>` elements. This can be configured from the configuration file.
To find out which page is where, we generate an index that maps faust:// URIs and pages to HTML file names. This is a two-step process: the print2html.xpl pipeline generates an XML summary outlining the files and pages of a single document (see pagemap.xsl for details), and pagelist2json.xsl converts the information from all these documents into a single JSON file. You can then generate links of the form `filename#dt<pagenumber>` to link to the individual files.
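A consumer of that JSON file might resolve a page to a link roughly like this. Note that the index structure shown here is an assumption made up for the example, not the actual output format of pagelist2json.xsl:

```python
import json

# Hypothetical index: faust:// URI -> page number -> HTML file name.
# The real JSON produced by pagelist2json.xsl may be shaped differently.
index_json = """{
  "faust://document/faustedition/2_H": {
    "1": "2_H.html",
    "5": "2_H.1.html"
  }
}"""

index = json.loads(index_json)

def page_link(uri: str, page: str) -> str:
    # Links have the form filename#dt<pagenumber>.
    return f"{index[uri][page]}#dt{page}"

print(page_link("faust://document/faustedition/2_H", "5"))  # 2_H.1.html#dt5
```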
There is experimental code to generate an Einblendungsapparat (inline apparatus) as well. This kind of apparatus is based on the first state of the text, not the last, and it marks later edits in the text with special markup, using editorial notes in 〈angled brackets〉. The current implementation is still unfinished and renders only the most frequent kinds of edits.
The CSS rules required for the apparatus currently live at the end of lesetext.css. Again, please note that this is a moving target.