bundesverfassung-oesterreich / bv-static

Die Entstehung des Bundes-Verfassungsgesetzes 1920
https://b-vg.acdh.oeaw.ac.at/

Check Whitespaces for pagebreaks #142

Closed cfhaak closed 2 months ago

cfhaak commented 3 months ago

https://b-vg.acdh.oeaw.ac.at/bv_doc_id__1.html?correction_toggler=on&comment_toggler=on#heading_article_5

A simple example: there shouldn't be any whitespace after the pipe.

cfhaak commented 3 months ago

And none before! Especially when that thing is inside a word. In the case above the HTML is really simple and good ;)

<span class="anchor-pb"></span><span class="pb" source="https://viewer.acdh.oeaw.ac.at/viewer/api/v1/records/bv_doc_id__1/files/images/IMG_0003/full/full/0/default.jpg" n="2" style="--page_before: '1'; --beginning_page: '2';"></span><span n="2" class="pb_marker"></span>
                                 <span class="historical_pagecounter">- 2 - </span>

                                 <h3 class="article single first" id="heading_article_5">Art. V. </h3>
span.pb_marker::after {
    content: "|";
    padding: 0 .5rem;
}
cfhaak commented 3 months ago

well now I see that padding …

cfhaak commented 3 months ago

It's horrible. I don't even get where the spaces stem from, but whatever. I just wrote a set of templates to address every relevant whitespace and (for testing) remove it. This seems to work:

    <!-- templates to handle whitespace around pbs -->
    <!-- removes whitespace-only text nodes directly following a pb or fw -->
    <xsl:template
        match="text()[preceding-sibling::*[1][local-name() = 'pb' or local-name() = 'fw'] and normalize-space() = '']"/>
    <!-- strips whitespace from the left of text nodes directly following a pb or fw, if they contain non-whitespace -->
    <!-- caveat: normalize-space() also collapses inner whitespace, so the original tail is only kept when inner whitespace is already single spaces -->
    <xsl:template
        match="text()[preceding-sibling::*[1][local-name() = 'pb' or local-name() = 'fw'] and normalize-space() != '']">
        <xsl:value-of select="
                concat(
                normalize-space(),
                substring-after(., normalize-space())
                )"/>
    </xsl:template>
    <!-- strips whitespace from the right of text nodes directly followed by a pb or fw, if they contain non-whitespace -->
    <!-- (the separate pb-only copy of this template matched the same nodes and had an unused variable, so it is folded in here) -->
    <xsl:template
        match="text()[following-sibling::*[1][local-name() = 'fw' or local-name() = 'pb'] and normalize-space() != '']">
        <xsl:value-of select="
                concat(substring-before(., normalize-space()), normalize-space())"/>
    </xsl:template>
cfhaak commented 3 months ago

So I have 4 options:

  1. lstrip
  2. rstrip
  3. strip
  4. pad

I need to find a way that also deals well with the toggleable fw (historical paginations).
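
The four options can be sketched as small string functions. This is only a sketch, assuming an XSLT 2.0 (or later) processor so that `replace()` with a regular expression is available. The `ws` prefix, its namespace URI, and the function names are hypothetical and not taken from the project's stylesheets, and reading "pad" as "exactly one space on each side" is an assumption.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ws="urn:example:whitespace"
    exclude-result-prefixes="xs ws">

    <!-- 1. lstrip: drop leading whitespace only -->
    <xsl:function name="ws:lstrip" as="xs:string">
        <xsl:param name="text" as="xs:string"/>
        <xsl:sequence select="replace($text, '^\s+', '')"/>
    </xsl:function>

    <!-- 2. rstrip: drop trailing whitespace only -->
    <xsl:function name="ws:rstrip" as="xs:string">
        <xsl:param name="text" as="xs:string"/>
        <xsl:sequence select="replace($text, '\s+$', '')"/>
    </xsl:function>

    <!-- 3. strip: drop whitespace on both ends, keep inner whitespace as it is -->
    <xsl:function name="ws:strip" as="xs:string">
        <xsl:param name="text" as="xs:string"/>
        <xsl:sequence select="ws:rstrip(ws:lstrip($text))"/>
    </xsl:function>

    <!-- 4. pad (assumed meaning): exactly one space on each side -->
    <xsl:function name="ws:pad" as="xs:string">
        <xsl:param name="text" as="xs:string"/>
        <xsl:sequence select="concat(' ', ws:strip($text), ' ')"/>
    </xsl:function>
</xsl:stylesheet>
```

Unlike a `normalize-space()`-based variant, these functions leave whitespace inside the text untouched and only affect the ends.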

cfhaak commented 3 months ago

Normally pbs should be surrounded by 1 whitespace on each side, but not if

  1. a pb is the first child node of a structural parent element (e.g. p). Then there should be no whitespace on either side.
  2. its break attribute's value is "no". Then there should be no whitespace on either side.

As for the annoying historical fw elements, I will just ignore them for now.
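
The rule above could be expressed directly in the template that renders a pb. A minimal sketch, under several assumptions: neighbouring text nodes have already been stripped, "first child node" ignores whitespace-only text nodes, and any parent counts as structural (a real stylesheet would presumably restrict this to elements such as p). The output span only mirrors the `pb_marker` markup quoted earlier in this thread; the variable names are made up for illustration. It uses only XPath 1.0 constructs.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="*[local-name() = 'pb']">
        <!-- exception 1: no element and no non-blank text precedes the pb inside its parent -->
        <xsl:variable name="is_first_child"
            select="not(preceding-sibling::node()[self::* or self::text()[normalize-space() != '']])"/>
        <!-- exception 2: the pb sits inside a word -->
        <xsl:variable name="no_break" select="@break = 'no'"/>
        <!-- one space in the normal case, nothing if either exception applies -->
        <xsl:variable name="padding">
            <xsl:if test="not($is_first_child or $no_break)">
                <xsl:text> </xsl:text>
            </xsl:if>
        </xsl:variable>
        <xsl:value-of select="$padding"/>
        <span class="pb_marker" n="{@n}"/>
        <xsl:value-of select="$padding"/>
    </xsl:template>
</xsl:stylesheet>
```

Keeping the decision in one variable means both sides of the marker always get the same treatment, which matches the "no whitespace on either side" wording of both exceptions.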

cfhaak commented 3 months ago

One approach would of course be to strip everything and just add the needed whitespace. There are a lot of inconsistencies and edge cases, e.g. inline elements at the end of paragraphs followed by a pb …
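
The "strip everything" half of that approach might look like the following sketch. It assumes XSLT 2.0 or later for `replace()` and the `if … then … else` expression; re-adding the single spaces is left to whatever template renders the pb and is not shown. The identity template is only there to make the sketch self-contained, and the sketch tests the immediately adjacent node (`node()[1]`) rather than the nearest element sibling.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- identity template: copy everything that is not handled below -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- text nodes that touch a pb on at least one side -->
    <xsl:template match="text()[preceding-sibling::node()[1][local-name() = 'pb']
                                or following-sibling::node()[1][local-name() = 'pb']]">
        <!-- strip the left end only if the pb is directly before this text node -->
        <xsl:variable name="left"
            select="if (preceding-sibling::node()[1][local-name() = 'pb'])
                    then replace(., '^\s+', '')
                    else string(.)"/>
        <!-- strip the right end only if the pb is directly after this text node -->
        <xsl:value-of
            select="if (following-sibling::node()[1][local-name() = 'pb'])
                    then replace($left, '\s+$', '')
                    else $left"/>
    </xsl:template>
</xsl:stylesheet>
```

This only covers text nodes that are siblings of the pb. Trailing whitespace inside an inline element that ends right before a pb is not a sibling of that pb, so that edge case would need its own handling.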

cfhaak commented 2 months ago

OK, I think I solved it. There are still some cases left where the rule is ambiguous; I need to investigate.

cfhaak commented 2 months ago

Also closed by deleting all whitespace and adding it back before each pb, except where break is "no".