adyeths / u2o

USFM to OSIS bible format converter.
The Unlicense

"WARNING(NESTING): verse Num.26.5 is not well formed" with list items #119

Closed Screwtapello closed 1 year ago

Screwtapello commented 1 year ago

This is a follow-up to #118.

When I started converting the BSB text from USFM to OSIS, I got a whole bunch of nesting warnings. I assumed there'd only be a couple of root causes, so I reported the first one I found with a minimised test-case. I very much appreciated you solving it so quickly, but when I went back to check my list of nesting warnings, it had only dropped from 142 warnings to 141.

So I spent some time digging into the remainder of the warnings. I haven't looked into all of them yet, but most of them seem very closely related to the issue in #118. Here's a minimised version of the problem that's common to most of the warnings in Numbers:

Minimal 04NUMBSB.usfm:

\id NUM - Berean Study Bible
\mt1 Numbers
\c 26
\v 5 Reuben was the firstborn of Israel. These were the descendants of Reuben: 
\b
\li1 The Hanochite clan from Hanoch, 
\b
\li1 the Palluite clan from Pallu, 
\b
\li1 
\v 6 the Hezronite clan from Hezron, 

...produces this OSIS output (slightly reformatted for clarity):

<?xml version='1.0' encoding='utf-8'?>
<osis xmlns="http://www.bibletechnologies.net/2003/OSIS/namespace" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.bibletechnologies.net/2003/OSIS/namespace http://www.bibletechnologies.net/osisCore.2.1.1.xsd">
  <osisText osisIDWork="BSBusfm" osisRefWork="Bible" xml:lang="en">
    <header>
    </header>
    <div type="book" osisID="Num" canonical="true">
        <!-- mt1 -->
        <title level="1" type="main">Numbers</title>
        <chapter sID="Num.26" osisID="Num.26" n="26"/>
            <verse sID="Num.26.5" osisID="Num.26.5" n="5"/>Reuben was the firstborn of Israel. These were the descendants of Reuben:
            <list>
                <item type="x-indent-1">The Hanochite clan from Hanoch,</item>
            </list>
            <list>
                <item type="x-indent-1">the Palluite clan from Pallu, <verse eID="Num.26.5"/></item>
            </list>
            <list>
                <item type="x-indent-1"><verse sID="Num.26.6" osisID="Num.26.6" n="6"/>the Hezronite clan from Hezron, <verse eID="Num.26.6"/></item>
            </list>
        <chapter eID="Num.26"/>
    </div>
  </osisText>
</osis>

...which produces the following warnings:

You are running osis2mod: $Rev: 3769 $ (SWORD: 1.9.0)
WARNING(NESTING): verse Num.26.5 is not well formed:(3,5)
SUCCESS: osis2mod: has finished its work and will now rest
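
One way to see what the warning is pointing at: in the output above, the `<verse sID="Num.26.5">` milestone is a direct child of the book `<div>`, while the matching `<verse eID="Num.26.5">` sits inside `<list><item>`. The following is a minimal sketch, assuming that the `(3,5)` in the warning is the element nesting depth at the start and end milestones (that reading is an assumption, not something confirmed in this thread). The helper names `milestone_depths` and `mismatched_verses` are hypothetical, and `SAMPLE` is a trimmed copy of the output above.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the <osis> root element in the output above.
OSIS_NS = "{http://www.bibletechnologies.net/2003/OSIS/namespace}"

# Trimmed copy of the converter output (header, title and first list omitted).
SAMPLE = """\
<osis xmlns="http://www.bibletechnologies.net/2003/OSIS/namespace">
  <osisText osisIDWork="BSBusfm" osisRefWork="Bible" xml:lang="en">
    <div type="book" osisID="Num">
      <chapter sID="Num.26" osisID="Num.26" n="26"/>
      <verse sID="Num.26.5" osisID="Num.26.5" n="5"/>Reuben was the firstborn of Israel.
      <list>
        <item type="x-indent-1">the Palluite clan from Pallu, <verse eID="Num.26.5"/></item>
      </list>
      <list>
        <item type="x-indent-1"><verse sID="Num.26.6" osisID="Num.26.6" n="6"/>the Hezronite clan from Hezron, <verse eID="Num.26.6"/></item>
      </list>
      <chapter eID="Num.26"/>
    </div>
  </osisText>
</osis>
"""


def milestone_depths(osis_xml):
    """Map each verse ID to [depth of its sID milestone, depth of its eID milestone].

    Depth here means the number of enclosing elements (assumed interpretation).
    """
    depths = {}

    def walk(elem, depth):
        # `depth` is the number of elements enclosing each child of `elem`.
        for child in elem:
            if child.tag == OSIS_NS + "verse":
                if "sID" in child.attrib:
                    depths.setdefault(child.attrib["sID"], [None, None])[0] = depth
                if "eID" in child.attrib:
                    depths.setdefault(child.attrib["eID"], [None, None])[1] = depth
            walk(child, depth + 1)

    walk(ET.fromstring(osis_xml), 1)
    return depths


def mismatched_verses(osis_xml):
    """Return the verses whose start and end milestones sit at different depths."""
    return {
        verse_id: tuple(pair)
        for verse_id, pair in milestone_depths(osis_xml).items()
        if pair[0] != pair[1]
    }


if __name__ == "__main__":
    print(mismatched_verses(SAMPLE))
```

On this sample the check flags only Num.26.5, with depths 3 and 5; Num.26.6 starts and ends inside the same `<item>`, so it is not reported.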

In the previous issue, you implied you'd grabbed a copy of the original USFM files to test them for yourself. If you're interested and have a Unixy OS, I've published a repository of my sources and conversion process to GitLab.

I examined the commit that fixed #118, but I couldn't figure out how to extend those changes to fix this issue too.

I'm also open to the resolution that u2o is already producing the most reasonable OSIS representation of the USFM sources, and it's osis2mod that's being picky. It's not clear to me how serious these warnings are.

Thank you for your time!

adyeths commented 1 year ago

It's unlikely that I will ever be able to produce output that completely eliminates nesting warnings reported by the osis2mod tool. I will reexamine the code, but I'm not confident that I will be able to fix this particular issue.

Screwtapello commented 1 year ago

That's fair. Thanks for looking!

adyeths commented 1 year ago

This doesn't look like something that can be changed to eliminate the warnings from osis2mod.

I wouldn't worry too much about it. osis2mod is a bit picky about things. Unfortunately, sometimes the things it's picky about (such as this issue) can't actually be fixed. Such is the nature of trying to map chapter and verse markers onto book/section/paragraph markup, as u2o tries to do. So I will just close this issue.