adyeths / u2o

USFM to OSIS bible format converter.
The Unlicense

Nested crossreferences are not well processed #125

Open LAfricain opened 3 months ago

LAfricain commented 3 months ago

For this USFM input:

\v 1 Au commencement Dieu\f + \fq Dieu\ft , hébr. \qt Elohim\ft . C’est le nom commun de la divinité, susceptible d’être employé à propos des faux dieux, comme à propos du vrai Dieu. Ce nom divin domine dans toute une série de textes de la \qt Genèse\ft .\f* créa\f + \fq Créa\ft , hébr. \qt bârâ’\ft , \+xt Ps. li, 12|PSA 51:12\+xt* ; \+xt lxxxix, 13, 48|PSA 89:13-48\+xt* ; \+xt civ, 30|PSA 104:30\+xt* ; \+xt Is. iv, 5|ISA 4:5\+xt* ; \+xt Amos, iv, 13|AMO 4:13\+xt*.\f* le ciel et la terre\f + \fq Le ciel et la terre \ft : hébraïsme, pour \qt l’univers\ft .\f*.

u2o.py produces:

<verse sID="Gen.1.1" osisID="Gen.1.1" n="1" />Au commencement Dieu<note placement="foot"><catchWord>Dieu</catchWord>, hébr. \qt Elohim. C’est le nom commun de la divinité, susceptible d’être employé à propos des faux dieux, comme à propos du vrai Dieu. Ce nom divin domine dans toute une série de textes de la \qt Genèse.</note> créa<note placement="foot"><catchWord>Créa</catchWord>, hébr. \qt bârâ’, <seg type="x-nested"><reference>Ps. li, 12|PSA 51:12</reference></seg> ; <seg type="x-nested"><reference>lxxxix, 13, 48|PSA 89:13-48</reference></seg> ; <seg type="x-nested"><reference>civ, 30|PSA 104:30</reference></seg> ; <seg type="x-nested"><reference>Is. iv, 5|ISA 4:5</reference></seg> ; <seg type="x-nested"><reference>Amos, iv, 13|AMO 4:13</reference></seg>.</note> le ciel et la terre<note placement="foot"><catchWord>Le ciel et la terre </catchWord>: hébraïsme, pour \qt l’univers.</note>.

If the cross-references are not nested (no + prefix on the marker), we get this kind of result: <reference><!-- USFM Attributes: ([1234A-Z]*) ([0-9]*):([0-9]*) -->. The same result would be expected with nested cross-references, like this:

<seg type="x-nested"><reference><!-- USFM Attributes: PSA 51:12 -->Ps. li, 12</reference></seg>

And not:

<seg type="x-nested"><reference>Ps. li, 12|PSA 51:12</reference></seg>
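The expected transformation can be sketched in Python as follows. This is only an illustration of the requested output, not u2o's actual code: the helper name `convert_nested_xt` and the regular expression are assumptions made for this example, and it handles only the `\+xt ...\+xt*` form shown above.

```python
import re

# Hypothetical helper, not part of u2o: matches a nested cross-reference
# such as "\+xt Ps. li, 12|PSA 51:12\+xt*".
NESTED_XT = re.compile(r"\\\+xt\s+(?P<body>.*?)\\\+xt\*", re.DOTALL)


def convert_nested_xt(text: str) -> str:
    """Rewrite nested \\+xt markers, moving any |attributes into a comment."""

    def repl(match: re.Match) -> str:
        # Split "display text|attributes" at the first pipe, if there is one.
        display, sep, attrs = match.group("body").partition("|")
        comment = f"<!-- USFM Attributes: {attrs.strip()} -->" if sep else ""
        return (
            f'<seg type="x-nested"><reference>{comment}{display}'
            "</reference></seg>"
        )

    return NESTED_XT.sub(repl, text)


print(convert_nested_xt(r"\+xt Ps. li, 12|PSA 51:12\+xt*"))
```

When no `|` is present, the sketch emits the reference text unchanged and adds no comment.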

UnasZole commented 3 months ago

For reference, the official USFM documentation is not very clear on this topic, but it was confirmed in the following issue that nested \xt markers within footnotes are valid: https://github.com/ubsicap/usfm/issues/101#issuecomment-641419522

adyeths commented 3 months ago

This appears to be an issue with the handling of USFM attributes, since the cross-reference marker itself is being processed. I will have to investigate the attribute issue further; I'm not sure why that's happening. I also see that the \qt tag is not being processed, which is likely because the closing \qt* markers are missing.
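
One way to check the missing-closer hypothesis on a given input is to count `\qt` openers against `\qt*` closers. This is a minimal sketch, not u2o code: the function name `unclosed_qt_count` and both patterns are assumptions made for this example.

```python
import re

# Hypothetical patterns: "\qt" not followed by "*" or a word character,
# and the explicit closer "\qt*".
QT_OPEN = re.compile(r"\\qt(?![\w*])")
QT_CLOSE = re.compile(r"\\qt\*")


def unclosed_qt_count(usfm: str) -> int:
    """Return the number of \\qt openers minus the number of \\qt* closers."""
    return len(QT_OPEN.findall(usfm)) - len(QT_CLOSE.findall(usfm))


sample = r"\fq Dieu\ft , hébr. \qt Elohim\ft . textes de la \qt Genèse\ft ."
print(unclosed_qt_count(sample))
```

A result above zero means some `\qt` spans are ended only by the following `\ft` marker, as in the footnotes quoted in this issue.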

LAfricain commented 3 months ago

\qt tag not being processed also.

Yes ;)