adyeths / u2o

USFM to OSIS bible format converter.
The Unlicense

Marking references in cross-reference notes #79

Closed. DavidHaslam closed this issue 5 years ago.

DavidHaslam commented 5 years ago

The SWORD developers' wiki has a section on this.

The examples are quite specific and can be summarised in a few rules.

When the cross-reference note contains more than one reference,

From discussion in issue #78 it would appear that these requirements are ignored by orefs.py.

Providing there is no problem in regard to punctuation conflicts, it's a relatively simple task to split the note into separate reference elements.

DavidHaslam commented 5 years ago

The osisRef attribute values in a module are what make each reference into a hyperlink that jumps to the reference location when you click on it.

You cannot jump simultaneously to multiple locations.

Thus the following example with 4 references within a single osisRef attribute is a problem:

<reference osisRef="Exod.20.11 Exod.31.17 Deut.5.14 Heb.4.4">Eks 20:11, 31:17; Dute 5:14; Heb 4:4</reference>
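The opening comment suggests that, punctuation permitting, splitting a note like this into separate reference elements is straightforward. Below is a deliberately naive Python sketch of that idea. The function name split_grouped_reference is hypothetical and is not part of u2o or orefs. It assumes the display text separates citations with ", " or "; " and that there is exactly one text segment for each space-separated osisRef token. When those counts do not match, it raises an error instead of guessing.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def split_grouped_reference(osis_ref: str, display: str) -> list[str]:
    """Naively split one grouped <reference> into one element per osisRef token.

    Assumes citations in the display text are separated by ", " or "; " and
    that there is exactly one text segment per space-separated osisRef token.
    """
    refs = osis_ref.split()
    # Split on a space that follows "," or ";" so each segment keeps its
    # trailing punctuation, e.g. "Eks 20:11," and "31:17;".
    segments = [seg for seg in re.split(r"(?<=[,;]) ", display) if seg]
    if len(refs) != len(segments):
        # Refuse to guess when the punctuation does not line up with the refs.
        raise ValueError(
            f"{len(refs)} osisRef tokens but {len(segments)} text segments"
        )
    return [
        f"<reference osisRef={quoteattr(ref)}>{escape(seg)}</reference>"
        for ref, seg in zip(refs, segments)
    ]


for element in split_grouped_reference(
    "Exod.20.11 Exod.31.17 Deut.5.14 Heb.4.4",
    "Eks 20:11, 31:17; Dute 5:14; Heb 4:4",
):
    print(element)
```

The example above yields four elements. If you join their text contents with single spaces, you get the original display string back. The one-segment-per-ref assumption fails quickly on real USFM \x content, however. A citation such as "1Moz 1:3,6" holds two references but contains no ", " separator, so it produces a single segment and the function raises an error.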

What happens when you click on this in a SWORD or JSword front-end app?

DavidHaslam commented 5 years ago

The OSIS 2.1 User Manual is rather too sparse in how it covers notes with a reference element.

The osisRef attribute is used to specify in a machine processable way the target of the reference. A reader can easily use biblical citations written in any number of ways after a moments hesitation. Computers on the other hand will never find the correct material unless assisted by properly written references. OSIS provides one standard way to write such references.

A reference element was used in the note example above. To refresh your memory, here is just the reference element part of that example: <reference osisRef="Ezra.4.6">Ezra 4:6</reference>

Only one example is provided and that merely for a single reference.

Nonetheless, the requirements are clearly stated, albeit not in the section you may have been looking at.

A single osisRef cannot identify a discontiguous range of a work. For example, a complex reference such as "John 3:14-16, 18; 4:1-2; 19-20" cannot be encoded as a single reference. It must instead be encoded as several parts, each contiguous:

See also
<reference osisRef="John.3.14-John.3.16">John 3:14-16, </reference>
<reference osisRef="John.3.18">18; </reference>
<reference osisRef="John.4.1-John.4.2">4:1-2; </reference>
<reference osisRef="John.4.19-John.4.20">4:19-20; </reference>.

It is permissible for osisRef values, including those on either side of a hyphen in a range reference, to use osisID values that include the work-specific extension fields ("!" followed by a name, e.g. osisRef="Ps.3.23!b-Ps.3.24!a").
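On David's reading of these passages, a valid osisRef is either one osisID or one start-end range, and either end may carry the "!" extension. The Python sketch below validates that reading. The regular expression is a simplified assumption that ignores work prefixes and other parts of the full OSIS grammar. The name parse_osis_ref is hypothetical.

```python
import re

# Simplified grammar for one osisID-style reference: dot-separated segments
# such as "Ps.3.23", optionally followed by a "!" extension such as "!b".
# This is an assumption for illustration, not the full OSIS schema pattern.
_ID = r"[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*(?:![A-Za-z0-9]+)?"
_SINGLE_REF = re.compile(rf"({_ID})(?:-({_ID}))?")


def parse_osis_ref(value: str) -> tuple[str, str]:
    """Return (start, end) for one contiguous osisRef; end == start if not a range."""
    match = _SINGLE_REF.fullmatch(value)
    if match is None:
        # A space-separated list of refs is rejected, following the reading
        # that one osisRef must identify one contiguous span.
        raise ValueError(f"not a single contiguous osisRef: {value!r}")
    start = match.group(1)
    end = match.group(2) or start
    return start, end


print(parse_osis_ref("Ps.3.23!b-Ps.3.24!a"))  # ('Ps.3.23!b', 'Ps.3.24!a')
print(parse_osis_ref("John.3.18"))            # ('John.3.18', 'John.3.18')
```

This validator deliberately raises an error for a grouped value such as "Exod.20.11 Exod.31.17". That is exactly the point under dispute in this thread, so the check encodes one side of the argument. It is not a settled rule.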

I think the case is clear.

DavidHaslam commented 5 years ago

Here's a further observation:

Consider and compare the following two vernacular references: 1Moz.1.3,4 and 1Moz.1.3,6

They are formally similar, but the first can be coded with an osisRef using a range, whereas the second, being for 2 discontiguous verses, cannot.

orefs.py needs to be fully aware of this distinction. Where (e.g.) a comma separates two contiguous verses, the osisRef attribute can, and should, be optimised to use the equivalent range.

The second requires two separate reference elements!
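The range optimisation described here amounts to collapsing sorted verse numbers into runs of consecutive values. Below is a minimal Python sketch for verses within one chapter. It assumes the vernacular book abbreviation has already been mapped to an OSIS book name (presumably Gen, if 1Moz is the vernacular abbreviation for Genesis). The function name verses_to_osis_refs is hypothetical.

```python
def verses_to_osis_refs(book: str, chapter: int, verses: list[int]) -> list[str]:
    """Collapse verse numbers from one chapter into contiguous osisRef values.

    Each returned value is either a single verse or a start-end range, so each
    one can go on its own <reference> element.
    """
    ordered = sorted(set(verses))
    refs: list[str] = []
    i = 0
    while i < len(ordered):
        j = i
        # Extend the run while the next verse immediately follows the current one.
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[j] + 1:
            j += 1
        start = f"{book}.{chapter}.{ordered[i]}"
        if j == i:
            refs.append(start)
        else:
            refs.append(f"{start}-{book}.{chapter}.{ordered[j]}")
        i = j + 1
    return refs


print(verses_to_osis_refs("Gen", 1, [3, 4]))  # ['Gen.1.3-Gen.1.4']
print(verses_to_osis_refs("Gen", 1, [3, 6]))  # ['Gen.1.3', 'Gen.1.6']
```

With this function, "1:3,4" becomes the single range Gen.1.3-Gen.1.4. By contrast, "1:3,6" produces two values, and each of them would need its own reference element.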

DavidHaslam commented 5 years ago

A corollary of the above 'rules' for coding valid osisRef attributes is this:

adyeths commented 5 years ago

orefs does one thing, and one thing only. It adds osisRef attributes to existing references in OSIS documents. It does not, and was never intended to, create reference tags on its own. That job is exclusively for u2o.

Considering this statement:

Providing there is no problem in regard to punctuation conflicts, it's a relatively simple task to split the note into separate reference elements.

This is not even close to being true. Parsing references is a difficult task in itself; splitting references apart into multiple tags without causing problems is even more difficult. I'm not even going to attempt it. It is far too complicated and prone to problems.

The inclusion of multiple references in the osisRef attribute is called grouping. (See the osisID section of the OSIS manual.) SWORD-based software has no problems with this. I use modules that I generate with the output of my scripts on a daily basis in various frontends without any problem whatsoever.

DavidHaslam commented 5 years ago

I too have read the OSIS manual, so please give me credit for how I understand it.

Grouping applies to osisID but it does not apply to osisRef.

Translators often group several adjacent verses into a single block, so that they can translate them using word order more natural in the target language. In such cases, the larger unit (commonly a paragraph or p element), gets an osisID that lists all the individual osisIDs for the verses included, separated by white space. For example: <p osisID="Matt.1.1 Matt.1.2 Matt.1.3">...</p> osisIDs never allow the use of ranges. Only osisRefs (discussed later) do.
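The distinction quoted here is simple to state in code. An osisID may list several IDs separated by white space, but none of those IDs may be a range. Below is a small Python check. The helper name is hypothetical, and it makes a simplifying assumption: any hyphen is treated as a range marker.

```python
def split_grouped_osis_id(value: str) -> list[str]:
    """Split a grouped osisID such as "Matt.1.1 Matt.1.2 Matt.1.3" into its IDs."""
    ids = value.split()
    for osis_id in ids:
        # Simplification: any hyphen is treated as a range marker, and the
        # manual says osisIDs never allow ranges.
        if "-" in osis_id:
            raise ValueError(f"osisID may not contain a range: {osis_id!r}")
    return ids


print(split_grouped_osis_id("Matt.1.1 Matt.1.2 Matt.1.3"))
# ['Matt.1.1', 'Matt.1.2', 'Matt.1.3']
```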

There is no example in the OSIS Manual of an osisRef attribute value containing white-space, nor for it to have a reference to other than a single verse or a verse range, though I suppose a reference to a whole chapter would also be valid.

Please cite examples of a SWORD module that allegedly "works" in which there are such multiple osisRef strings for non-contiguous passages. I've been a CrossWire volunteer for 10 years, and I've never seen one. I'm a member of the Modules Team.

And please (in future) refrain from closing an issue before we have reached a common understanding.

LAfricain commented 5 years ago

What happens when you click on this in a SWORD or JSword front-end app?

On Xiphos it works very well. The multiple refs appear in the left panel and you only have to click on them. I quite agree with this idea of grouping refs.

For me, the biggest problem is separating the text inside a \x USFM tag. I haven't found a good way to do it. Or do I need to change all the \x tags to \f?

Consider and compare the following two vernacular references: 1Moz.1.3,4 1Moz.1.3,6

I consider the first range to be an error. It needs to be corrected to be relevant.

DavidHaslam commented 5 years ago

Neither of these is an error.

The first can be written alternatively as:

There is no alternative way to write the second.

DavidHaslam commented 5 years ago

On Xiphos it works very well. The multiple refs appear in the left panel and you only have to click on them. I quite agree with this idea of grouping refs.

No - I think you're confusing what happens for a properly created module with separate reference elements with what you think there is because there are several references in the same note element.

The OSIS manual specifically prohibits having references for multiple non-contiguous passages in the same osisRef attribute.

DavidHaslam commented 5 years ago

@adyeths - please tell me which SWORD module has an example of an osisRef attribute that contains a space as a consequence of having two or more references within the string.

LAfricain commented 5 years ago

The first can be written alternatively as:

For me there is no alternative; it should be written with a hyphen (-).

No - I think you're confusing what happens for a properly created module with separate reference elements with what you think there is because there are several references in the same note element.

I just want to say that it runs! So, as far as I'm concerned, don't worry about it... even if the manual specifies it, there is enough work on the modules already ;)

DavidHaslam commented 5 years ago

I have still seen no convincing evidence that what @adyeths has asserted is true. By evidence I mean a module made from OSIS in which there are matches to the regexp <reference osisRef="\S{6,12}?( \S{6,12}?)+"> and for which all the references work as has been claimed.
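The regexp above can be run directly against an OSIS file. The Python sketch below applies it as written. For comparison, it also performs the same check with an XML parser, which is not limited by the regexp's 6-to-12-character token bounds or by its assumptions about exact spacing and attribute order. Both function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# The pattern proposed above: an opening <reference> tag whose osisRef holds
# two or more space-separated tokens of 6 to 12 non-space characters each.
GROUPED_REF = re.compile(r'<reference osisRef="\S{6,12}?( \S{6,12}?)+">')


def grouped_refs_by_regex(osis_text: str) -> list[str]:
    """Return each matching opening tag, exactly as it appears in the text."""
    return [m.group(0) for m in GROUPED_REF.finditer(osis_text)]


def grouped_refs_by_parser(osis_text: str) -> list[str]:
    """Return every osisRef value on a <reference> element that contains whitespace.

    Tags may carry the OSIS namespace ("{...}reference"), so only the local
    name is compared.
    """
    root = ET.fromstring(osis_text)
    found = []
    for elem in root.iter():
        local_name = elem.tag.rsplit("}", 1)[-1]
        ref = elem.get("osisRef")
        if local_name == "reference" and ref and len(ref.split()) > 1:
            found.append(ref)
    return found


sample = (
    '<note><reference osisRef="Exod.20.11 Exod.31.17 Deut.5.14 Heb.4.4">'
    "Eks 20:11, 31:17; Dute 5:14; Heb 4:4</reference>"
    '<reference osisRef="John.3.14-John.3.16 John.3.18">3:14-16, 18</reference></note>'
)
print(grouped_refs_by_regex(sample))
print(grouped_refs_by_parser(sample))
```

On this sample, the regexp finds only the first tag. The token John.3.14-John.3.16 is 19 characters long, so a grouped value that contains a range is missed. The parser-based check reports both grouped values.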

I'm not backing down from my position on the basis of mere assertions. Assertions can be mistaken.

I won't take it badly if I'm proved wrong, but if I'm right in how I've read the OSIS manual, then there are serious implications.

adyeths commented 5 years ago

Your assumption that I'm lying has been noted.

I can't name any publicly available modules because I don't know which ones were generated with the output of u2o and orefs. I use modules that I generate with the output of my scripts on a daily basis in various frontends without any problem whatsoever. None of these modules are publicly available.

With regard to orefs, its development is officially on hold now. I will not be addressing any more problems with it for the foreseeable future.

DavidHaslam commented 5 years ago

Oh dear! You seem to have taken offence when none was intended.

I merely wish to get to the right technical understanding of what the OSIS Manual really states as the requirements, and humbly sought to obtain evidence to substantiate whichever one of us is not mistaken.

I've been encouraging others to use your scripts, and was pleased when @lafricain had begun to unearth some of the nitty-gritty aspects of using orefs.py as a first-time user in the real world of messy source text.

If I am truly mistaken, what prevents you from providing a clear demonstration of such?

Please reconsider. I had our mutual best interests at heart.

David

DavidHaslam commented 5 years ago

@adyeths

Please would you be so kind as to explain why you think I'm mistaken in my understanding of the OSIS Manual.

As I said earlier, I don't mind being wrong, as long as we can come to a common understanding.

To my mind, vague assertions that "there are no problems" simply don't take us forward in any meaningful way.

adyeths commented 5 years ago

From my reading of the manual, an osisRef is a type of osisID (and it is specified in the manual that any valid osisID is a valid osisRef). That's why it appears as a subtopic under osisID. And that's why I believe that grouping of the references is allowed and why I have orefs use grouping when processing references.

I did not just implement it without testing it. And if it had not worked I would not have continued working on orefs. I would have abandoned it and moved on long before now.

As for now, until someone with more knowledge of OSIS speaks up, I will continue holding off on any further updates to orefs. If it's clarified that I am correct, then I will resume trying to improve orefs. If, on the other hand, I am wrong, then any work on orefs would be pointless, as I will never be able to make it split references apart without encountering major issues. In that case I will move orefs and its associated readme to an "unmaintained" folder and cease working on it altogether.

LAfricain commented 5 years ago

If it's clarified that I am correct then I will resume trying to improve orefs.

For me it is not very important to clarify it. It works! That is largely sufficient as an argument in "computer" science. If one day someone reports a truly crippling bug, then it will be necessary to review the method. As far as I'm concerned, you can continue improving orefs without fear. Moreover, I'm not really in favour of needlessly bloating OSIS files with separate references. I find it counterproductive for the size of the modules, especially when some texts have more than 27000 references!

DavidHaslam commented 5 years ago

@LAfricain

It's not yet apparent what "It works!" actually means when a single reference element contains an osisRef attribute with several separate OSIS references.

Setting aside for the sake of the argument those front-end apps that have a Preview feature for Notes (e.g. Xiphos and PocketSword, etc.), and focusing on front-ends that jump directly to the reference location, it must be obvious to all of us that a jump cannot simultaneously end up at multiple locations.

But what if something happens nonetheless?

Suppose SWORD does jump to the first location found in the osisRef attribute string? Wouldn't you be somewhat tempted to imagine that "everything works OK" unless you already knew that the invisible markup contained further locations?

This is why one should always be cautious about reported results of "testing" unless there is an agreed testing procedure that covers the very kind of case that we are discussing.

cf. My career was mostly as a Test Engineer, so I have a certain type of experience to bring to bear in my thinking about such matters.

We should not let considerations of OSIS file size have any bearing on this discussion.

LAfricain commented 5 years ago

Cross-references are not displayed in Bibletime on Android. I don't know whether the feature is not implemented or whether it is a bug (this is for the ndebele and swe1917 modules only).

DavidHaslam commented 5 years ago

Report that elsewhere after testing Bibletime Mini with released modules.

LAfricain commented 5 years ago

No, I just wanted to add something about testing with the cross-references.