adyeths / u2o

USFM to OSIS bible format converter.
The Unlicense

Processing words with more than one Strong's number in the word level attribute #55

Closed DavidHaslam closed 6 years ago

DavidHaslam commented 6 years ago

Currently this is what happens to a word with (e.g.) two Strong's numbers:

USFM input file: \w vida|strong="H5315, H2416"\w*

OSIS output file: <w lemma="strong:H5315, H2416">vida</w>

This is what the OSIS should be: <w lemma="strong:H5315 strong:H2416">vida</w>

NB. Assume that (in the USFM) any space after the comma delimiter is optional.
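The requested transformation can be sketched in a few lines of Python. This is only an illustration, not u2o's actual code: the function name `strongs_to_lemma` is hypothetical, and it assumes the attribute value holds nothing but comma-separated Strong's numbers.

```python
import re


def strongs_to_lemma(value):
    """Turn a USFM strong="..." value into an OSIS lemma value.

    Hypothetical helper, assuming the value is a comma-separated
    list of Strong's numbers such as "H5315, H2416".
    """
    # Split on commas, tolerating optional whitespace around each one.
    numbers = [n for n in re.split(r"\s*,\s*", value.strip()) if n]
    # Prefix every number separately and join with single spaces.
    return " ".join(f"strong:{n}" for n in numbers)


print(strongs_to_lemma("H5315, H2416"))  # strong:H5315 strong:H2416
```

Splitting on the comma with optional whitespace covers both `H5315, H2416` and `H5315,H2416`, and a single number passes through with just its prefix added.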

Background

The above example is from Gen.1.30 in a Spanish Bible text that I'm currently working on. The source text has many instances of words with more than one Strong's number. Some words even have five Strong's numbers!

For testing purposes, attached is the small USFM file for the book of Obadiah, which also has one similar instance in verse 7.

\w aliados|strong="H0582, H1285"\w*

NB. File extension .txt provisionally added just so that I could drop it in GitHub.

31_RVR09 (OT&NT).vpl.usfm.txt

DavidHaslam commented 6 years ago

Aside: The incorrect attribute layout does not cause any OSIS validation error. However, the SWORD API expects each number to carry its own strong: prefix, with no comma between the entries.

Also, the OSIS header should include the following element when any of the USFM files have Strong's numbers:

  <work osisWork="strong">
    <refSystem>Dict.Strongs</refSystem>
  </work>

It would be good if this were to be added automatically upon detecting such an instance.
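One way the automatic detection could work is sketched below, under stated assumptions: the names `STRONG_ATTR` and `extra_header_works` are hypothetical, the regular expression is a guess at how a `strong` attribute appears inside a `\w ...\w*` marker, and none of this is taken from u2o itself.

```python
import re

# Header element quoted from the comment above.
STRONGS_WORK = (
    '<work osisWork="strong">\n'
    '  <refSystem>Dict.Strongs</refSystem>\n'
    '</work>'
)

# Assumed pattern: a \w marker whose attribute list contains strong="...".
STRONG_ATTR = re.compile(r'\\w\s[^\\]*?\|[^\\]*?strong="[^"]+"')


def extra_header_works(usfm_texts):
    """Return the extra <work> elements needed for the given USFM texts.

    Hypothetical helper: yields the Strong's work element if any
    input text contains a word-level strong attribute.
    """
    if any(STRONG_ATTR.search(text) for text in usfm_texts):
        return [STRONGS_WORK]
    return []


sample = r'\v 7 \w aliados|strong="H0582, H1285"\w* ...'
for element in extra_header_works([sample]):
    print(element)
```

Scanning all input files before writing the header means the element is emitted once, however many books contain Strong's numbers.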

adyeths commented 6 years ago

Just added a fix for the Strong's numbering problem. Note that I haven't added the header fix yet.

DavidHaslam commented 6 years ago

Thanks for the speedy response, as always.

DavidHaslam commented 6 years ago

I have now tested the fix (so far) on a large OSIS file and I am pleased to report that it works OK.

adyeths commented 6 years ago

Header fix has now been added.

DavidHaslam commented 6 years ago

Thanks!