Closed LAfricain closed 6 years ago
The conversion of \f + \fr Est.1,1\f*
should produce <reference type="annotateRef">Est.1,1</reference>
and the conversion of \f + \fr Est.1.1\f*
should produce <reference type="annotateRef">Est.1.1</reference>
by my u2o script. It looks like that's exactly what's happening.
orefs should be able to add the appropriate osisRef attributes to those tags afterwards. That is why I wrote it. None of the references created by u2o produce actual reference links. Additional processing is needed to fix the references.
This <reference osisRef="Est.1.1"</reference>
is definitely not correct.
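To make the post-processing step concrete, here is a minimal Python sketch of adding an osisRef attribute to an annotateRef tag. It is not the orefs implementation: the names ABBREVIATIONS, REFTAG, SIMPLE and add_osisref are hypothetical, the abbreviation table is an assumption, and only simple "Book.chapter.verse" style references are handled.

```python
import re

# Hypothetical abbreviation table (an assumption for this sketch).
# "Esth" is the OSIS book ID that appears in the OSIS output in this thread.
ABBREVIATIONS = {"Est": "Esth", "Esth": "Esth"}

# Matches the tags that u2o is described as producing.
REFTAG = re.compile(r'<reference type="annotateRef">(.*?)</reference>')

# Book, then "." or space, chapter, then "." or ",", verse.
SIMPLE = re.compile(r"^(\w+)[. ](\d+)[.,](\d+)$")


def add_osisref(line):
    """Add an osisRef attribute to each simple annotateRef tag in a line."""

    def repl(match):
        text = match.group(1)
        parsed = SIMPLE.match(text.strip())
        # Leave the tag untouched if the reference is not understood.
        if parsed is None or parsed.group(1) not in ABBREVIATIONS:
            return match.group(0)
        book = ABBREVIATIONS[parsed.group(1)]
        osisref = f"{book}.{parsed.group(2)}.{parsed.group(3)}"
        return f'<reference type="annotateRef" osisRef="{osisref}">{text}</reference>'

    return REFTAG.sub(repl, line)


print(add_osisref('<reference type="annotateRef">Est.1,1</reference>'))
```

The visible text of the tag stays as written in the source; only the attribute carries the normalized OSIS reference.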
OK, thank you for this information. If I understand correctly, after using u2o I need to run the orefs script on the OSIS file to link all the cross-references. Is that right?
I tried orefs, this is the result:
<note placement="foot">
<reference type="annotateRef" osisRef="">Est.1,1<!-- orefs - unprocessed reference --></reference>
</note>
<verse eID="Esth.11.1-12"/>
<chapter eID="Esth.11"/>
</p>
<p>
<chapter sID="Esth.12" osisID="Esth.12" n="12"/>
<verse sID="Esth.12.1-5" osisID="Esth.12.1 Esth.12.2 Esth.12.3 Esth.12.4 Esth.12.5" n="1-5"/>
<note placement="foot">
<reference type="annotateRef" osisRef="">Est.1.1<!-- orefs - unprocessed reference --></reference>
For the command: ./orefs.py -v -i ../osis/lxx.osis.xml -o ../osis/lxx.osis_ref.xml
And the USFM:
\v 1-12 \f + \fr Est.1,1\f*
\c 12
\p
\v 1-5 \f + \fr Est.1.1\f*
\c 13
\p
\v 1-7 \f + \fr Est 3.13\f*
\v 8-18 \f + \fr Esth.4.17\f*
We are close to the goal.
You are correct. First you run u2o to create an osis file. Then you run orefs to process the osis file and add proper osisRef attributes.
In order to process the references above you will need to use a config file. The readme for the orefs utility tries to explain this. It can automatically make one for you that you can then edit as needed too. So you don't have to manually create it.
OK, I generated the CONFIGFILE. I saved it in the same folder as orefs.py (with the USFM). I only have references for Esther, but the result is the same. Is the name CONFIGFILE correct? Or do I need to add it in the orefs.py script?
The config file can be named anything you like. You just tell orefs the name of the config file to use. If you named it CONFIGFILE, then you would tell orefs to use it something like this:
orefs.py -i inputfile.osis -o outputfile.osis -c CONFIGFILE
This way it will use CONFIGFILE (or whatever you choose to name it instead) for processing the references instead of trying to do it automatically using the default settings.
Ok it works perfectly!
Hello, I ran a new test on konvb (the USFM for Kikongo). Now it is the \r marker that is converted to OSIS. Sometimes the marker is followed by "(", or it contains text, for instance: "See also the reference...". I get this error:
cyrille@W54:~/Documents/gitlab/konvb$ orefs.py -v -i osis/konvb.osis.xml -o ../osis/konvb_ref.osis.xml -c osis/CONFIGKONVB
Reading input file osis/konvb.osis.xml ...
Getting book names and abbreviations...
Using config file for abbreviations...
Processing cross references...
WARNING: Reference not processed… Luke 23-38)
WARNING: Reference not processed… John 19-23)
WARNING: Reference not processed… Luke 7-9)
WARNING: Reference not processed… John 24-28)
WARNING: Reference not processed… John 29-34)
WARNING: Reference not processed… Luke 1-13)
WARNING: Reference not processed… Luke 14-15)
WARNING: Reference not processed… Luke 1-11)
Traceback (most recent call last):
File "/home/cyrille/.bin/orefs.py", line 237, in vrschk
rval = str(int(num))
ValueError: invalid literal for int() with base 10: ''
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/cyrille/.bin/orefs.py", line 517, in <module>
main()
File "/home/cyrille/.bin/orefs.py", line 513, in main
processfile(args)
File "/home/cyrille/.bin/orefs.py", line 443, in processfile
text = processreferences(text, bookabbrevs, bookabbrevs2)
File "/home/cyrille/.bin/orefs.py", line 208, in processreferences
lines[i] = reftag.sub(simplerepl, lines[i], 0)
File "/home/cyrille/.bin/orefs.py", line 180, in simplerepl
osisrefs, oreferror = getosisrefs(text, currentbook, abbr, abbr2)
File "/home/cyrille/.bin/orefs.py", line 405, in getosisrefs
tmp = vrschk(j)
File "/home/cyrille/.bin/orefs.py", line 240, in vrschk
if num[-1] in "ABCDabcd":
IndexError: string index out of range
If you need files for tests, you can find them in the GitLab repo: here. The correct config file (for the references) can be found: here
Thanks for pointing me to the files that are causing the problems. That will be very helpful in fixing the issues with processing references.
The bug that caused orefs to crash is now corrected. Parentheses that may surround references are now handled and no longer cause problems.
Text preceding a reference is ignored already when processing references with orefs. Text following a reference may not be. I'm unsure if I will ever be able to handle that particular situation. Which is one of the reasons I made sure to have orefs mark where references were not processed so that manual fixes can be made afterwards.
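For readers following the traceback above: the crash pattern is an empty string reaching int(), after which num[-1] raises IndexError inside the error-handling path. The sketch below shows one way to guard against that. It is not the fix that was applied to orefs (that code is not shown in this thread); check_verse is a hypothetical name, and keeping a trailing letter such as "a" on the verse number is an assumption.

```python
def check_verse(num):
    """Normalize a verse token such as '12', '012' or '12a'.

    Returns an empty string when the token cannot be interpreted,
    so the caller can mark the reference as unprocessed.
    """
    num = num.strip()
    if not num:
        # An empty token is what triggered the IndexError in the traceback.
        return ""
    try:
        return str(int(num))
    except ValueError:
        # Assumption: a single trailing letter marks a partial verse.
        if num[-1] in "ABCDabcd" and num[:-1].isdigit():
            return str(int(num[:-1])) + num[-1]
        return ""


for token in ["12", "012", "12a", "", ")"]:
    print(repr(token), "->", repr(check_verse(token)))
```

Returning an empty string instead of raising lets the rest of the file be processed while the problematic reference is left for a manual fix.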
It runs well now. A few errors can still be noticed; for instance, these references are not completely converted:
\r (Mt 24, 42 ; 25, 13-15 ; Lk 12, 36-38 ; 19, 12-13)
it gives this: <reference type="parallel" osisRef="Luke.12.36-Luke.12.38">(Mt 24,42 ; 25, 13-15 ; Lk 12, 36-38 ; 19, 12-13)<!-- orefs - unprocessed reference --></reference>
And another: <reference type="parallel" osisRef="">(Mk 1, 39 ; Lk 4,44 ye 6, 17-18)<!-- orefs - unprocessed reference --></reference>
And: <reference type="parallel" osisRef="Luke.14.34-Luke.14.35">(Mk 9, 50 ; 4, 21 ; Lk 14, 34-35 ; 8, 16 ; 11, 33)<!-- orefs - unprocessed reference --></reference>
Yes. I'm not sure why it's not handling many of those. I'm still investigating.
Some of the references were not being processed because of whitespace that orefs was not properly handling. I have fixed that issue.
Most of the other references that aren't being processed are because of abbreviations that are not in the config file. (Mc, Lc, and Jn for example.) Add the additional abbreviations where they are needed in the config file and most of the unprocessed references will be handled.
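As an illustration of the kind of cleanup involved, here is a small Python sketch that normalizes the text of a reference before parsing. It is an assumption about one reasonable approach, not the change made in orefs; normalize_reference_text is a hypothetical name. It also covers the non-breaking space that French typography places before a semicolon.

```python
import re


def normalize_reference_text(text):
    """Strip surrounding parentheses and normalize whitespace."""
    # Replace non-breaking and narrow non-breaking spaces with plain spaces.
    text = text.replace("\u00a0", " ").replace("\u202f", " ")
    text = text.strip()
    # Remove one pair of surrounding parentheses, if present.
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]
    # Collapse runs of whitespace into a single space.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_reference_text("(Mt 24, 42\u00a0; 25, 13-15 ;  Lk 12, 36-38)"))
```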
"Most of the other references that aren't being processed are because of abbreviations that are not in the config file. (Mc, Lc, and Jn for example.)"
Yes, that was my mistake, and I have corrected it. The abbreviations were in French; I changed them to Kikongo.
Some issues still remain, such as this one:
<reference type="parallel" osisRef="Mark.9.50 Luke.14.34-Luke.14.35">(Mk 9, 50 ; 4, 21 ; Lk 14, 34-35 ; 8, 16 ; 11, 33)
I don't know if it helps, but in French (and also in Kikongo, because it is used in a French-speaking country) we put a non-breaking space before a ";".
There are also some errors with the word "ye" (which means "and"). What should be done with this? Just fix it manually?
See the example: <reference type="parallel" osisRef="Mark.6.7-Mark.6.11">(Mk 6,7-11 ; Lk 9,2-5 ye 10,3-12)
Thank you already for improving orefs.py!
I still noticed this error:
<reference type="parallel" osisRef="Matt.5.15 Mark.10.26 Luke.8.16-Luke.8.17 Mark.11.33">(Mt 5,15 ; 10,26 ; Lk 8,16-17 ; 11,33)
The Mark.10.26 should be Matt.10.26 because it follows Matt.5.15. The same goes for Mark.11.33, which should be Luke.11.33.
Regarding references such as those that contain "ye", those would have to be fixed manually. I'm not going to be able to keep orefs generic and still handle those situations.
Regarding the other error, part of the problem with that is in the way orefs handles multiple references as well as references within books when the book is not specified. It will likely be a difficult task making orefs handle this particular situation.
When I wrote orefs, I tried to make it handle situations where multiple verses and verse ranges could be specified. For this I use the SEPP separator rather than the SEPM separator. SEPM allows multiple different references to be specified, but the book always has to be specified or it will default to whatever book is currently being processed. (In the case above, the current book is Mark, which is why it says Mark in the osisRef.) The SEPP separator, by contrast, allows multiple verses and verse ranges to be specified without having to repeat the book name.
To illustrate... the reference being processed currently says this:
(Mt 5,15 ; 10,26 ; Lk 8,16-17 ; 11,33)
Since the SEPP character in your config file is ".", if the above were changed to this:
(Mt 5,15 . 10,26 ; Lk 8,16-17 . 11,33)
then it would be processed correctly by orefs. I hope the explanation makes sense. Manual fixes will have to be done if an appropriate character for SEPP can't be used here... at least for now until I can figure out a better way to do things.
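The difference between defaulting to the current book and carrying the last named book forward can be sketched in Python as follows. This is a simplified illustration, not how orefs is implemented: BOOKS, REF and to_osis_refs are hypothetical names, the abbreviation table is an assumption, and the sketch only handles ";"-separated "chapter,verse" references like the one discussed here.

```python
import re

# Hypothetical abbreviation table; a real config file defines its own.
BOOKS = {"Mt": "Matt", "Mk": "Mark", "Lk": "Luke"}

# Optional book abbreviation, then chapter "," verse, with an optional "-verse" range.
REF = re.compile(
    r"^(?:(?P<book>[A-Za-z0-9]+)\s+)?"
    r"(?P<chap>\d+)\s*,\s*(?P<v1>\d+)(?:-(?P<v2>\d+))?$"
)


def to_osis_refs(text, current_book, carry_book):
    """Convert ';'-separated references to a space-separated osisRef value.

    carry_book=False: a reference without a book uses current_book
                      (the behaviour described for SEPM above).
    carry_book=True:  a reference without a book reuses the last named book.
    """
    refs = []
    last_book = current_book
    for part in text.strip().strip("()").split(";"):
        m = REF.match(part.strip())
        if m is None:
            continue  # left for manual fixing in this sketch
        abbr = m.group("book")
        if abbr is not None:
            if abbr not in BOOKS:
                continue  # unknown abbreviation: skip rather than guess
            book = BOOKS[abbr]
            last_book = book
        else:
            book = last_book if carry_book else current_book
        start = f"{book}.{m.group('chap')}.{m.group('v1')}"
        if m.group("v2"):
            refs.append(f"{start}-{book}.{m.group('chap')}.{m.group('v2')}")
        else:
            refs.append(start)
    return " ".join(refs)


text = "(Mt 5,15 ; 10,26 ; Lk 8,16-17 ; 11,33)"
print(to_osis_refs(text, "Mark", carry_book=False))
print(to_osis_refs(text, "Mark", carry_book=True))
```

With carry_book=False the sketch reproduces the osisRef reported above (Mark.10.26 and Mark.11.33); with carry_book=True it yields Matt.10.26 and Luke.11.33. Carrying the book forward is only safe when the source text consistently follows that convention, which is one reason a generic tool may not do it by default.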
OK, your explanation makes sense. I will change the USFM file manually. Thank you very much!
The conversions of:
\f + \fr Est.1,1\f*
or \f + \fr Est.1.1\f*
don't give a true link in the OSIS reference: <reference type="annotateRef">Est.1.1</reference>
Should it be: <reference osisRef="Est.1.1"</reference>?