funderburkjim / boesp-prep

Prepare Boehtlingk, Indische Sprüche,
MIT License

<S> tag improvement #42

Open funderburkjim opened 2 years ago

funderburkjim commented 2 years ago

Currently, the D tag uses an attribute 'n', so <D n="67" a="3384"> identifies the translation that follows as belonging to saying 67.

The 'a' attribute (I think!) indicates that this saying is numbered 3384 in the first edition of Indische Sprüche. Recall that our document is based on the 2nd edition of I.S.

Also, in the 151 entries that contain multiple sayings, there are multiple translation D elements, e.g.

<D n="67" a="3384">
<D n="68" a="3385">
<D n="69" a="3386">

but the sayings themselves are not similarly identified separately, e.g.

<info L="67,68,69" page="1.13" gtypes="S,D,D,D,F"/>
<S>
agnistejo mahalloke gUDhastiSThati dAruSu |
na copayuGkte taddAru yAvannoddIpyate paraiH ||
sa eva khalu dArubhyo yadA nirmathya dIpyate |
taddAru ca vanaM cAnyannirdahatyAzu tejasA ||
evameva kule jAtAH pAvakopamatejasaH |
kSamAvanto nirAkArAH kASThe 'gniriva zerate ||
</S>

By S-tag improvement we would rewrite this example as

<S n="67">
agnistejo mahalloke gUDhastiSThati dAruSu |
na copayuGkte taddAru yAvannoddIpyate paraiH ||
</S>
<S n="68">
sa eva khalu dArubhyo yadA nirmathya dIpyate |
taddAru ca vanaM cAnyannirdahatyAzu tejasA ||
</S>
<S n="69">
evameva kule jAtAH pAvakopamatejasaH |
kSamAvanto nirAkArAH kASThe 'gniriva zerate ||
</S>

Of course, in entries with only 1 saying, the only change is the added attribute. For instance,

OLD:
<entry>
<info L="1" page="1.1" gtypes="S,D,F,V1"/>
<S>
aMzo 'pi duSTadiSTAnAM pareSAM syAdvinAzakRt |
vAlalezo 'pi vyAghrANAM patsyAjjIvitahAnaye ||
</S>
...
NEW:
<entry>
<info L="1" page="1.1" gtypes="S,D,F,V1"/>
<S n="1">
aMzo 'pi duSTadiSTAnAM pareSAM syAdvinAzakRt |
vAlalezo 'pi vyAghrANAM patsyAjjIvitahAnaye ||
</S>
...
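
As a rough illustration of the rewrite described above, here is a minimal Python sketch that splits the text of one S element into one numbered S element per saying. The function name `split_sayings` is hypothetical and is not part of the repository. It also assumes that every saying ends at a line ending in `||`, which holds for the examples shown here but may not hold for every entry.

```python
def split_sayings(s_body: str, l_attr: str) -> str:
    """Return one <S n="..."> element per saying.

    s_body: the text between <S> and </S> of the old form.
    l_attr: the value of the L attribute of <info>, e.g. "67,68,69".
    Assumption: a saying ends at a line whose text ends with '||'.
    """
    numbers = [n.strip() for n in l_attr.split(",") if n.strip()]
    lines = [ln.strip() for ln in s_body.strip().splitlines() if ln.strip()]

    # Collect lines into groups, closing a group at each line ending in '||'.
    groups, current = [], []
    for ln in lines:
        current.append(ln)
        if ln.endswith("||"):
            groups.append(current)
            current = []
    if current:  # trailing lines without a closing '||'
        groups.append(current)

    if len(groups) != len(numbers):
        raise ValueError(
            f"L lists {len(numbers)} sayings but {len(groups)} were found"
        )

    out = []
    for n, group in zip(numbers, groups):
        out.append(f'<S n="{n}">')
        out.extend(group)
        out.append("</S>")
    return "\n".join(out)
```

An entry with a single saying goes through the same path and only gains the 'n' attribute. Entries where the count of sayings and the count of numbers in L disagree raise an error, so they can be handled by hand.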
funderburkjim commented 2 years ago

Note that for the multiple saying cases, 'gtypes' is also modified e.g., in the 67,68,69 example:

OLD:
<info L="67,68,69" page="1.13" gtypes="S,D,D,D,F"/>
NEW:
<info L="67,68,69" page="1.13" gtypes="S,S,S,D,D,D,F"/>
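
The gtypes change can be derived from the L attribute. The sketch below is one possible way to do it; `expand_gtypes` is a hypothetical helper, and it assumes that the S codes always come first in gtypes, as in the examples above.

```python
def expand_gtypes(gtypes: str, l_attr: str) -> str:
    """Repeat the leading 'S' code once per saying listed in L.

    Entries whose gtypes do not start with 'S' are returned unchanged.
    """
    n_sayings = len([x for x in l_attr.split(",") if x.strip()])
    codes = gtypes.split(",")

    # Drop the leading run of 'S' codes (one in the old form).
    rest = list(codes)
    while rest and rest[0] == "S":
        rest.pop(0)
    if len(rest) == len(codes):
        return gtypes  # no leading 'S', nothing to do

    return ",".join(["S"] * n_sayings + rest)
```

Because the leading run of S codes is replaced rather than extended, applying the function a second time to an already converted value leaves it unchanged.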
funderburkjim commented 2 years ago

The DTD will also need adjustment:

OLD: (only 1 S)
<!ELEMENT entry ((info,HS+) | (info,S,D+,F,V1*,V2*,V3*,V4*,V5*))>
NEW: (1 or more S)
<!ELEMENT entry ((info,HS+) | (info,S+,D+,F,V1*,V2*,V3*,V4*,V5*))>

This DTD would validate different numbers of S and D (e.g. 'S,S,D,D,D'), but this currently does not occur -- i.e., currently there are the same number of 'S' elements as 'D' elements in each entry.
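
Since the DTD cannot express "same number of S and D", that property could be checked separately. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`. The root element name `boesp` and the sample data are made up for illustration, and `mismatched_entries` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Made-up sample: the second entry has two S elements but only one D.
SAMPLE = """<boesp>
<entry><info L="1" page="1.1" gtypes="S,D,F"/>
<S n="1">x</S><D n="1">y</D><F n="1">z</F></entry>
<entry><info L="2,3" page="1.1" gtypes="S,S,D,F"/>
<S n="2">x</S><S n="3">x</S><D n="2">y</D><F n="2">z</F></entry>
</boesp>"""


def mismatched_entries(root):
    """Return the L values of entries whose S and D counts differ."""
    bad = []
    for entry in root.iter("entry"):
        if entry.find("HS") is not None:
            continue  # (info,HS+) entries carry no S or D elements
        n_s = len(entry.findall("S"))
        n_d = len(entry.findall("D"))
        if n_s != n_d:
            info = entry.find("info")
            bad.append(info.get("L") if info is not None else None)
    return bad


if __name__ == "__main__":
    print(mismatched_entries(ET.fromstring(SAMPLE)))
```

For the real file, `ET.parse(path).getroot()` would take the place of `ET.fromstring(SAMPLE)`, assuming the file parses without needing the external DTD.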

funderburkjim commented 2 years ago

D element required

The DTD also says that there must be at least 1 S, at least 1 D, and exactly 1 F element in each entry.
For the 'extra' sayings (7614-7865), the print actually gives no German (or Greek) translation. We have nevertheless included a 'minimal' D element, even though it is absent in the print, e.g.,

<entry>
<info L="7865" page="5.249" gtypes="S,D,F"/>
<S>
grAsodgalitasikathasya kariNaH kiM gataM bhavet |
pipIlikA tu tenaiva sakuTumbopajIvati || 7865 ||
</S>
<D n="7865">
<b>7865.</b> 
</D>
<F n="7865">
7865) PRASAN3GAR.
</F>
</entry>
funderburkjim commented 2 years ago

The above commit (ee7a681) corrects a few things required for the S-tag revisions.

funderburkjim commented 2 years ago

The 'S tag improvements' have now been made. See commit ef22c73.

Note 1: Each line of Sanskrit in the S elements is now enclosed within an <s> tag, so that all Sanskrit text, whether in an S element, in footnotes, or in additions/corrections, is text within an <s> element.

<entry>
<info L="7865" page="5.249" gtypes="S,D,F"/>
<S n="7865">
<s>grAsodgalitasikathasya kariNaH kiM gataM bhavet |</s>
<s>pipIlikA tu tenaiva sakuTumbopajIvati || 7865 ||</s>
</S>
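
The line wrapping of Note 1 could be sketched as follows. `wrap_lines` is a hypothetical name, and the sketch assumes each line of the S body is already valid XML character data (no bare `<` or `&`).

```python
def wrap_lines(s_body: str) -> str:
    """Enclose each non-empty line of an S element body in <s>...</s>."""
    lines = [ln.strip() for ln in s_body.strip().splitlines() if ln.strip()]
    return "\n".join(f"<s>{ln}</s>" for ln in lines)
```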

Note 2: a 'Version number' is present in boesp.xml and also in boesp.dtd. The current version number is 1.3.

Note 3: boesp.all_ansi.xml is no longer being updated. The intention is that boesp_utf8.xml will be the starting point for further revisions, and that the 'ansi' version has served its purpose.
@thomasincambodia PLEASE LET ME KNOW IF THIS IS PROBLEMATIC. Since you can view the utf8 forms via Oxygen or Notepad++, I hoped this choice would be ok for you.

maltenth commented 2 years ago

No problem. Everything fine.