Suggested changes in boesp-1_ansi, Part 4

funderburkjim commented 3 years ago

remove one underscore

@thomasincambodia in D697, please remove the '_'

OLD:  <S> avisaMvAdako dakSaH kRtajn_o matimAnRjuH
NEW: <S> avisaMvAdako dakSaH kRtajno matimAnRjuH

Reason: This also has to do with invertibility of transcoding between hk and slp1.

In 'slp1' the period character represents danda.

In your coding of Sanskrit in boesp, you use the vertical bar | for danda.

However, the {#X#} markup is used for Sanskrit in text fragments where Devanagari words are mixed with other text. Such text can appear in any of the sections EXCEPT the S (sanskrit verse) sections.

And in these {#X#} there often appears the period character (along with the occasional |).

If we leave these period characters in {#X#} HK text, then both '.' and '|' will appear in the transcoded SLP1. So, when we convert back from SLP1 to HK, something will be wrong -- e.g., the original periods in X will now be vertical bar.

The ultimate solution is to modify all the {#X#} fragments so that the resulting 'X' has no periods. However, this is quite tedious, so I made a temporary 'fix' in transcoding to change all the periods in 'X' to the underscore character. So for instance {#rAmaH Agacchati. rAmaH pazyati.#} in HK is transcoded to {#rAmaH AgacCati_ rAmaH paSyati_#} in slp1. Since '_' has no meaning in either HK or SLP1, the reverse transcoding from SLP1 to HK uses the special transcoding of '_' back to '.'.

Obviously, this is not the final solution, since we don't want the '_' in the SLP1.

The real solution is to change boesp to pull the periods out of X. Thus boesp would be changed to {#rAmaH Agacchati#}. {#rAmaH pazyati#}.

But for right now, the '_' trick is being used. The only use of '_' in boesp-1 is the one shown above. If it is removed as indicated, then the temporary invertibility trick works.
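The period/danda collision described above can be illustrated with a small sketch. This is not the project's transcoder: the table `HK_TO_SLP1`, the helper `_transcode`, and the `protect_period` flag are hypothetical names introduced here, and the table only covers the common HK and SLP1 letters. It shows that with the underscore substitution the HK text survives the round trip, while without it the periods come back as vertical bars.

```python
# Illustrative HK -> SLP1 table (an assumption, not the project's transcoder).
# Only tokens that change are listed; every other character passes through.
HK_TO_SLP1 = {
    "lRR": "X", "lR": "x", "RR": "F", "R": "f",
    "ai": "E", "au": "O",
    "kh": "K", "gh": "G", "G": "N",
    "ch": "C", "jh": "J", "J": "Y",
    "Th": "W", "T": "w", "Dh": "Q", "D": "q", "N": "R",
    "th": "T", "dh": "D", "ph": "P", "bh": "B",
    "z": "S", "S": "z",
    "|": ".",  # danda: vertical bar in HK, period in SLP1
}


def _transcode(text, table):
    """Greedy longest-match replacement; unknown characters are copied."""
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def hk_to_slp1(text, protect_period=True):
    table = dict(HK_TO_SLP1)
    if protect_period:
        table["."] = "_"  # the temporary trick: hide periods as underscores
    return _transcode(text, table)


def slp1_to_hk(text, protect_period=True):
    table = {slp1: hk for hk, slp1 in HK_TO_SLP1.items()}
    if protect_period:
        table["_"] = "."  # restore the hidden periods
    return _transcode(text, table)


if __name__ == "__main__":
    hk = "rAmaH Agacchati. rAmaH pazyati."
    print(hk_to_slp1(hk))
    print(slp1_to_hk(hk_to_slp1(hk)))
    # Without the trick, '.' and '|' both end up as '.' in SLP1.
    print(slp1_to_hk(hk_to_slp1(hk, protect_period=False), protect_period=False))
```

The same sketch also shows why a stray '_' in the HK source is a problem: on the way back it would be turned into a period that was never there.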

funderburkjim commented 3 years ago

in D2040, move part of <S> to preceding HS

OLD:
<HS>{#khagA vItaphalaM vRkSam#} s. Spruch 

<S> niHsvaM tyajanti gaNikAH. khaNDaH punarapi pUrNaH punarapi khaNDaH punaH 
zazI pUrNAH | saMpadvipadau prAyaH kasyApi nahi sthire syAtAm || 

NEW:
<HS>{#khagA vItaphalaM vRkSam#} s. Spruch {#niHsvaM tyajanti gaNikAH#}.

<S> khaNDaH punarapi pUrNaH punarapi khaNDaH punaH 
zazI pUrNAH | saMpadvipadau prAyaH kasyApi nahi sthire syAtAm ||

This was noticed since there was an unexpected '.' period character in the <S> section.
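A check of this kind could be sketched as follows. It assumes, purely for illustration, that a verse block starts on a line beginning with `<S>` and runs until the next blank line or the next line beginning with another tag; the function name `periods_in_verse_blocks` is hypothetical.

```python
def periods_in_verse_blocks(lines):
    """Return (line_number, text) for every <S> block line containing '.'."""
    hits = []
    in_verse = False
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if stripped.startswith("<S>"):
            in_verse = True
        elif not stripped or stripped.startswith("<"):
            # A blank line or any other tag ends the verse block (assumption).
            in_verse = False
        if in_verse and "." in stripped:
            hits.append((number, stripped))
    return hits


if __name__ == "__main__":
    old = [
        "<HS>{#khagA vItaphalaM vRkSam#} s. Spruch ",
        "",
        "<S> niHsvaM tyajanti gaNikAH. khaNDaH punarapi pUrNaH punarapi khaNDaH punaH ",
        "zazI pUrNAH | saMpadvipadau prAyaH kasyApi nahi sthire syAtAm || ",
    ]
    for number, text in periods_in_verse_blocks(old):
        print(number, text)
```

The period in the `<HS>` line ("s. Spruch") is ignored because that line is outside any verse block.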

funderburkjim commented 3 years ago

Several corrections to invalid HK

These were noticed because they broke invertibility in conversion between HK and SLP1

OLD: <F>349) {#CA1N2#}.
NEW: <F>349) CA1N2.
Reason: this is not HK.  The C breaks invertibility.

OLD: <F>1090) PAN4CAT. ed. KOSEG. V, 69. ed. Bomb. 82. SURHA1sh. 72. ²a. {#ICCHATi#}
NEW: <F>1090) PAN4CAT. ed. KOSEG. V, 69. ed. Bomb. 82. SURHA1sh. 72. ²a. {#icchatI#}
p. 208

in 1095 <S> block
OLD: zeraqte
NEW: zerate
Reason:  the 'q' breaks invertibility

in <F>1234)
OLD: {#PrANAMs#}
NEW: {#prANAMs#}

in <S> for D1265, p. 240
OLD: lOyante
NEW: lIyante

in <F>1343)
OLD: {#yadIxxheyt#}
NEW: {#yadIcchet#}

in <S> for D1400
OLD: eKAkSarapradAtAraM
NEW: ekAkSarapradAtAraM
Reason: 'K' breaks invertibility

in <F>1436) p.274
OLD: {#LALaNa, SAmGO, SAmGGE.#}
NEW: {#lalAnA, saMgo, saMgge.#}

in <F>1449), p. 277
OLD: {#zweiten#}
NEW: zweiten
Reason: German word

in <S> block for D1636  p. 314
OLD: vidaqdhati
NEW: vidadhati

<S> block for D1748. p.337
OLD: dugdhaqdA
NEW: dugdhadA

<S> block for D1942, p. 376
OLD: viSayiBNaH
NEW: viSayiNaH
Reason: B breaks invertibility

<S> block for D1988, p. 386
OLD: KSaNe
NEW: kSaNe
Reason: K breaks invertibility

<S> block for D2046, p. 399
OLD: kaNBTAkAnAM
NEW: kaNTakAnAM

<S> block for D2081, p. 407
OLD: phalamiKSudaNDe
NEW: phalamikSudaNDe

<F>2111) p. 414
OLD: {#NatE PAtHE#}
NEW: {#nATe paThe#}

<F>2194) p. 429
OLD: {#ziKSA#}
NEW: {#zikSA#}
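Most of the cases above contain a letter that is not part of the HK alphabet (C, K, q, B, O, P, x, w, E, L), which is why the round trip through SLP1 changes them. A rough way to hunt for such letters is sketched below. The letter set `HK_LETTERS` is an assumption about which HK variant is in use, and the function names are hypothetical. Cases such as lOyante or PrANAMs are caught; a wrong but valid letter would not be.

```python
import re

# Letters that occur in Harvard-Kyoto as assumed here (no L, no w, no x ...).
HK_LETTERS = set("aAiIuUReoMHkgGcjJTDNtdnpbmyrlvzSsh")

# A {#...#} fragment, non-greedy so several fragments per line are found.
FRAGMENT = re.compile(r"\{#(.*?)#\}")


def invalid_hk_letters(text):
    """Return the sorted ASCII letters in text that are not HK letters."""
    return sorted({ch for ch in text
                   if ch.isascii() and ch.isalpha() and ch not in HK_LETTERS})


def scan_fragments(line):
    """Yield (fragment, bad_letters) for each suspicious {#...#} fragment."""
    for match in FRAGMENT.finditer(line):
        bad = invalid_hk_letters(match.group(1))
        if bad:
            yield match.group(1), bad


if __name__ == "__main__":
    for sample in ["<F>349) {#CA1N2#}.", "{#ziKSA#}", "{#zikSA#}", "{#zweiten#}"]:
        print(sample, list(scan_fragments(sample)))
```

For `<S>` blocks, which carry no {#...#} markup, `invalid_hk_letters` can be applied to the verse line directly.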

funderburkjim commented 3 years ago

correct [Seite..] in {##}

For transcoding, the page-breaks [Seite1.x] should be separated from {##} devanagari: i.e., {#X [Seite1.P] Y#} -> {#X#} [Seite1.P] {#Y#}

Here are corrections for the 11 particular cases noticed.

Under <F>100)
OLD:  {#yutam, aznanti, na mIno 'pi jJAtvA
      [Seite1.20] vRtavalizamaznAti.#}
NEW: {#yutam, aznanti, na mIno 'pi jJAtvA#}
      [Seite1.20] {#vRtavalizamaznAti.#}

Under <F>149)
OLD: {#a- [Seite1.29] tirUpavatI#}
NEW: {#a-#} [Seite1.29] {#tirUpavatI#}

Under <F>549)
OLD: {#na cakre [Seite1.102] 'lpita#}
NEW: {#na cakre#} [Seite1.102] {#'lpita#}

Under <F>635)
OLD:
{#prANAH balaM tatsAdhyAnAM kAryANAM 
AnantaryaM avicche [Seite1.119] dena kAryadhArAmArabhetetyarthaH na dhanAyate 
dhanamAtmano necchati tRSNAM tyajatItyarthaH#}.
NEW:
{#prANAH balaM tatsAdhyAnAM kAryANAM 
AnantaryaM avicche#} [Seite1.119] {#dena kAryadhArAmArabhetetyarthaH na dhanAyate 
dhanamAtmano necchati tRSNAM tyajatItyarthaH#}.

Under <F>1409)
OLD: {#smara [Seite1.269] kati kRtAH#}
NEW: {#smara#} [Seite1.269] {#kati kRtAH#}

Under <F>1681)
OLD: {#svanura- [Seite1.323] kto#}
NEW: {#svanura-#} [Seite1.323] {#kto#}

Under <F>1684. 85)
OLD: {#karmaNAM samavetAnAM [Seite1.324] bahUnAmarthasiddhaye.#}
NEW: {#karmaNAM samavetAnAM#} [Seite1.324] {#bahUnAmarthasiddhaye.#}

Under <F>1943)
OLD: {#gamanaM; [Seite1.377] kiM rAjyaM#}
NEW: {#gamanaM#}; [Seite1.377] {#kiM rAjyaM#}
Note also semi-colon moved outside of closing #}

Under <F>1987)
OLD: {#manasvI kA- [Seite1.386] ryArthI, na gaNayati#}
NEW: {#manasvI kA-#} [Seite1.386] {#ryArthI, na gaNayati#}

Under <F>2024)
OLD: {#agnau, nA- [Seite1.395] gnau#}
NEW: {#agnau, nA-#} [Seite1.395] {#gnau#}

Under <F>2027)
OLD: {#kSatakSAmo, jarAnvi- [Seite1.396] to; prANo#}
NEW: {#kSatakSAmo, jarAnvi-#} [Seite1.396] {#to#}; {#prANo#}
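The separation rule {#X [Seite1.P] Y#} -> {#X#} [Seite1.P] {#Y#} could be automated along the following lines. This is only a sketch: `split_page_breaks` is a hypothetical helper, it normalizes the whitespace around the page-break to single spaces (so line breaks inside a fragment are not preserved), and it does not move punctuation such as the semicolons in 1943 and 2027, which were handled by hand above.

```python
import re

# A {#...#} fragment; DOTALL so a fragment may span several lines.
FRAGMENT = re.compile(r"\{#(.*?)#\}", re.DOTALL)
# A page-break marker with any surrounding whitespace.
PAGE_BREAK = re.compile(r"\s*(\[Seite[^\]]*\])\s*")


def split_page_breaks(text):
    """Close and reopen {#...#} around every [Seite...] marker inside it."""
    def fix(match):
        inner = match.group(1)
        if "[Seite" not in inner:
            return match.group(0)
        # re.split with a capturing group keeps the markers at odd indexes.
        parts = PAGE_BREAK.split(inner)
        pieces = []
        for index, part in enumerate(parts):
            if index % 2 == 1:
                pieces.append(part)               # the [Seite...] marker
            elif part:
                pieces.append("{#" + part + "#}")  # Sanskrit on either side
        return " ".join(pieces)

    return FRAGMENT.sub(fix, text)


if __name__ == "__main__":
    print(split_page_breaks("{#a- [Seite1.29] tirUpavatI#}"))
    print(split_page_breaks("{#smara [Seite1.269] kati kRtAH#}"))
```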

maltenth commented 3 years ago

@funderburkjim the underscore character was a typo for ~ in n~ which I have converted to J.

I have shifted all dots outside: .#} --> #}. But I still have to find out whether there are dots in other positions within Sanskrit passages

maltenth commented 3 years ago

@funderburkjim

{for your edification} I am using the following kedit macro to ferret out those remaining dots. It seems to work.

def dotto 'cur col';do 20996;'cl/{#/';'mark box';'cmatch';'mark box';if pos('.',block.8())\='0';then;do;'reset block';'cl-/./';'text ©';end;'reset block'; end

There are 78 occurrences in the whole of boesp. I am going to get rid of them.
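For readers without kedit, the two steps discussed here (shifting `.#}` to `#}.` and then listing the fragments that still contain a dot) might look roughly like this in Python. This is not a translation of the macro above; the function names are hypothetical and the {#...#} pattern is assumed not to nest.

```python
import re

# A {#...#} fragment; DOTALL so a fragment may span several lines.
FRAGMENT = re.compile(r"\{#(.*?)#\}", re.DOTALL)


def move_trailing_dots(text):
    """Shift a period that directly precedes '#}' to just after it."""
    return text.replace(".#}", "#}.")


def fragments_with_dots(text):
    """Return the contents of every {#...#} fragment that contains a '.'."""
    return [m.group(1) for m in FRAGMENT.finditer(text) if "." in m.group(1)]


if __name__ == "__main__":
    sample = "{#rAmaH Agacchati. rAmaH pazyati.#} und {#zikSA#}"
    shifted = move_trailing_dots(sample)
    print(shifted)
    # Whatever is listed here still needs manual attention.
    print(fragments_with_dots(shifted))
```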

funderburkjim commented 3 years ago

I had thought most of the periods occur as .#} . Nice if there are only 70+ elsewhere.

maltenth commented 3 years ago

@funderburkjim there are 5 occurrences of three dots . . . which I have replaced with Ansi 133/85

maltenth commented 3 years ago

@funderburkjim I tried to upload a new version of boesp-1_ansi.txt but ran into trouble because I had erroneously copied the large pdf file of the 2nd volume of boesp to the step0 directory. Later I deleted the file but still got some error messages:

$ git push
Enumerating objects: 11, done.
Counting objects: 100% (11/11), done.
Delta compression using up to 8 threads
Compressing objects: 100% (8/8), done.
Writing objects: 100% (8/8), 107.79 MiB | 289.00 KiB/s, done.
Total 8 (delta 5), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (5/5), completed with 3 local objects.
remote: error: Trace: e06e07ae8075c024aaad36df5aa826298189e1af67942986cce180878418817b
remote: error: See http://git.io/iEPt8g for more information.
remote: error: File step0/boesp-2(2.aufl.).pdf is 108.74 MB; this exceeds GitHub's file size limit of 100.00 MB
remote: error: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com.
To https://github.com/funderburkjim/boesp-prep.git
 ! [remote rejected] main -> main (pre-receive hook declined)
error: failed to push some refs to 'https://github.com/funderburkjim/boesp-prep.git'

maybe boesp-1_ansi.txt has been safely pushed

funderburkjim commented 3 years ago

The ellipsis character should be fine.

The 'fail to push' problem was resolved, as I recall.

I think this issue can be closed
