Open bdhrs opened 3 years ago
Venerable @bdhrs, @parthopdas: I took a look and I think this is a problem with the underlying XML data that is analyzed by the DPR. For example, bhedāapatti appears in the XML for Book 4 of the Aṅguttara (Mūla): https://github.com/digitalpalireader/digitalpalireader/blob/master/tipitaka/my/a4m.xml#L2425, whereas it should be bhedaāpatti. How were the XML files generated?
I don't think this is a bug with the way DPR is displaying results, as I am able to search for words that contain aā: https://www.digitalpalireader.online/_dprhtml/index.html?feature=search&type=0&query=\wa%C4%81\w&MAT=m&set=n&book=1&part=1&rx=true. Please let me know if I have misunderstood the original issue!
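The same kind of search can be run offline against the text of an XML file. The following is a minimal sketch, not DPR code: the function name `find_candidates` and the sample string are hypothetical, and it assumes the text may mix precomposed ā with a + combining macron, so it normalizes to NFC first.

```python
import re
import unicodedata
from collections import Counter

# A run of Unicode letters; [^\W\d_] excludes digits and underscore from \w.
WORD_RE = re.compile(r"[^\W\d_]+")

def find_candidates(text):
    """Count words containing the sequences 'aā' or 'āa' (hypothetical helper)."""
    # NFC folds 'a' + combining macron (U+0304) into the single code point 'ā'.
    text = unicodedata.normalize("NFC", text)
    hits = {"aā": Counter(), "āa": Counter()}
    for word in WORD_RE.findall(text):
        for seq in hits:
            if seq in word:
                hits[seq][word] += 1
    return hits

# Sample words taken from this issue; the second one is written in decomposed form.
sample = "bhedāapatti maha\u0304aggikkhandho bhedaāpatti dhammo"
result = find_candidates(sample)
for seq, counter in result.items():
    print(seq, sorted(counter))
```

Every word it lists still needs a manual check, since both sequences are legitimate in some compounds.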
@rrogowski afaik it was a custom process used by v @yuttadhammo years back. i think the VRI corpus was the base.
you are most likely right in your assessment and if so we'll need to fix the xml files with the caveat in the PS. do you know what the corresponding VRI texts say? https://tipitaka.org/romn/
i think this is a good discussion for the dpr channel.
PS: this is item 6 in https://discord.com/channels/780067275008376862/786141053090660362/822972211711180801
Just want to note that, per our discussion on Discord, we will be using the VRI as the source of truth for the DPR Burmese texts. By tackling this long-term problem, we will fix the short-term problem described in this issue. The next steps seem to be:
@rrogowski I will also add that please make a judgment call on this.
I certainly prefer we fix this problem once and for all as it opens a bunch of possibilities and aligns well with overall DPT roadmap.
However you're doing the work and I'd rather you do stuff that interests you than chase some random vision / roadmap.
All work you do in DPR is impactful by definition.
@parthopdas I'm happy to continue working on this issue!
I manually browsed each commit in the Git history for the Myanmar Tipitaka XML files (there were only a couple dozen total). Of these commits, the following contain Pali fixes.
Apr 20, 2020
Aug 28, 2014
Aug 25, 2012
Sep 23, 2011
Sep 3, 2011
There are few enough typos that I think we can fix them manually in the VRI texts. So here's what I was thinking about doing from this point forward:
... āa and aā ... .gitignore. This will help prevent the case where changes are accidentally made to the DPR XML files and not the VRI texts (single source of truth).
What do you think?
Two of the recurring answers when I ask Sri Lankan and Burmese monks, "Why don't you use DPR?" are:
What you are suggesting would sort out the main problem.
As far as single source of truth goes, that must be VRI repository <snipped by partho, details in private on discord>
On Sat, 3 Apr 2021 at 04:01, Roman Rogowski @.***> wrote:
@parthopdas https://github.com/parthopdas I'm happy to continue working on this issue!
I manually browsed each commit in the Git history for the Myanmar Tipitaka XML files https://github.com/digitalpalireader/digitalpalireader/commits/95de827e624a5c41ad07a8552939cd366a95d43b/DPRMyanmar/content/xml (there were only a couple dozen total). Of these commits, the following contain Pali fixes.
Apr 20, 2020
- e11fb8e
diff-7fd3c2f1e3b8035cad22d5daa19e9f50dd082f110fb69639c1ef73bb982a3476
- 7b41a4c
diff-7fd3c2f1e3b8035cad22d5daa19e9f50dd082f110fb69639c1ef73bb982a3476
- 47a2aee
diff-7fd3c2f1e3b8035cad22d5daa19e9f50dd082f110fb69639c1ef73bb982a3476
Aug 28, 2014
- 8200f89
diff-e005e3d277ac5cce5352ad228f3e0f4b2a2736380b91d556c4452735393377f8
Aug 25, 2012
- 2629c4c
diff-e005e3d277ac5cce5352ad228f3e0f4b2a2736380b91d556c4452735393377f8
Sep 23, 2011
- e538cb7
diff-e005e3d277ac5cce5352ad228f3e0f4b2a2736380b91d556c4452735393377f8
Sep 3, 2011
- d39e71c
diff-e005e3d277ac5cce5352ad228f3e0f4b2a2736380b91d556c4452735393377f8
There are few enough typos that I think we can fix them manually in the VRI texts. So here's what I was thinking about doing from this point forward:
- Submit a PR to VRI fixing the typos identified in the commits above. (I'm assuming we will want corrections to be pushed here moving forward, so that we can maintain a single source of truth. In turn, the XML files for the DPR will be auto-generated from the VRI texts.)
- Begin working on a script to generate the Myanmar Tipitaka XML files for the DPR from the VRI texts. Ideally, the auto-generated results should closely match the existing DPR XML files in their current state, with the exception of known discrepancies such as confusing āa and aā.
- Create a PR with the regenerated XML files, which would in turn resolve this short-term issue.
What do you think?
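The check for "known discrepancies such as confusing āa and aā" could be sketched roughly as follows. This is an illustration under assumptions, not the planned generation script: it assumes plain text has already been extracted from both the DPR XML and the VRI source, and the names `words` and `suspected_swaps` are hypothetical.

```python
import re
import unicodedata

# A run of Unicode letters; [^\W\d_] excludes digits and underscore from \w.
WORD_RE = re.compile(r"[^\W\d_]+")

def words(text):
    """Return the set of lowercased, NFC-normalized words in text."""
    return set(WORD_RE.findall(unicodedata.normalize("NFC", text).lower()))

def suspected_swaps(dpr_text, vri_text):
    """Map DPR-only words containing 'āa' to a VRI word that has 'aā' instead."""
    dpr_words, vri_words = words(dpr_text), words(vri_text)
    suspects = {}
    for word in sorted(dpr_words - vri_words):
        if "āa" in word:
            candidate = word.replace("āa", "aā")
            if candidate in vri_words:
                suspects[word] = candidate
    return suspects

# Toy inputs built from the examples in this issue.
dpr = "bhedāapatti mahāaggikkhandho"
vri = "bhedaāpatti mahāaggikkhandho"
swaps = suspected_swaps(dpr, vri)
print(swaps)
```

Words that appear identically in both texts, such as mahāaggikkhandho here, are not flagged, so correct āa compounds are left alone.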
There are some mixups in the texts with compounds containing aaa i.e. āa or aā.
DPR renders everything uniformly as āa. Sometimes this is correct, e.g. mahāaggikkhandho, and sometimes incorrect, e.g. bhedāapatti, majjhimāagame, where it should be aā.
There are only a few words in the vinaya and sutta piṭaka which contain this particular compound, easy enough to repair by hand, but thousands in the abhidhamma and commentaries.
AC: