Open funderburkjim opened 8 years ago
Each of the verbdata.txt records has the root spelled in two ways:
Refer to verbdata_dupnorm.txt.
It was found that in 19 cases, a given spelling-with-anubandha was associated with more than one spelling-without-anubandha.
The presence of these duplicates is an obstacle to matching the records of verbdata.txt with the roots indicated in dictionaries such as that of Monier-Williams.
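A check of this kind could be sketched as below. This is a minimal sketch, not the script that produced verbdata_dupnorm.txt. It assumes verbdata.txt records are colon-separated, with the spelling-with-anubandha in the first field and the spelling-without-anubandha in the third, which is inferred from the records quoted in this thread. `find_dupnorm` is a hypothetical name, and the sample records are abbreviated to their leading fields.

```python
from collections import defaultdict

def find_dupnorm(lines):
    """Return roots-with-anubandha that map to more than one
    spelling-without-anubandha (assumed field layout, see above)."""
    norms = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split(":")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        norms[fields[0]].add(fields[2])
    return {root: sorted(forms) for root, forms in norms.items() if len(forms) > 1}

# Records taken from this thread, cut off after the iDAgama field.
sample = [
    "DUpa!:Dupa!' BAzArTaH:Dup:10:0303:u:sew:",
    "DUpa!:BAzArTaH:DUp:10:0303:u:sew:",
    "vraRa!:SabdArTaH:vraR:01:0519:pa:sew:",
]
print(find_dupnorm(sample))  # {'DUpa!': ['DUp', 'Dup']}
```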
sutra numbers
Refer to verbdata_dupsutra.txt.
Sutra number in this discussion means an amalgam of the gana (conjugation class) and (sequence) number of verbdata.txt records. (This file identifies the fields that comprise a record of verbdata.txt.)
There were 43 cases where a given sutra number appears in more than one record of verbdata.txt.
I presume that some set of parameters derived from the fields of verbdata.txt should identify a particular entity which we call a root. A priori, I expected that the gana-sequence number (sutra number) would be such a parameter, but the presence of duplicates shows that it is not. However, the fact that the number of duplicates (43) is quite small (2% of 2213 records of verbdata.txt) indicates that the sutra number is almost an identifier.
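The sutra-number check can be sketched in the same way. This is an illustrative sketch only. It assumes the gana and the sequence number are the fourth and fifth colon-separated fields, as the records quoted in this thread suggest. `duplicate_sutra_numbers` is a hypothetical name, and the sample records are abbreviated.

```python
from collections import Counter

def duplicate_sutra_numbers(lines):
    """Count records per (gana, number) and keep the pairs seen more than once."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(":")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        counts[(fields[3], fields[4])] += 1
    return {key: n for key, n in counts.items() if n > 1}

# Records taken from this thread, cut off after the iDAgama field.
sample = [
    "granTa!:banDane:granT:10:0362:u:sew:",
    "granTa!:sandarBe:granT:10:0362:u:sew:",
    "cIva!:BAzArTaH:cIv:10:0305:u:sew:",
]
print(duplicate_sutra_numbers(sample))  # {('10', '0362'): 2}

# The proportion quoted above: 43 duplicates among 2213 records.
print(f"{43 / 2213:.1%}")  # 1.9%
```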
On the other hand, the verb-with-anubandha is also not a unique identifier of the cases of verbdata.txt.
Is there a generally accepted dhatu identifier? Is this identifier present in verbdata.txt?
Good to see $verbdata being cleaned up. I did some cleanup manually as and when I came across errors, but a systematic study like this will definitely clean it up in a big way.
If you do some additional analysis, keep me posted. I will correct my data as well.
It was found that in 19 cases, a given spelling-with-anubandha was associated with more than one spelling-without-anubandha.
The majority of them seem to be errors. I will keep you posted when I correct these entries in function.php. You can regenerate afterwards.
Is there a generally accepted dhatu identifier?
From my experience, it would be gana,sutranumber,pada,iDAgama,meaning.
The reason for including the meaning is that there are places where different commentators assign a different meaning to the same verb. If we cannot give such a verb a separate entry with a separate verb number, the sutranumber may be identical while the meaning differs.
There can also be genuine tagging errors by the database maker. These entries need to be examined individually.
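To make the proposed identifier concrete, here is a small sketch of how such a key might be assembled from one record. It assumes the colon-separated field order root, meaning, normalized root, gana, number, pada, iDAgama, which is inferred from the records quoted in this thread. `identifier` is a hypothetical helper, not part of function.php.

```python
def identifier(record):
    """Build the tuple (gana, sutranumber, pada, iDAgama, meaning) from a record."""
    f = record.split(":")
    # Assumed order: root, meaning, normalized root, gana, number, pada, iDAgama.
    return (f[3], f[4], f[5], f[6], f[1])

# Two records from this thread that share a sutra number but differ in meaning
# (cut off after the iDAgama field).
a = "granTa!:banDane:granT:10:0362:u:sew:"
b = "granTa!:sandarBe:granT:10:0362:u:sew:"

print(identifier(a))                           # ('10', '0362', 'u', 'sew', 'banDane')
print(identifier(a)[:4] == identifier(b)[:4])  # True: same without the meaning
print(identifier(a) == identifier(b))          # False: the meaning separates them
```

With the meaning left out the two records collide, which is the situation the paragraph above describes.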
Is this identifier present in verbdata.txt?
Yes, the identifier seems to be present in verbdata.txt.
https://github.com/funderburkjim/elispsanskrit/issues/32#issuecomment-243000152
Corrections started. Changes noted here.
For your reference, the base dhAtupATha on which Mihail has based his numbers seems to be the following: dhatupatha_svara.pdf. The majority of root numbers tally with this.
YimidA!:Bid,mid:Changed to mid. Bid was error.
kaWi!:utkaRW,kaRW:Removed ut. It was upasarga.
o!laqi!:olaRq,laRq:Removed o.
vella!:vell,vehl:Separate verbs vella! and vehla!. Correction to vehla!
pasi!:paMS,paMs:Separate verbs pasi! and paSi!. Correction to paSi!
barha!:barh,varh:Separate verbs barha! and varha!. Correction to varha!
bfhi!:bfMh,vfMh:Separate verbs bfhi! and vfhi!. Correction to vfhi!
DUpa!:Dup,DUp:Tricky. There are two verbs on the same number. Alternate forms. See image. Right now changing to Dupa!. Will have to take a call.
vehf!:beh,veh:Same as above. Changed to behf!
DU:Du,DU:Same as above. Changed to Du.
mana!:man,mAn:Changed to mAna!
IKi!:IK,INK:Changed to IKa!
pelf!:pall,pel:Changed to palla!
taqa!:taq,taRq:Changed to taq
vasa!:vas,vasa:Changed to vas
aqqa!:aqq,adq:Changed to adqa!
visa!:bis,vis:Changed to vis
bisa!:bis,vis:Changed to bis
mAna!:man,mAn:Changed to mAn
The 19 entries of 'duplicate verb without anubandha' are now corrected in $verbdata. https://github.com/funderburkjim/elispsanskrit/issues/32#issuecomment-243000513 is pending. @funderburkjim, will you please regenerate the statistics after this first round of corrections?
I am sure some increase will be seen in the 'duplicate sutra number' lot after the first round of corrections.
@drdhaval2785 Regenerated
dhatupatha_svara.pdf
- The link to this is a new form to me for GitHub. https://github.com/funderburkjim/elispsanskrit/files/448794/dhatupatha_svara.pdf When I clicked, it downloaded the file. This must be some GitHub service. Is there a link on how to use the 'files' service?
- Dhatupathas are generally associated with some scholar's name, as I understand it. For instance, there is the mADavIyaDAtupAWa, the Westergaard Dhatupatha, probably many other Sanskrit scholars both modern and from antiquity. To which scholar do we attribute dhatupatha_svara.pdf?
When I clicked, it downloaded the file. This must be some GitHub service. Is there a link on how to use the 'files' service?
Drag and drop in the issue text box. Nothing further.
To which scholar do we attribute dhatupatha_svara.pdf?
I seriously do not know. It is available on sanskritdocuments.org I guess. No metadata in the file.
To which scholar do we attribute dhatupatha_svara.pdf?
When I last met Mihas in Moscow he told me that there had been 3 sources. The main source is Katre (https://yadi.sk/i/4kO_OF81uhGer and https://yadi.sk/i/klN3jLERuhGh9); the other two, used for reference, I do not remember, but one could ask Mihas by mail. We are no longer in contact, as he is on Ukraine's side (being in Belarus) and I am on Russia's.
So dhatupatha_svara.pdf was produced by 'Mihas'?
Sad about the Ukraine issue.
Here are four cases that might need correction in verbdata (from sanverb_cp_log.txt):
case 1 of duplicate verbdata key: vraRa!.01.0519.P
vraRa!:SabdArTaH:vraR:01:0519:pa:sew:व्र॑णँ॑:277:290:293:vraN1_vraNaz_BvAxiH+SabxArWaH:
vraRa!:SabdArTaH:vraR:01:0519:pa:sew:व्र॑णँ॑:277:290:293:vraN1_vraNaz_BvAxiH+SabxArWaH:
case 2 of duplicate verbdata key: kaWi!.10.0385.U
kaWi!:Soke prAyeRotpUrva utkaRWAvacanaH:kaRW:10:0385:u:sew:क॑ठिँ॑:1362:1378:1415:kaNT2_kaTiz_curAxiH+Soke:
kaWi!:Soke prAyeRotpUrva utkaRWAvacanaH:kaRW:10:0385:u:sew:क॑ठिँ॑:1362:1378:1415:kaNT2_kaTiz_curAxiH+Soke:
case 3 of duplicate verbdata key: DUpa!.10.0303.U
DUpa!:Dupa!' BAzArTaH:Dup:10:0303:u:sew:धू॑पँ॑:1321::1374:XUp2_XUpaz_curAxiH+BARArWaH:
DUpa!:BAzArTaH:DUp:10:0303:u:sew:धू॑पँ॑:1321::1374:XUp2_XUpaz_curAxiH+BARArWaH:
case 4 of duplicate verbdata key: granTa!.10.0362.U
granTa!:banDane:granT:10:0362:u:sew:ग्र॑न्थँ॑:1342,1353:1368:1395,1406:granW3_granWaz_curAxiH+sanxarBe:261
granTa!:sandarBe:granT:10:0362:u:sew:ग्र॑न्थँ॑:1342,1353:1368:1395,1406:granW3_gran
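The key shown in the log lines above (for example `vraRa!.01.0519.P`) looks like root, gana, number and the upper-cased first letter of the pada joined by dots. The sketch below reconstructs it on that assumption; it is a guess at the format, not the code that wrote sanverb_cp_log.txt, and `verbdata_key` and `duplicate_keys` are hypothetical names.

```python
from collections import defaultdict

def verbdata_key(record):
    """Rebuild a key such as 'vraRa!.01.0519.P' from a colon-separated record.
    Assumed fields: 0 = root, 3 = gana, 4 = number, 5 = pada."""
    f = record.split(":")
    return f"{f[0]}.{f[3]}.{f[4]}.{f[5][0].upper()}"

def duplicate_keys(records):
    """Return the keys shared by more than one record, in first-seen order."""
    groups = defaultdict(list)
    for record in records:
        groups[verbdata_key(record)].append(record)
    return [key for key, members in groups.items() if len(members) > 1]

# Records from this thread, cut off after the iDAgama field.
records = [
    "vraRa!:SabdArTaH:vraR:01:0519:pa:sew:",
    "vraRa!:SabdArTaH:vraR:01:0519:pa:sew:",
    "kaWi!:Soke prAyeRotpUrva utkaRWAvacanaH:kaRW:10:0385:u:sew:",
]
print(duplicate_keys(records))  # ['vraRa!.01.0519.P']
```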
vraRa! and kaWi!
Duplicates - removed one entry.
DUpa!
This is typical. There are two verbs on the same number, and there are some more such cases.
I propose to number it 10.0303a. @funderburkjim, what is your take?
granTa!
granTa! banDane is 10.0362. granTa! sandarBe is 10.0375. Corrected in function.php
Re: Drag and drop in the issue text box. Nothing further.
Thanks. Useful idea.
Regarding the DUpa! case, where a given sutra has two root forms:
DUpa!:Dupa!' BAzArTaH:Dup:10:0303:u:sew:धू॑पँ॑:1321::1374:XUp2_XUpaz_curAxiH+BARArWaH:
DUpa!:BAzArTaH:DUp:10:0303:u:sew:धू॑पँ॑:1321::1374:XUp2_XUpaz_curAxiH+BARArWaH:
Maybe change these to
Dupa!:BAzArTaH:Dup:10:0303:u:sew:धु॑पँ॑:1321::1374:XUp2_XUpaz_curAxiH+BARArWaH:
DUpa!:BAzArTaH:DUp:10:0303:u:sew:धू॑पँ॑:1321::1374:XUp2_XUpaz_curAxiH+BARArWaH:
I would hold off distinguishing these further by 10:0303a on one of them, since the 10.0303 is probably a key into a printing of the dhatupatha, and adding an 'a' would confuse the construction of this key.
And there's also the fact that the 0303 is a sequence number.
Further comment/question re DUpa!
In looking at dhatupatha_svara.pdf, there are many cases like 10:0303, in the sense of having the form
gana.number root (root1)
10:0305 cIva! (cIba!)
cIva!:BAzArTaH:cIv:10:0305:u:sew:ची॑वँ॑:1321::1374:cIv2_cIvaz_curAxiH+BARArWaH:
01:0105 zvaska! (zvazka!)
zvaska!:gatyarTaH:svazk:01:0105:A:sew:ष्व॑स्कँ॒:::::
(Numerous other examples)
So, if DUpa! were handled in verbdata like those other two instances, then there would be only ONE record for it in verbdata.
This is just an observation regarding some formal comparisons. I don't know the significance of all the pieces, so I do not have a definite opinion.
@funderburkjim,
It turns out that there are many such cases in dhatupatha_svara.pdf, and not all of them were given separate headword status; e.g., there are no zvazka! or cIba! verbs in the database. So it is best to remove Dup from the database for consistency.
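A consistency check along these lines could look like the sketch below. It assumes the listing lines have the shape `gana:number root (alternate)` as quoted earlier in this thread, and that the set of headwords comes from the first field of each verbdata record. The regular expression and the function name are illustrative assumptions only.

```python
import re

# Assumed shape of a listing line, e.g. "10:0305 cIva! (cIba!)".
LISTING = re.compile(r"^(\d{2}):(\d{4})\s+(\S+)\s+\((\S+)\)$")

def alternates_with_own_record(listing_lines, headwords):
    """Report alternate forms that also appear as separate headwords."""
    found = []
    for line in listing_lines:
        m = LISTING.match(line.strip())
        if m is None:
            continue  # not a 'root (alternate)' line
        gana, number, root, alternate = m.groups()
        if alternate in headwords:
            found.append((f"{gana}:{number}", root, alternate))
    return found

listing = ["10:0305 cIva! (cIba!)", "01:0105 zvaska! (zvazka!)"]
headwords = {"cIva!", "zvaska!"}  # neither alternate has its own record
print(alternates_with_own_record(listing, headwords))  # []
```

An empty result means every alternate form is folded into its main headword, which is the consistent state proposed above.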
Regenerated the data.
In preparation for comparing the pysan conjugation algorithms with those of SanskritVerb, there is some analysis of the $verbdata element of the SanskritVerb program function.php. This data is extracted as file verbdata.txt.
Analysis of verbdata was made for duplicates, in two ways.
In both cases, the significance (if any) of these duplicates is a question to me.