Closed drdhaval2785 closed 8 years ago
Maybe you can regenerate the generated forms?
This file shows 19 cases where the information in the generated forms could not be matched by the current verbdata. I think verbdata may be ahead of generated forms, i.e., generated forms is out of sync with some recent changes in verbdata.
Also, separately, I noticed a likely misspelling in verbdata: I think verbwithoutanubandha should be saBAja rather than samAja:
saBAja:prItidarSanayoH prItisevanayorityeke:samAja:10:0429:u:sew:स॑भा॑ज॑::::saBAja1_saBAja_curAxiH+MISSING:
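For reference, a minimal Python sketch of how the record above splits into fields. The field positions are inferred only from this one quoted record, not from a documented schema, and `split_record` is a hypothetical helper name. Index 2 holds the value in question (samAja).

```python
# Sketch only: split one colon-delimited verbdata record into its fields.
# Assumption: fields are separated by ':' and never contain ':' themselves.
record = (
    "saBAja:prItidarSanayoH prItisevanayorityeke:samAja:10:0429:u:sew:"
    "स॑भा॑ज॑::::saBAja1_saBAja_curAxiH+MISSING:"
)


def split_record(line):
    """Return the list of fields in a verbdata line (hypothetical helper)."""
    return line.rstrip("\n").split(":")


fields = split_record(record)

# Index 2 is the value reported above as a likely misspelling.
suspect_value = fields[2]
print(fields[0], suspect_value, fields[3], fields[4])
```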
@funderburkjim
I would propose the following. You work with the current version of generatedforms and note down the errors. Once you have completed one round of comparison of pysan with SanskritVerb, I will regenerate. When I regenerate, or modify the SanskritVerb algorithm or database, other verbs may be affected. It is better to make all the corrections first and regenerate generatedforms.xml only after one round of corrections has been incorporated. So let us treat the current generatedforms.xml as version 1. Once all your corrections (based on the current version) are incorporated in SanskritVerb / pysan, I will generate version 2.
Then you rerun your comparison statistics and we start the next round. Take the 'Apa' issue, for example: right now only a rare form in SanskritVerb has this appendage. Once I change the algorithm, there will be many more new 'Apa'-ending forms like 'kaTApayati'. Whether this would be useful or counterproductive, only time will tell. For now, I prefer to keep generatedforms fixed and make changes only in $verbdata. Once both of us are satisfied that the first round of corrections is complete, I will rerun and generate version 2.
saBAja rather than samAja
Done.
And one piece of good news: with PHP7 in an Ubuntu environment, each verb takes roughly 1 second (all 10 tenses / moods). So the time needed to generate generatedforms.xml has come down from 1 day to 1 hour. It is now much more amenable to changes in $verbdata.
@drdhaval2785 From your comments above, I understand that you are reluctant to keep the generated forms always in sync with verbdata.
However, I think it would simplify the comparison process for these to be in sync.
One suggestion would be for me to do interim regenerations of forms (when you've made changes to verbdata). I could do this on a local branch, which would have no impact on the GitHub SanskritVerb.
I looked in the 'scripts' folder of SanskritVerb, but was not sure how to recreate generatedforms.xml.
So the process would be:
I generated new forms and will upload them tomorrow. As I noted, generation used to take a long time; now it is quite quick. So I will keep it in sync, let's say weekly.
How do you generate new forms? Inquiring minds want to know :)
sh wrongformfinder.sh
This generates two files generatedforms.xml and suspectforms.txt
The suspectforms.txt file is the file where fishy forms are stored.
There is also a script, sh verblistredo.sh, which regenerates all the verb lists based on changes in $verbdata in the verbdata.php file.
https://github.com/funderburkjim/elispsanskrit/blob/master/pysanskritv1/roots/sanverb_conjtab_cp.txt lists a total of 74 entries in $verbdata in excess of generatedforms, and 29 cases which appear only in $verbdata. See this issue.
@funderburkjim The majority of these were deleted manually from wrongformfinder.sh because of some generation issues.
A detailed analysis is needed.
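One possible starting point for that analysis is a plain set comparison of the verb keys found in each source. The sketch below is not the script used in either repository. `compare_keys` is a hypothetical helper, the sample keys are placeholders, and it assumes each verb can be reduced to a comparable key (here a string of the form verb:gana:number).

```python
# Sketch only: report which verb keys appear in one source but not the other.
# Assumption: both sources have already been reduced to comparable keys.
def compare_keys(verbdata_keys, generated_keys):
    """Return keys unique to each source and keys shared by both (hypothetical helper)."""
    verbdata_set = set(verbdata_keys)
    generated_set = set(generated_keys)
    return {
        "only_in_verbdata": sorted(verbdata_set - generated_set),
        "only_in_generated": sorted(generated_set - verbdata_set),
        "in_both": sorted(verbdata_set & generated_set),
    }


# Placeholder keys for illustration; these are not real comparison results.
verbdata_sample = ["verbA:01:0001", "verbB:10:0002", "verbC:04:0003"]
generated_sample = ["verbA:01:0001", "verbD:06:0004"]

result = compare_keys(verbdata_sample, generated_sample)
for label, keys in result.items():
    print(label, len(keys), keys)
```

Listing the keys on each side, rather than only counting them, should make it easier to tell entries removed by hand from entries that are genuinely out of sync.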