ambuda-org / vidyut

Infrastructure for Sanskrit software. For Python bindings, see `vidyut-py`.

Compare the dump data of vidyut with dump data of ashtadhyayi.com #72

Closed drdhaval2785 closed 7 months ago

drdhaval2785 commented 1 year ago

#70 and #71 led me to raise this query.

In the majority of derivations, vidyut is doing great. In a minority of cases like these, there may be wrong forms.

A systematic approach would be to compare the dumps from vidyut and ashtadhyayi.com and analyze the differences, which we can then iron out. There may have been an error in implementing some sUtra or in interpreting the grammar commentaries somewhere.

Kind attention @akprasad please.

akprasad commented 7 months ago

@drdhaval2785, +cc @neeleshb

Setup

I've written a simple analysis script, which you can find here.

Summary

Here's the full diff:

compare.log

In each mismatch, the first row contains forms in Vidyut that are missing from ashtadhyayi.com, and the second row contains the opposite.
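For readers who want to reproduce this kind of two-row report, here is a minimal sketch of the set comparison it implies. This is not the analysis script linked above, and the format of compare.log may differ; the function names, the entry key, and the placeholder forms are all hypothetical.

```python
def compare_forms(vidyut_forms, other_forms):
    """Return (only_in_vidyut, only_in_other) as sorted lists."""
    vidyut_set = set(vidyut_forms)
    other_set = set(other_forms)
    return sorted(vidyut_set - other_set), sorted(other_set - vidyut_set)


def format_mismatch(key, vidyut_forms, other_forms):
    """Format one mismatch as a header plus two rows, or None if the sets agree.

    Row 1: forms in Vidyut that are missing from the other source.
    Row 2: forms in the other source that are missing from Vidyut.
    """
    only_vidyut, only_other = compare_forms(vidyut_forms, other_forms)
    if not only_vidyut and not only_other:
        return None
    return "\n".join([
        key,
        "  " + ", ".join(only_vidyut),
        "  " + ", ".join(only_other),
    ])


if __name__ == "__main__":
    # Placeholder strings stand in for real verb forms; the key is hypothetical.
    report = format_mismatch(
        "dhatu-1/lakara-1",
        ["form_a", "form_b"],
        ["form_b", "form_c"],
    )
    print(report)
```

Comparing sets rather than ordered lists means that optional forms listed in a different order do not show up as mismatches.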

Specific differences

I'm still looking through the results, but here are some patterns that I believe are in Vidyut's favor:

Vidyut doesn't have any systemic errors that I can see, though I'm sure many of the differences are minor errors that need fixing, such as खवान vs. खौनीहि. I am also limited in my own grammatical knowledge, so I cannot fully assess what I'm seeing without more research.

I'll keep this issue open to track the differences. @drdhaval2785 and @neeleshb, please look through compare.log at your leisure and help me understand how some of these differences should be resolved.

akprasad commented 7 months ago

Also +cc @vipranarayan14

vipranarayan14 commented 7 months ago

In my experience (from manually checking and working with both), Vidyut's data is more accurate than ashtadhyayi.com's. But Vidyut is also missing a few alternate/special forms and has a few incorrect forms. The most accurate and complete verb form data I have come across is that of SanskritAbhyas.in. I think it would also be useful to get their data and compare against it as well. The main challenge would be matching the dhatus.

Anyway, I will go through the log (slowly) and report if anything needs to be corrected in Vidyut.

Thank you @akprasad for ccing me.

akprasad commented 7 months ago

After syncing with @neeleshb over email, I've learned that the forms on ashtadhyayi.com come from several sources, including SanskritVerb. Rather than compare against ashtadhyayi.com directly, I think it will be more productive to continue comparing against forms from the Siddhanta Kaumudi and other traditional literature.