vvasuki closed this issue 5 years ago.
Sequence of sUtra application
This will have to wait for some time. So far we have not extensively analyzed the intermediate sUtra applications; we have tested only the final outcome. It is certainly possible to dump the data now, but it may be erroneous. I will wait a month or so for my partner Shivakumari to check individual forms step by step. Once we are satisfied that the rules are applied (or not applied) correctly, the data will be made available. Right now I automatically save a copy of the HTML generated on the web page to a local file on disk. That is more readable for a human; for machines, a database could be created.
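For the machine-readable side, here is a minimal sketch of such a database. It assumes SQLite (via Python's standard sqlite3 module) and a hypothetical one-table schema: form, dhAtu, lakAra, and the sUtra numbers as a space-separated string. The sample row is an abridged, illustrative sUtra sequence for भवति, not output from the project.

```python
import sqlite3

# Hypothetical schema: one row per generated form, with its sUtra
# sequence stored as a space-separated string of sUtra numbers.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE forms (
    form   TEXT NOT NULL,
    dhatu  TEXT NOT NULL,
    lakara TEXT NOT NULL,
    sutras TEXT)""")
# Index on the final form, since lookups go from a form to its derivation.
conn.execute("CREATE INDEX idx_form ON forms(form)")
conn.executemany("INSERT INTO forms VALUES (?, ?, ?, ?)", [
    ("भवति", "भू", "लट्", "3.2.123 3.1.68 7.3.84 6.1.78"),  # abridged, illustrative
])

row = conn.execute(
    "SELECT dhatu, lakara, sutras FROM forms WHERE form = ?", ("भवति",)
).fetchone()
print(row)
```

A real schema would probably keep the steps in a separate table (one row per step). The single-string column just keeps the sketch short.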
It would be great if the output were in devanAgarI.
XML parsing with Devanagari tag names is an issue: lxml and similar libraries don't parse Devanagari tags properly.
So for now I have settled on a CSV file in the following format:
verbform,verb,lakAra,suffix,verbnumber
e.g. अंसयति,अंस,लट्,तिप्,10.0460
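A file in this format can be read with Python's standard csv module. This is a minimal sketch. It assumes the file has no header row and is UTF-8 encoded, neither of which is stated above, and read_verb_forms is a hypothetical helper name.

```python
import csv
import io

# Column order as described above; the file is assumed to have no header row.
FIELDS = ["verbform", "verb", "lakAra", "suffix", "verbnumber"]

def read_verb_forms(text):
    """Parse headerless CSV text into a list of dicts keyed by FIELDS.

    verbnumber is kept as a string: parsing "10.0460" as a float
    would drop the trailing zero."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS)
    return list(reader)

# The sample row quoted above.
rows = read_verb_forms("अंसयति,अंस,लट्,तिप्,10.0460\n")
print(rows[0]["verbform"], rows[0]["lakAra"], rows[0]["verbnumber"])
```

For the downloaded file, open it with open(path, encoding="utf-8", newline="") and pass the file object to csv.DictReader in place of the StringIO wrapper.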
Download file (1.6 MB)
http://www.sanskritworld.in/public/sanskrittool/SanskritVerb/generatedforms/verbformsdeva.tar.gz
@vvasuki This is nearly complete. Please find attached a JSON file. Such a file can be created for all verbs and all lakAras.
I am currently planning to create a Firefox / Chrome plugin with the help of @mbykov, who maintains the morpheus plugin. He wanted a JSON file of the step-by-step generation, and I have provided one. Once this plugin is functional, one can click on any form and get its step-by-step derivation.
That is splendid, @drdhaval2785! I understand it will be slightly more work, but for it to be truly useful to end users of offline dictionary programs (on phones and such), it would have to be in some simple format like csv or babylon. The first column (key) would be the final form, and the second column would be the derivation sequence, preferably with the text of each sUtra and not just its number. It would be super if it were in devanAgarI too.
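Here is a sketch of how one such two-column entry could be rendered, with each sUtra's text looked up by its number. SUTRA_TEXT and to_dict_line are hypothetical names. The four sUtras are an abridged, illustrative subset of those that apply when deriving भवति, not the project's actual output. The tab and arrow separators are also an assumption, not a Babylon-specific format.

```python
# Hypothetical sUtra-number -> text lookup, a tiny stand-in for a full sUtrapATha table.
SUTRA_TEXT = {
    "3.2.123": "वर्तमाने लट्",
    "3.1.68": "कर्तरि शप्",
    "7.3.84": "सार्वधातुकार्धधातुकयोः",
    "6.1.78": "एचोऽयवायावः",
}

def to_dict_line(form, sutra_numbers, sutra_text):
    """Render one dictionary entry as 'form<TAB>derivation'.

    Each step is shown as 'number text'. A number missing from the
    lookup table is kept bare rather than dropped."""
    steps = []
    for num in sutra_numbers:
        text = sutra_text.get(num)
        steps.append(f"{num} {text}" if text else num)
    return form + "\t" + " → ".join(steps)

line = to_dict_line("भवति", ["3.2.123", "3.1.68", "7.3.84", "6.1.78"], SUTRA_TEXT)
print(line)
```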
Regarding trialjson.txt.
This file would be more human-readable if you generated it with a PRETTY_PRINT option. Both Python and PHP offer such an option when serializing a data structure into a JSON string.
The resulting pretty-printed file is just as easy for a program to load as one generated without pretty-printing, and the programmatic 'reader' does not need to be told whether the file was pretty-printed or not.
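For reference, in Python the option is the indent argument of json.dumps. In PHP, json_encode takes the JSON_PRETTY_PRINT flag, plus JSON_UNESCAPED_UNICODE to keep non-ASCII characters unescaped. This is a minimal sketch with a hypothetical record layout; the real trialjson.txt structure may differ. Setting ensure_ascii=False also keeps devanAgarI readable instead of escaping it as \uXXXX sequences.

```python
import json

# Hypothetical derivation record; the real trialjson.txt layout may differ.
record = {"form": "भवति", "dhatu": "भू", "lakara": "लट्",
          "steps": ["3.2.123", "3.1.68", "7.3.84", "6.1.78"]}

compact = json.dumps(record, ensure_ascii=False)            # one line
pretty = json.dumps(record, ensure_ascii=False, indent=2)   # one key per line

# Both strings parse back to the identical structure, so the reader
# does not need to know which one was written.
assert json.loads(compact) == json.loads(pretty) == record
print(pretty)
```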
There is good support for reading JSON-formatted data in virtually every programming language, including JavaScript, Python, PHP, Ruby, Java, etc. JSON is much more flexible than csv, since hierarchical data structures (arrays, objects) can be represented in JSON, whereas csv requires a 'flat' structure.
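To illustrate the flat-versus-hierarchical point: a nested record has to be spread over several CSV rows, with the form repeated and an explicit step index added. This is a minimal sketch with a hypothetical record and hypothetical column names.

```python
import csv
import io

# Hypothetical nested record: one form with an ordered list of steps.
record = {"form": "भवति", "steps": ["3.2.123", "3.1.68", "7.3.84", "6.1.78"]}

def flatten_to_csv(rec):
    """CSV cannot hold the nested 'steps' array directly, so emit one
    row per step, repeating the form and adding an explicit step index."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["form", "step_no", "sutra"])
    for i, sutra in enumerate(rec["steps"], start=1):
        writer.writerow([rec["form"], i, sutra])
    return buf.getvalue()

out = flatten_to_csv(record)
print(out)
```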
@drdhaval2785 Could you go ahead and generate derivation steps in JSON (which you and others find convenient) for all dhAtus and all lakAra-s? I may find time to convert them to a format suitable for offline dictionaries.
Also, the sUtra numbers refer exactly to the table in https://github.com/drdhaval2785/SanskritVerb/blob/master/Data/sUtrANi.xlsx, right? (There are slightly varying sUtrapATha-s, which is one reason I was nervous about interpreting your condensed output myself.)
Not much progress here in the last three or four months, @vvasuki. It turned out to need deeper R&D than I thought. It is in deep freeze as of now.
A comment to bring this up in the recent issues list: @drdhaval2785 says this is close at hand.
Responding to:
@vvasuki Is this what you were asking for in https://github.com/drdhaval2785/SanskritVerb/issues/1056? The csv file can be converted to any form you want.
@avinashvarna Your csv is useful; let's retain it. However, I was hoping for a tsv file with columns like the following (with empty strings for non-existent forms, and optional forms separated by something like commas):
धातुः लट्-प्र-१-परस्मै लट्-प्र-२ लट्-प्र-३ लट्-म-१ लट्-म-२ लट्-म-३ … लिट्-उ-२-आत्मने लिट्-उ-३-आत्मने
भू भवति भवतः भवन्ति …
If this is simple to produce with your setup, it would be very concise and simple to use.
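Here is a sketch of producing such a wide TSV from (dhAtu, column, form) triples. It leaves cells empty for non-existent forms and joins optional forms with commas, as requested. The input shape, to_wide_tsv, and the shortened column list are hypothetical. The real header would list every lakAra-puruSa-vacana-pada combination.

```python
# Hypothetical column labels following the lakAra-puruSa-vacana-pada pattern
# requested above; the full list would have one label per cell.
COLUMNS = ["लट्-प्र-१-परस्मै", "लट्-प्र-२-परस्मै", "लट्-प्र-३-परस्मै", "लट्-प्र-१-आत्मने"]

def to_wide_tsv(entries, columns):
    """entries: iterable of (dhatu, column_label, form) triples.

    Produces one row per dhAtu. Several forms for the same cell are treated
    as optional forms and joined with commas; cells with no form stay empty."""
    cells = {}    # (dhatu, label) -> list of forms, in input order
    dhatus = []   # dhAtus in first-seen order
    for dhatu, label, form in entries:
        if dhatu not in dhatus:
            dhatus.append(dhatu)
        cells.setdefault((dhatu, label), []).append(form)
    lines = ["\t".join(["धातुः"] + columns)]
    for dhatu in dhatus:
        row = [dhatu] + [",".join(cells.get((dhatu, label), [])) for label in columns]
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"

tsv = to_wide_tsv([
    ("भू", "लट्-प्र-१-परस्मै", "भवति"),
    ("भू", "लट्-प्र-२-परस्मै", "भवतः"),
    ("भू", "लट्-प्र-३-परस्मै", "भवन्ति"),
], COLUMNS)
print(tsv)
```

Here the लट्-प्र-१-आत्मने cell for भू comes out empty, since no form was supplied for it.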
Here you go.
Thanks! The same in Google Docs here. Why are so many forms missing, @drdhaval2785?
Ah, I suspect a bug in Avinash's software.
Sorry. There is a bug in generating the headers. I am fixing it.
Fixed now. It is available again right here.
I was thinking it would be nice if each entry were hyperlinked to its derivation, so that someone can click on a form and get the derivation. To that end, I've updated the UI to accept query parameters (a side benefit is that the UI link can be shared). E.g. https://avinashvarna.github.io/prakriya/?input=agacCham&input_trans=itrans&output_trans=devanagari (There is a minor bug in the link to the results section that I need to fix.)
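Since the UI now takes query parameters, per-form links can be generated programmatically. This is a minimal sketch using Python's urllib.parse.urlencode. The parameter names and values come from the example URL above; prakriya_link is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE = "https://avinashvarna.github.io/prakriya/"

def prakriya_link(form, input_trans="itrans", output_trans="devanagari"):
    """Build a shareable link to the derivation of `form`, using the
    query parameters shown in the example URL."""
    query = urlencode({"input": form,
                       "input_trans": input_trans,
                       "output_trans": output_trans})
    return f"{BASE}?{query}"

print(prakriya_link("agacCham"))
```

urlencode also percent-encodes non-ASCII input. Which input transliterations the UI accepts besides itrans is not stated here.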
I suppose a tsv could support hyperlinks, but it would look ugly. I could directly generate a spreadsheet instead. @vvasuki How strict are you about the tsv format?
@avinashvarna Splendid idea! Let's produce a tsv with hyperlinks (hyperlinking just the first form's derivation when there are multiple forms). Don't worry about tsv ugliness; a tsv with so many columns is already ugly. It suffices if the tsv:
Looks like it would require the addition of =HYPERLINK(,
Hmm, let's just generate both versions, with and without =HYPERLINK(,). That seems simplest from the consumer's viewpoint, without adding too much complexity from the producer's viewpoint.
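Here is a sketch of the two cell variants. One is plain text. The other is the same text wrapped in a spreadsheet HYPERLINK(url, label) formula that points at the first form's derivation only. make_cell and example_url are hypothetical, and the example.org URL is a placeholder. The comma between formula arguments assumes a spreadsheet locale that uses commas as the argument separator.

```python
from urllib.parse import quote

def make_cell(forms, url_for, with_link):
    """Render one spreadsheet cell from a list of optional forms.

    Plain version: forms joined with commas.
    Linked version: the same text wrapped in =HYPERLINK(url, label),
    with the link pointing at the derivation of the first form only."""
    label = ",".join(forms)
    if not forms or not with_link:
        return label

    def esc(s):
        # Inside spreadsheet string literals a double quote is escaped by doubling it.
        return s.replace('"', '""')

    return f'=HYPERLINK("{esc(url_for(forms[0]))}","{esc(label)}")'

# Placeholder URL scheme; substitute the real derivation link builder.
def example_url(form):
    return "https://example.org/derivation?input=" + quote(form)

print(make_cell(["भवति"], example_url, with_link=False))
print(make_cell(["भवति"], example_url, with_link=True))
```

Check whether your spreadsheet tool evaluates such formulas when importing a TSV. That behaviour is assumed here, not verified.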
Looks very good now. @drdhaval2785, since you are a member of the largest number of Sanskrit-phile groups, please share the spreadsheet.
The TSV has too many columns. May I suggest that we put the tense (lakAra) in the second column, e.g.
BU law .....
BU liw .....
That way we will have only 18 columns.
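The suggestion above puts the lakAra in the second column, so each dhAtu gets one row per lakAra. That can be sketched as a regrouping of the wide row. Labels follow the lakAra-puruSa-vacana-pada pattern used earlier in this thread, and SUB_COLUMNS and rows_by_lakara are hypothetical names. The result has the 18 form columns mentioned, plus the dhAtu and lakAra columns.

```python
PURUSHA = ["प्र", "म", "उ"]
VACANA = ["१", "२", "३"]
PADA = ["परस्मै", "आत्मने"]
# 2 padas x 3 puruSas x 3 vacanas = 18 form columns after dhAtu and lakAra.
SUB_COLUMNS = [f"{p}-{v}-{pada}" for pada in PADA for p in PURUSHA for v in VACANA]

def rows_by_lakara(dhatu, wide_row):
    """wide_row maps labels such as 'लट्-प्र-१-परस्मै' to cell text.

    Returns one TSV line per lakAra: dhAtu, lakAra, then the 18 cells
    in SUB_COLUMNS order (empty where no form exists)."""
    grouped = {}  # lakAra -> {sub-column label: cell}
    for label, cell in wide_row.items():
        lakara, rest = label.split("-", 1)  # the lakAra name is the first segment
        grouped.setdefault(lakara, {})[rest] = cell
    return ["\t".join([dhatu, lakara] + [cells.get(c, "") for c in SUB_COLUMNS])
            for lakara, cells in grouped.items()]

lines = rows_by_lakara("भू", {"लट्-प्र-१-परस्मै": "भवति",
                              "लट्-प्र-२-परस्मै": "भवतः",
                              "लिट्-प्र-१-परस्मै": "बभूव"})
for line in lines:
    print(line)
```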
A separate column for grouping by lakAra seems like a good idea as far as the TSV with hyperlinks (meant for humans to read) is concerned.
OK. How about now? Just 9 columns, which is very easy to read, and there is no need to scroll to see the आत्मनेपदी forms. The spreadsheet is the same as before: https://docs.google.com/spreadsheets/d/1rO05Aw5lDnkPaP877iGUlL6zIRPWYedetpxw3At87FA/edit?usp=sharing
LGTM (looks good to me), except that I'd prefer ॰ (the devanAgarI abbreviation sign) to ".".
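Here is a minimal sketch of that substitution. It assumes the periods in question follow abbreviated devanAgarI labels such as प्र. (a hypothetical example). A period is replaced with U+0970 DEVANAGARI ABBREVIATION SIGN only when it directly follows a Devanagari letter or sign, so numbers such as 10.0460 are left untouched.

```python
import re

# U+0970 DEVANAGARI ABBREVIATION SIGN.
ABBREV_SIGN = "\u0970"

# A period directly after a Devanagari letter or sign. Devanagari digits
# (U+0966-U+096F) are excluded from the class so decimals keep their period.
_ABBREV_DOT = re.compile(r"(?<=[\u0900-\u0965\u0970-\u097F])\.")

def use_abbreviation_sign(text):
    """Replace abbreviation periods after Devanagari text with ॰."""
    return _ABBREV_DOT.sub(ABBREV_SIGN, text)

print(use_abbreviation_sign("प्र.पु. 10.0460"))
```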
The system works through an API now. Further, there is another much-used interface for this purpose, http://ashtadhyayi.com/dhatu/, which is quite intuitively arranged. So this issue has lived its life and fulfilled its expected results.
www.sanskritworld.in/sanskrittool/SanskritVerb/generatedforms/verbforms.tar.gz is wonderful! It might be of great use to generate and dump the sequence of sUtra applications that result in each final form as well. Furthermore, it would be great if the output were in devanAgarI so that humans could read and search easily.