drdhaval2785 / SanskritVerb

Verb declension for Sanskrit

Publish generation steps along with generated forms #937

Closed: vvasuki closed this issue 5 years ago

vvasuki commented 8 years ago

www.sanskritworld.in/sanskrittool/SanskritVerb/generatedforms/verbforms.tar.gz is wonderful! It might be of great use to generate and dump the sequence of sUtra applications that result in each final form as well. Furthermore, it would be great if the output were in devanAgarI so that humans could read and search easily.

drdhaval2785 commented 8 years ago

Sequence of sUtra application

This will have to wait for some time. We have not yet analyzed the intermediate sUtra applications extensively; we have tested only the final outcome. It is entirely possible to dump the data right now, but it may be erroneous. I will wait a month or so for my partner Shivakumari to look at individual forms step by step. Once we are satisfied regarding the application or non-application of the rules, it will be made available. Right now I automatically save a copy of the HTML generated on the web page to a local file on disk; that would be more readable for a human. For machines, a database may be created.

it would be great if the output were in devanAgarI

XML parsing with Devanagari tags is an issue: lxml and similar libraries don't parse Devanagari tag names properly. So for now I have settled on a CSV file in the format verbform,verb,lakAra,suffix,verbnumber, e.g. अंसयति,अंस,लट्,तिप्,10.0460. Download file (1.6 MB): http://www.sanskritworld.in/public/sanskrittool/SanskritVerb/generatedforms/verbformsdeva.tar.gz
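
A minimal sketch of reading rows in this layout with Python's standard csv module. The field names come from the format described above and the sample string is the single example row quoted there; the file name in the trailing comment is a hypothetical stand-in for wherever the archive is unpacked.

```python
import csv
import io

# Column order as described in the comment above.
FIELDS = ["verbform", "verb", "lakAra", "suffix", "verbnumber"]

# The one example row quoted above, used here as stand-in data.
SAMPLE = "अंसयति,अंस,लट्,तिप्,10.0460\n"


def read_verbforms(stream):
    """Return a list with one dict per CSV row, keyed by FIELDS."""
    return list(csv.DictReader(stream, fieldnames=FIELDS))


rows = read_verbforms(io.StringIO(SAMPLE))
first = rows[0]
print(first["verbform"], first["verb"], first["lakAra"], first["suffix"], first["verbnumber"])

# For the real file (hypothetical name), open it as UTF-8:
# with open("verbformsdeva.csv", encoding="utf-8", newline="") as f:
#     rows = read_verbforms(f)
```

The verb number is deliberately kept as a string, so that a value such as 10.0460 does not lose its trailing zero by being parsed as a float.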

drdhaval2785 commented 8 years ago

@vvasuki This is near completion. Please find attached a JSON file. Such a file can be created for all verbs and all lakAras.

trialjson.txt

drdhaval2785 commented 8 years ago

I am currently planning to create a Firefox / Chrome plugin with the help of @mbykov, who maintains the morpheus plugin. He wanted a JSON file of the step-by-step generation, and I have provided it to him. Once this plugin is functional, one can click on any form and get its step-by-step derivation.

vvasuki commented 8 years ago

That is splendid @drdhaval2785 ! I understand it will be slightly more work, but, for it to be truly useful to end users of offline dictionary programs (on phones and such), it would have to be better formatted - in some simple format like csv or babylon with the first column (key) being the end form, and the second column being the derivation sequence (preferably not just the sUtra numbers, but their text as well); and it would be super if it were in devanAgarI.

funderburkjim commented 8 years ago

Regarding trialjson.txt.

This file would be more human-readable if you generated it with a PRETTY_PRINT option. There is such an option in both Python and PHP when serializing a data structure into a JSON string.

The resulting pretty-printed file will be just as easy for a program to load as a file generated without pretty-printing, and also the programmatic 'reader' does not have to be told whether the file was pretty-printed or not.
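
A short sketch of the point above using Python's json module. The record is hypothetical, since the actual structure of trialjson.txt is not shown in this thread; only the serialization options matter here.

```python
import json

# Hypothetical record: the actual structure of trialjson.txt is not shown in this thread.
record = {"form": "भवति", "verb": "भू", "steps": ["step 1", "step 2"]}

# Compact output: everything on one line.
compact = json.dumps(record, ensure_ascii=False)

# Pretty-printed output: indented, one item per line, Devanagari left unescaped.
pretty = json.dumps(record, ensure_ascii=False, indent=2)

print(pretty)

# Both strings load back to the same data structure.
assert json.loads(compact) == json.loads(pretty) == record
```

In PHP, the corresponding flag for json_encode is JSON_PRETTY_PRINT, and JSON_UNESCAPED_UNICODE keeps Devanagari text unescaped in the output.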

funderburkjim commented 8 years ago

There is good support for reading Json-formatted data in any programming language, including Javascript, Python, PHP, Ruby, Java, etc. Json is much more flexible than csv, since hierarchical data structures (arrays, objects) are representable in Json, but csv requires a 'flat' structure.
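
To illustrate the difference, here is a sketch in which a nested entry (hypothetical in shape) has to be flattened before it can be written as CSV. The verb and lakAra get repeated on every row because CSV has no way to nest them.

```python
import csv
import io

# Hypothetical nested entry: one verb, with forms grouped by lakAra.
entry = {"verb": "भू", "forms": {"लट्": ["भवति", "भवतः", "भवन्ति"]}}


def flatten(entry):
    """Yield one flat (verbform, verb, lakAra) tuple per form."""
    for lakara, forms in entry["forms"].items():
        for form in forms:
            yield (form, entry["verb"], lakara)


flat_rows = list(flatten(entry))

# Write the flat rows as CSV into an in-memory buffer.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(flat_rows)
print(buffer.getvalue(), end="")
```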

vvasuki commented 7 years ago

@drdhaval2785 Could you go ahead and generate derivation steps in json for all dhAtus and all lakAra-s (which you and others find convenient)? It's possible that I will find time to convert them to a format suitable for offline dicts.

Also, the sUtra numbers refer exactly to the table listed in https://github.com/drdhaval2785/SanskritVerb/blob/master/Data/sUtrANi.xlsx right? (There are slightly varying sUtrapATha-s which is one reason why I was nervous of interpreting your condensed output myself)

drdhaval2785 commented 7 years ago

Not much progress here in the last three or four months, @vvasuki. It turned out to need deeper R&D than I thought it would. It is in deep freeze as of now.

vvasuki commented 6 years ago

A comment to bring this up in the recent issues list: @drdhaval2785 says this is close at hand.

vvasuki commented 6 years ago

Responding to -

@vvasuki Is this what you were asking for in https://github.com/drdhaval2785/SanskritVerb/issues/1056? The csv file can be converted to any form you want.

@avinashvarna - Your csv is useful - let's retain it. However, I was hoping for a tsv file with columns like the following (with empty strings for non-existent forms, and optional forms separated by something like commas):

धातुः लट्-प्र-१-परस्मै लट्-प्र-२ लट्-प्र-३ लट्-म-१ लट्-म-२ लट्-म-३ … लिट्-उ-३-आत्मने लिट्-उ-३-आत्मने
भू भवति भवतः भवन्ति …

If this is simple to produce with your setup, it would be very concise and simple to use.
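
One way such a wide table could be produced is sketched below, assuming the input is already available as (dhatu, column label, form) records. The function name, the record shape, and the shortened column labels are illustrative assumptions, not the actual setup used in this thread.

```python
import csv
import io
from collections import defaultdict


def to_wide_tsv(records, columns):
    """Pivot (dhatu, column_label, form) records into one TSV row per dhatu."""
    table = defaultdict(lambda: defaultdict(list))
    for dhatu, label, form in records:
        table[dhatu][label].append(form)

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["धातुः", *columns])
    for dhatu, cells in table.items():
        # Missing forms become empty strings; optional forms are comma-joined.
        writer.writerow([dhatu] + [",".join(cells.get(c, [])) for c in columns])
    return out.getvalue()


# Illustrative column labels and records (forms taken from the example above).
columns = ["लट्-प्र-१", "लट्-प्र-२", "लट्-प्र-३", "लट्-म-१"]
records = [
    ("भू", "लट्-प्र-१", "भवति"),
    ("भू", "लट्-प्र-२", "भवतः"),
    ("भू", "लट्-प्र-३", "भवन्ति"),
]
wide = to_wide_tsv(records, columns)
print(wide, end="")
```

The last column is left empty for भू here only because the sample records do not include that form; it demonstrates the empty-string convention, not a claim about the verb.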

avinashvarna commented 6 years ago

Here you go.

vvasuki commented 6 years ago

Thanks! The same in Google docs here. Why are so many forms missing, @drdhaval2785 ?

vvasuki commented 6 years ago

Ah - I suspect a bug in Avinash's software.

avinashvarna commented 6 years ago

Apologies. There is a bug in the generation of the headers. I will fix it.

avinashvarna commented 6 years ago

Fixed now. It is again available at the same place.

avinashvarna commented 6 years ago

I was thinking it would be nice if each entry was hyperlinked to the derivation, so that someone can click on a form and get the derivation. Towards that end, I've updated the UI to allow query parameters (a side benefit is that the UI link can be shared). E.g. https://avinashvarna.github.io/prakriya/?input=agacCham&input_trans=itrans&output_trans=devanagari (There is a minor bug in the link to results section that I need to fix).

I suppose tsv could support hyperlink, but it would look ugly. Could directly generate a spreadsheet instead. @vvasuki How strict are you on tsv format?
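
A small sketch of building such a shareable link programmatically. The base URL and the three parameter names are taken from the example link above; the function name and the default values are assumptions, and which other parameter values the UI accepts is not verified here.

```python
from urllib.parse import urlencode

# Base URL and parameter names as they appear in the example link above.
BASE_URL = "https://avinashvarna.github.io/prakriya/"


def derivation_link(form, input_trans="itrans", output_trans="devanagari"):
    """Return a link to the derivation page for one verb form."""
    query = urlencode(
        {"input": form, "input_trans": input_trans, "output_trans": output_trans}
    )
    return f"{BASE_URL}?{query}"


link = derivation_link("agacCham")
print(link)
```

urlencode also percent-encodes non-ASCII input, so a form passed in Devanagari would still yield a valid URL.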

vvasuki commented 6 years ago

@avinashvarna splendid idea! Let's produce a tsv with hyperlinks (hyperlinking just the first form's derivation in case of multiple forms). Don't worry about tsv ugliness - a tsv with so many columns is already ugly. It suffices if the tsv:

avinashvarna commented 6 years ago

Looks like it would require the addition of =HYPERLINK(,). Will this make the tsv less useful for consumption by other programs?

vvasuki commented 6 years ago

Hmm - Let's just generate both with and without =HYPERLINK(,). That seems simplest (from consumer viewpoint) without adding too much complexity (from producer viewpoint)
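
A sketch of generating both variants from the same row, linking only the first form when a cell holds several optional forms, as suggested earlier in the thread. The helper names and the example.org URL scheme are hypothetical, and how a given spreadsheet program treats formula text on TSV import is not verified here.

```python
def hyperlink_cell(url, label):
    """Build a spreadsheet =HYPERLINK(url, label) formula as a string."""
    def esc(text):
        # Double any embedded double quotes, as spreadsheet string literals require.
        return text.replace('"', '""')
    return f'=HYPERLINK("{esc(url)}","{esc(label)}")'


def both_variants(row, link_for):
    """Return (plain_row, linked_row); row[0] is the dhatu, the rest are form cells."""
    linked = [row[0]]
    for cell in row[1:]:
        if not cell:
            linked.append("")  # a non-existent form stays empty
        else:
            first_form = cell.split(",")[0]  # link only the first optional form
            linked.append(hyperlink_cell(link_for(first_form), cell))
    return list(row), linked


def link_for(form):
    # Hypothetical URL scheme, for illustration only.
    return f"https://example.org/derive?form={form}"


plain, linked = both_variants(["भू", "भवति", ""], link_for)
print("\t".join(plain))
print("\t".join(linked))
```

Keeping the plain row untouched means programs that consume the TSV never have to strip formulas, while the linked row is meant only for spreadsheet import.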

vvasuki commented 6 years ago

Looks very good now. @drdhaval2785 - since you are a member of the largest number of Sanskrit-phile groups, please share the spreadsheet.

vvasuki commented 6 years ago

Saving the links:

https://docs.google.com/spreadsheets/d/1znouGQRk08_JwbuvoMVmuEIda6BOaFJD5wCPJRYYIr4/edit#gid=0

https://docs.google.com/spreadsheets/d/1rO05Aw5lDnkPaP877iGUlL6zIRPWYedetpxw3At87FA/edit?ts=5a581971#gid=1664632017

drdhaval2785 commented 6 years ago

The TSV has too many columns. May I suggest that we keep the tense in the second column, e.g.
BU law .....
BU liw .....

That way we will have only 18 columns.

vvasuki commented 6 years ago

A separate column for grouping by lakAra seems like a good idea as far as the TSV with hyperlink (meant for humans to read) is concerned.

avinashvarna commented 6 years ago

Ok. How about now? Just 9 columns, which is very easy to read and no need to scroll to see आत्मनेपदी forms. Spreadsheet is the same as before: https://docs.google.com/spreadsheets/d/1rO05Aw5lDnkPaP877iGUlL6zIRPWYedetpxw3At87FA/edit?usp=sharing

vvasuki commented 6 years ago

LGTM (= looks good to me) - except I'd prefer ॰ (devanAgarI abbreviation sign) to .

drdhaval2785 commented 5 years ago

The system now works on the basis of an API. In addition, there is another widely used interface for this purpose, http://ashtadhyayi.com/dhatu/, which is quite intuitively arranged. So this issue has run its course and delivered the expected results.