funderburkjim / MWderivations

Derivations of headwords in the Monier-Williams (1899) dictionary

12 subtypes of `:su:` #13

Open gasyoun opened 3 years ago

gasyoun commented 3 years ago

If I search for :ati: there is only one instance. Suppose I want to analyse atikopasamanvita.

0284:ati:+kaṭhora +katha +karṣaṇa +kalyam +kānta +kāya +kiriṭa +kirīṭa +kutsita +kulva +kṛcchra +kṛta +kṛśa +kṛṣṇa +kruddha +krudh +kruṣṭa +khara +gaṇḍa +gandha +gandhālu +garīyas +garvita +gahana +gāḍha +gārgya +guṇa +gupta +guru 

sucakra

If I search for :su: there are 12 different groups, with 1723 su- subentries in total. Why are they split, @funderburkjim?

0120:su:+ūti
0082:su:+gaṇ
0073:su:+cakṣas
0239:su:+tanu
0088:su:+nat
0352:su:+pakṣa
0105:su:+makha
0320:su:+yajus

Because of anusvāra? Even if they were split originally, does it make sense to keep them apart for the purpose of analysis?

funderburkjim commented 3 years ago

Probably what I did was to go sequentially through the dictionary, and collect all the headwords with samAsa children.
Look at the 'su' group with 'su-gaR'. Compare to page 1222.

Then see the 'su-cakzas' group on p. 1223.

The compounds are 'children' of different entries for 'su'.

Probably the other 'su-' groups are similarly explained.

For some purposes it would make sense to aggregate these 'su-' compounds.

By the way, I like the display above -- Is that one you developed?

gasyoun commented 3 years ago

was to go sequentially through the dictionary, and collect all the headwords with samAsa children.

Right, that's what it seems. But it seems that these subgroups appear only with upasargas.

What would be required to have a united version of them?

By the way, I like the display above -- Is that one you developed?

No, it's your file. https://github.com/funderburkjim/MWderivations/blob/master/compounds/compounds.html

funderburkjim commented 3 years ago

What would be required to have a united version ?

From compounds.txt, a program could create compounds-united.txt.

This would replace all the ':su:' lines with just one ':su:' line

And similarly for all the other 'prefixes'.
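A minimal sketch of what such a merge might look like, not the actual program (which is not written yet). It assumes each line of compounds.txt has the form `count:parent:children`, with children separated by spaces, and that the first field is the number of children (this matches the ghrāṇa line quoted later in this thread, where 0010 goes with ten children). The function name `unite_compounds` is hypothetical.

```python
def unite_compounds(lines):
    """Merge all lines that share a parent into a single line per parent.

    Assumed line format: 'NNNN:parent:child child ...', where NNNN is the
    zero-padded number of children. Parents keep their first-seen order.
    """
    groups = {}  # parent -> list of children (dicts preserve insertion order)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split into at most 3 fields; the old count is recomputed below.
        _count, parent, children = line.split(":", 2)
        groups.setdefault(parent, []).extend(children.split())
    return [
        f"{len(kids):04d}:{parent}:{' '.join(kids)}"
        for parent, kids in groups.items()
    ]
```

For example, the two lines `0001:su:+ūti` and `0001:su:+gaṇ` would become the single line `0002:su:+ūti +gaṇ`. Note that splitting children on spaces would go wrong for any entry that itself contains a space, so that case would need separate handling.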

gasyoun commented 3 years ago

From compounds.txt, a program could create compounds-united.txt.

Is there any idea how to automate it? Because if you update the source (and after the AB changes are implemented), my non-smart gluing will unglue again.

The parent may be marked as a VERB -- 1247 of these. Clearly the children in such cases are not compounds, but the current (H3) markup of the children is the same as for samasas.

They are not marked in any way now? Should I remake the formatting of dhātu entries, so they do not give false positives here?

H3 headwords can also have H4 children, but this table ignores these

Why? Because they are not encoded in an easy-to-grasp manner?

No sandhi, easy: akṣara+kara = akṣarakara

Sandhi involved: akṣarā@kṣara = akṣarākṣara

But what about the entry: akṣāra +lavaṇa +lavaṇā@śin

+lavaṇā@śin = akṣāralavaṇāśin
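The three cases above can be sketched in code. This is only an illustration of the notation as read from these examples, assuming that a leading `+` means "append to the parent headword" and that `@` marks a sandhi junction that is simply dropped in the joined form. The function name `expand_child` is hypothetical, and other markup (such as the hyphen in duḥkha-dā) is left untouched.

```python
def expand_child(parent, child):
    """Return the full compound for one child token of a parent headword.

    Assumptions (from the examples in this thread only):
      '+x'   -> parent followed by x
      'a@b'  -> a and b already joined by sandhi; '@' only marks the seam
    """
    if child.startswith("+"):
        child = parent + child[1:]
    return child.replace("@", "")


print(expand_child("akṣara", "+kara"))         # akṣarakara
print(expand_child("akṣara", "akṣarā@kṣara"))  # akṣarākṣara
print(expand_child("akṣāra", "+lavaṇā@śin"))   # akṣāralavaṇāśin
```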


funderburkjim commented 3 years ago

akzAralavaRa

funderburkjim commented 3 years ago

How to automate ?

A program is needed to create compounds_united.txt from compounds.txt; let's call that program 'compounds_united.py' (not yet written).

Then, if compounds.txt is revised, we can run the program to update compounds_united.txt.

This would be part of a larger redo script. It would come after the steps to update compounds.txt, as described in https://github.com/funderburkjim/MWderivations/blob/master/compounds/readme.txt

compounds.txt depends on step4/all.txt. And there is a redo script for step4.

etc. etc. That's the way things can be automated. Based on a recent look at MWderivations, it looks fairly straightforward to write a 'master redo script' that would update everything that needs to be updated in MWderivations. Looks like MWderivations is fairly well organized to be updateable.
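As a rough illustration of the ordering such a master redo script would have to respect, here is a small sketch using only the dependencies mentioned in this comment. The dependency table and the function name `rebuild_order` are assumptions for illustration; the real redo steps are the ones described in the readme files. It needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# target -> set of files it is built from (only what is stated in this thread;
# compounds_united.txt is the not-yet-existing output discussed above)
DEPENDS_ON = {
    "compounds.txt": {"step4/all.txt"},
    "compounds_united.txt": {"compounds.txt"},
}


def rebuild_order(depends_on):
    """List the targets in an order where each comes after its inputs."""
    ordered = TopologicalSorter(depends_on).static_order()
    # Keep only files that have a build rule; pure inputs are not rebuilt here.
    return [name for name in ordered if name in depends_on]


print(rebuild_order(DEPENDS_ON))
# ['compounds.txt', 'compounds_united.txt']
```

A real script would then run the redo step for each target in that order.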

gasyoun commented 3 years ago

let's call that program 'compounds_united.py' (not yet written).

Let's bring it to life. After the magic I've seen, it should not be rocket science, thanks.

This would be part of a larger redo script.

That is why I am asking you and not just joining them on my end.

Looks like MWderivations is fairly well organized to be updateable.

Please do. In that case it could be reused in the Reverse Sanskrit Dictionary, as it would still be alive, especially after compounds_united.py.

gasyoun commented 2 years ago

Looks like MWderivations is fairly well organized to be updateable.

Please give compounds another chance @funderburkjim ))

gasyoun commented 2 years ago


MW has ghrāṇa—cakṣus

But your file, Jim, has 0010:ghrāṇa:+cakṣuś +ja +tarpaṇa +duḥkha-dā +pāka +puṭaka +bila +śravas +skanda ghrāṇe@ndriya

gasyoun commented 2 years ago

@funderburkjim 1) Can the .txt use TAB instead of SPACE as a delimiter? Some entries contain a space, which gets me into trouble. 2) We have a normalised list of headwords. Can we have a normalised list of samasa elements and samasa output as well? What would be the way to solve it? 3) The list contains around 111k samasas, and some, like a-prameya, are missing from the list. Any reason why?

gasyoun commented 1 year ago

@funderburkjim I would love to continue the research, but it is impossible without your help.