funderburkjim / kosha-dev

Develop xml and html for anekArthaka and samAnArthaka Sanskrit dictionaries

Gender counts (v5) #21

Closed funderburkjim closed 8 months ago

funderburkjim commented 9 months ago

Preliminary analysis of the 'gender' information in abch.

This is prompted by this comment.

gender_list.txt provides

@drdhaval2785 please provide corrections/improvements to the expansions. Expansions are needed for the 4 ? instances.
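A minimal sketch of how such a tally could be produced, assuming each item is written as headword-gender (the notation used later in this thread, e.g. dArAH-strIba). The function name and the sample list are illustrative only; the actual layout of abch1.txt may differ.

```python
from collections import Counter

def count_genders(items):
    """Tally the gender tag of each 'headword-gender' item (assumed notation)."""
    counts = Counter()
    for item in items:
        # Split on the LAST hyphen: headword before it, gender tag after it.
        headword, sep, gender = item.rpartition("-")
        if sep and headword and gender:
            counts[gender] += 1
    return counts

# Sample items in the assumed notation; these are not read from abch1.txt.
sample = ["dArAH-strIba", "SivamAtaraH-strIba", "daSAH-puMstrIba"]
for gender, n in sorted(count_genders(sample).items()):
    print(gender, n)
```

Items without a hyphen are skipped rather than counted, so malformed lines do not inflate any tag.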

funderburkjim commented 9 months ago

The case of दाराः-स्त्रीब

The 'gender' field for dArAH in abch is 'strIba'

As we know, the grammatical gender of dArAH is masculine, and in the plural the 'meaning' is 'wife', which is semantically feminine.
I presume that the locative plural is दारेषु (masculine declension for nouns ending in 'a'), rather than दारासु (fem. declension for nouns ending in 'A').

If this presumption holds, then the meaning of 'gender' in abch differs from the 'grammatical gender' (masculine plural).
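The two candidate forms can be generated mechanically. This is only a sketch in SLP1 transliteration covering the two regular patterns mentioned above (masculine stems in 'a', feminine stems in 'A'); the function name is hypothetical and it does not decide which stem dArAH actually follows.

```python
def locative_plural(stem: str) -> str:
    """Regular locative plural for SLP1 stems ending in 'a' or 'A' only."""
    if stem.endswith("A"):
        # Feminine A-stem: add -su (ramA -> ramAsu).
        return stem + "su"
    if stem.endswith("a"):
        # Masculine a-stem: final a becomes e, then -zu (rAma -> rAmezu).
        return stem[:-1] + "ezu"
    raise ValueError("only stems ending in 'a' or 'A' are handled")

print(locative_plural("dAra"))  # masculine pattern: dArezu (दारेषु)
print(locative_plural("dArA"))  # feminine pattern: dArAsu (दारासु)
```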

There are 21 entries marked in abch as strIba.
For SivamAtaraH-strIba, the 'grammatical gender' agrees with the 'abch' gender (both feminine).

So maybe dArAH-strIba should be marked as dArAH-puMba (i.e. abch gender should = grammatical gender)?

drdhaval2785 commented 9 months ago

dArAH-strIba is now corrected to dArAH-puMba

drdhaval2785 commented 9 months ago

The remaining 21 were examined and found to be in order. There is also a case of daSAH-puMstrIba. It is also in order.

drdhaval2785 commented 9 months ago

Expansions provided for missing 3 ?s and removed one superfluous ? mark.

funderburkjim commented 8 months ago

klIba

The gender abbreviation क्ली stands for क्लीब and क्लीब is also used as an abbreviation for क्लीब-बहुवचन. Maybe use क्लीबहु as abbreviation for क्लीब-बहुवचन ?

drdhaval2785 commented 8 months ago

I don't mind this change. The earlier choice was for faster typing. Now it does not matter. It can be kept as klIbahu.

funderburkjim commented 8 months ago

Will you change the abch1.txt file (in v5.1)?

drdhaval2785 commented 8 months ago

I thought it was about display. If you want to change it at data level, I will do it.

funderburkjim commented 8 months ago

I vote for data level change.

The reasoning behind a choice of where to make this change may be subtle. To any downstream user (such as make_xml.py or basicadjust.php), the gender is just text that appears in a particular spot within the input. It seems odd that such a user should change the spelling of that text from klIba to klIbahu.
However, this dividing line is murky. Think of all the changes make_xml introduces, such as the change from {#X#} to <s>X</s> for Devanagari text. Why does this change first appear in xxx.xml, rather than being done directly in xxx.txt?
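For concreteness, the markup change mentioned above could look roughly like the following. This is a sketch, not the code in make_xml.py; the function name and the regular expression are assumptions based only on the {#X#} and <s> notation in this comment.

```python
import re

# Assumed input markup: Sanskrit text is wrapped as {#...#}.
SANSKRIT_SPAN = re.compile(r"\{#(.*?)#\}")

def mark_sanskrit(line: str) -> str:
    """Rewrite every {#X#} span in a line as <s>X</s>."""
    # Non-greedy match so that two spans on one line stay separate.
    return SANSKRIT_SPAN.sub(r"<s>\1</s>", line)

print(mark_sanskrit("{#dArAH#} and {#daSAH#}"))
```

A step like this is purely presentational, which is one reason it sits comfortably in the xml stage rather than in xxx.txt.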

Over time, what criterion should be used to determine where a change is to be made (such as xxx.txt or xxx.xml or xxx.html)?

Sounds like there might be some interesting theoretical basis to which the chosen criterion should conform.

But in the absence of such theory, I just go by intuition, which in this case means changing abch1.txt.
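A data-level change of this kind could be sketched as below. It assumes the gender tag follows a hyphen after the headword (as in dArAH-strIba); the function name is hypothetical, the headword X is a placeholder, and the real layout of abch1.txt may need a different pattern.

```python
import re

# Match 'klIba' only as a complete tag after a hyphen. The trailing \b keeps
# an already-renamed 'klIbahu' from matching, so the rewrite can be re-run safely.
OLD_TAG = re.compile(r"-klIba\b")

def rename_neuter_plural(line: str) -> str:
    """Rename the neuter-plural tag klIba to klIbahu in one line of text."""
    return OLD_TAG.sub("-klIbahu", line)

for line in ["X-klIba", "X-klIbahu", "X-klI"]:
    print(line, "->", rename_neuter_plural(line))
```

Only the first sample line changes; the tag klI (neuter) and an existing klIbahu are left alone.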

funderburkjim commented 8 months ago

In our particular case, there is already additional pre-processing (the prep directory), and this preprocessor could appropriately make the change. We could reasonably believe this to be part of a data-level change.

So, either

drdhaval2785 commented 8 months ago

Changed klIba -> klIbahu at the three relevant places in abch1.txt, in both v5 and v5.1, in the above commit. If found suitable, kindly close the issue.

funderburkjim commented 8 months ago

Changes seem fine. Closing issue.