funderburkjim / MWderivations

Derivations of headwords in the Monier-Williams (1899) dictionary

Refine derivations from suffix 'Iya' #10

Open funderburkjim opened 8 years ago

funderburkjim commented 8 years ago

Currently, the set of Whitney suffixes used for derivations has only one example where the suffix spelling starts with a vowel, namely Iya (Whitney section 1215).

When applying this suffix to a nominal stem ending in a, that ending a is dropped. For example, applying the Iya suffix to deva would result in devIya. I infer this principle from examples in Whitney.
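
The rule above can be sketched in a few lines of Python. This is only an illustration, not the repository's wsfx1 code: the name `apply_suffix` and the `VOWELS` set are hypothetical, and the vowel letters assume an SLP1-style transliteration like the one used in the examples here.

```python
# Hypothetical helper for illustration; not taken from the MWderivations code.
# Assumption: SLP1-style transliteration, so these letters are the vowels.
VOWELS = set("aAiIuUfFxXeEoO")


def apply_suffix(stem: str, suffix: str) -> str:
    """Join stem and suffix, dropping a stem-final 'a' before a vowel-initial suffix."""
    if stem.endswith("a") and suffix[:1] in VOWELS:
        stem = stem[:-1]  # the ending 'a' is dropped, as in deva + Iya
    return stem + suffix


print(apply_suffix("deva", "Iya"))  # devIya
```

A consonant-initial suffix leaves the stem untouched in this sketch, so only suffixes such as Iya trigger the drop.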

Previously, the wsfx1 method did not take this into account.

Making this improvement led to an additional 334 derivations.

22100  NTD init
   844 DONE +wsfx1
 86526 DONE cpd1
   925 DONE cpd1a
  4426 DONE cpd3
  1131 DONE cpd4
  1943 DONE cpd5
  3432 DONE cpd_nan
 12548 DONE gender
  1188 DONE inflected
 42771 DONE noparts
  9077 DONE pfx1
  2716 DONE pfx2
  1412 DONE pfxderiv
 15374 DONE srs2
  1029 DONE srs3
  6498 DONE wsfx
  6316 TODO init

funderburkjim commented 8 years ago

I realized that the secondary suffix 'in' had been omitted from the derivations.

It is handled similarly to Iya as discussed above.

An additional 296 derivations were added.

22100  NTD init
  1055 DONE +wsfx1
 86565 DONE cpd1
   928 DONE cpd1a
  4426 DONE cpd3
  1133 DONE cpd4
  1943 DONE cpd5
  3438 DONE cpd_nan
 12548 DONE gender
  1188 DONE inflected
 42771 DONE noparts
  9091 DONE pfx1
  2716 DONE pfx2
  1414 DONE pfxderiv
 15390 DONE srs2
  1031 DONE srs3
  6499 DONE wsfx
  6020 TODO init

funderburkjim commented 8 years ago

Refactored analyze_rec_removesfx.

Some suffixes in wsfx.txt (such as in and vin) share the same ending, so for some words there can be more than one suffix to try. Previously, when there were two possibilities, the logic of removesfx failed. Now, it tries each possible suffix (from shortest suffix to longest), and stops at the first one that succeeds. This makes minor changes to the analysis in some cases. The 'TODO' number is now 5959.
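
A minimal sketch of the shortest-to-longest strategy, under stated assumptions: the function `remove_suffix`, the `headwords` set, and the sample words are hypothetical and are not the actual analyze_rec_removesfx code. It also assumes that a stem-final 'a' may have been dropped before a vowel-initial suffix, as described earlier for Iya and in.

```python
# Hypothetical sketch; names and data are illustrative only.
# Assumption: SLP1-style transliteration, so these letters are the vowels.
VOWELS = set("aAiIuUfFxXeEoO")


def remove_suffix(word, suffixes, headwords):
    """Try each matching suffix, shortest first; return (stem, suffix) or None."""
    candidates = sorted(
        (s for s in suffixes if word.endswith(s) and len(s) < len(word)),
        key=len,
    )
    for sfx in candidates:
        base = word[: -len(sfx)]
        stems = [base]
        if sfx[0] in VOWELS:
            stems.append(base + "a")  # assumed: restore a dropped final 'a'
        for stem in stems:
            if stem in headwords:
                return stem, sfx  # stop at the first suffix that succeeds
    return None


headwords = {"bala", "tejas"}  # toy data, not from the dictionary files
print(remove_suffix("balin", ["in", "vin"], headwords))     # ('bala', 'in')
print(remove_suffix("tejasvin", ["in", "vin"], headwords))  # ('tejas', 'vin')
```

In the second call the shorter suffix in is tried first and fails, because neither tejasv nor tejasva is a known headword, so the longer suffix vin is tried next.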

funderburkjim commented 8 years ago

Added suffix ika to wsfx.txt.

TODO now 5837.

funderburkjim commented 8 years ago

Added suffix ikA to wsfx.txt.

TODO now 5714.

gasyoun commented 3 years ago

"it tries each possible suffix (from shortest suffix to longest)"

Hmm, wouldn't we want to match vin instead of in where possible?

We've compiled a list of 305 Sanskrit suffixes.

https://www.dropbox.com/s/sf8opf0mamejmjc/MacDonell-Vedic1916.pdf contains very well-structured data, unlike the Whitney data, which is not so easy to use. I could dig into a comparison of the suffixes if it could actually weed out such 200-300 word lists and shrink the TODO list.

gasyoun commented 3 years ago

@funderburkjim have you seen MacDonell's list?