giellalt / template-lang-und

A template repo for new languages, as well as to update existing language repos with.
https://giellalt.uit.no/
GNU Lesser General Public License v3.0

Simplify example lexc, twolc and cg3 files. #11

Open: reynoldsnlp opened this issue 3 years ago

reynoldsnlp commented 3 years ago

When initializing new repositories, it is great to have example files to show the way, but some of the generated files are surprisingly complex. As a result, a linguist starting on a new repository has to declutter them before she can actually start building.

flammie commented 3 years ago

I made a suggestion based on what I used in teaching a few years ago and on the existing disambiguator.cg3. I don't know whether the other CG file comes from the shared setup, so I didn't dare touch it yet. See here: https://github.com/giellalt/template-lang-und/commit/778e2ac6acca3ee07d2075d2b8c8ddd06fd92b9f

I think the lexc files and the twolc file are still quite simple; most of them look like what I originally wrote. Is there anything specific to improve there, though?

aarppe commented 3 years ago

I agree with the original suggestion, i.e. paring down the generated LEXC files for stems and affixes to only those one might expect for the majority of languages (e.g. nouns, verbs, particles) and that exemplify some key LEXC design principles. To list the ones that first come to mind:

  1. involve inflection and require a split into pos_affixes.lexc and pos_stems.lexc, or
  2. do not have inflection, so the word forms can be enumerated as such in pos_forms.lexc (calling those stems could be considered misleading/confusing), or
  3. common/exemplary variants of the above, e.g. numeric symbols that might be inflected, or
  4. special cases such as listings of non-standard forms with +Err/Orth tags and their standardized correspondents, in non-standard-forms.lexc.
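To make the proposed layout concrete, here is a minimal sketch of what a pared-down set of LEXC files could contain, written as one file for brevity, with comments marking which template file each part would live in. The file names follow the list above; the tags, the lexicon names (Nouns, N_REG, Particles, NonStandardForms) and the English example words are assumptions for illustration only, not the template's actual contents. The position of +Err/Orth after the inflectional tags is likewise assumed.

```
! root.lexc (hypothetical minimal version)
Multichar_Symbols
+N +Pcle +Sg +Pl +Err/Orth

LEXICON Root
Nouns ;
Particles ;
NonStandardForms ;

! noun_stems.lexc: inflected words list only the stem
! and continue to an affix lexicon
LEXICON Nouns
house N_REG ;
dog N_REG ;

! noun_affixes.lexc: the inflection itself (0 = empty string)
LEXICON N_REG
+N+Sg:0 # ;
+N+Pl:s # ;

! particle_forms.lexc: uninflected words are listed as complete forms
LEXICON Particles
and+Pcle:and # ;

! non-standard-forms.lexc: a non-standard spelling analysed as the
! standard form plus an error tag (tag order is an assumption)
LEXICON NonStandardForms
house+N+Sg+Err/Orth:hous # ;
```

Compiled, this should give analyses such as house+N+Pl for "houses", and+Pcle for "and", and house+N+Sg+Err/Orth for "hous", which covers principles 1, 2 and 4 with one small lexicon each.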

In addition, one could explicitly include the LEXC files that are part of the shared setup (punctuation, symbols, proper names).

It is somewhat odd to have lots of LEXC files lying around that are not used and are perhaps excluded only by being commented out in root.lexc, e.g. those for adjectives.
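For comparison, a sketch of a Root lexicon in root.lexc along these lines: every entry points at a lexicon the template actually uses, and the shared lexicons are listed explicitly. All lexicon names here are placeholders; in particular, Punctuation, Symbols and ProperNouns stand in for whatever the shared files really define.

```
! root.lexc: every Root entry refers to a lexicon that is in use
LEXICON Root
Nouns ;
Verbs ;
Particles ;
! Lexicons from the shared setup, listed explicitly
! (placeholder names)
Punctuation ;
Symbols ;
ProperNouns ;
```

What this would avoid is a line such as `! Adjectives ;` left in the Root lexicon, with an adjective file that the template ships but nothing uses.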