Simplify example lexc, twolc and cg3 files.

reynoldsnlp commented 3 years ago

When initializing new repositories, it is great to have example files to show the way, but some of the initialized files are surprisingly complex. This makes it so that a linguist starting on a new repository has to de-clutter before she can actually start building.

Some of my students have been confused about what they are allowed to delete before they get started. It would be helpful to have a comment at the very top of these template files to indicate that virtually everything can be deleted/replaced.
I think it would be nice to simplify all of the example lexc, twolc, and cg3 files down to trivial toy examples, so that new projects only have to change/delete a few lines to get started with a new language. Anything that's important to include for explanatory purposes could be in comments, so it has no effect on how the project builds.
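
As a rough illustration of the kind of toy template meant here, a lexc file might look like the sketch below. The header wording, the lexicon names (`Nouns`, `NounInfl`) and the English stems are assumptions made up for this example; they are not the contents of the current template files.

```lexc
! TEMPLATE EXAMPLE: virtually everything below can be deleted or replaced.
! (Hypothetical toy content, not the actual template-lang-und files.)

Multichar_Symbols
+N
+Sg
+Pl

LEXICON Root
Nouns ;

! Two toy stems, both continuing to the same inflection lexicon.
LEXICON Nouns
cat NounInfl ;
dog NounInfl ;

! Toy inflection: singular adds nothing, plural adds -s.
LEXICON NounInfl
+N+Sg:0 # ;
+N+Pl:s # ;
```

Compiled on its own, a file like this would be expected to pair an analysis such as `cat+N+Pl` with the surface form `cats`, so a new project would only need to swap out a handful of lines.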

flammie commented 3 years ago

I made a suggestion for disambiguator.cg3, based on what I used in teaching a few years ago and the existing content. I don't know whether the other cg files come from shared, so I didn't dare touch them yet. See here: https://github.com/giellalt/template-lang-und/commit/778e2ac6acca3ee07d2075d2b8c8ddd06fd92b9f

I think the lexc files and twolc are still quite simple; most of them seem like what I originally made. Is there anything specific to improve there, though?

aarppe commented 3 years ago

I agree with the original suggestion, i.e. paring down the generated LEXC files for stems and affixes to only those that one might expect for a majority of languages (e.g. nouns, verbs, particles) and which exemplify some key LEXC design principles. To list the ones that first come to mind:

- those that involve inflection and require a split into pos_affixes.lexc and pos_stems.lexc, or
- those that do not have inflection, so the word-forms can be enumerated as such in pos_forms.lexc (calling those stems could be considered misleading/confusing), or
- common/exemplary variants of the above, e.g. numeric symbols that might be inflected, and
- special cases such as listings of non-standard forms with +Err/Orth tags and their standardized correspondents, as non-standard-forms.lexc.
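
The first two cases could be sketched roughly as follows. The file names follow the pos_stems.lexc / pos_affixes.lexc / pos_forms.lexc pattern described above; the lexicon names, the `+Pcle` tag and the English words are hypothetical, and all tags are assumed to be declared under `Multichar_Symbols` in root.lexc.

```lexc
! --- noun_stems.lexc: inflecting class, stems only ---
LEXICON NounStems
cat NounInfl ;
dog NounInfl ;

! --- noun_affixes.lexc: inflection shared by all noun stems ---
LEXICON NounInfl
+N+Sg:0 # ;
+N+Pl:s # ;

! --- particle_forms.lexc: no inflection, full forms listed as such ---
LEXICON ParticleForms
not+Pcle:not # ;
```

Keeping stems and affixes in separate files means that adding a new noun only touches the stems file, while the uninflected class needs no affix file at all.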

In addition, one could explicitly include the LEXC files that are part of the shared setup (punctuation, symbols, proper names).

It is somewhat weird to have lots of LEXC files hanging around that one doesn't use and that are perhaps excluded by being commented out in root.lexc, e.g. for adjectives and so on.

giellalt / template-lang-und

Simplify example lexc, twolc and cg3 files. #11