giellalt / template-lang-und

A template repo for new languages, as well as to update existing language repos with.
https://giellalt.uit.no/
GNU Lesser General Public License v3.0

Include necessary files in template #10

Open reynoldsnlp opened 3 years ago

reynoldsnlp commented 3 years ago

This issue may actually belong to gut. I can run ./autogen.sh, ./configure and make in giellalt/template-lang-und without errors, but when I tried the same steps right away with lang-rue and other fresh repositories, I got errors. Somehow the new repositories are initialized in a broken state.

make[2]: Leaving directory '/Users/robertreynolds/gt/lang-rue/src/orthography'
Making all in cg3
make[2]: Entering directory '/Users/robertreynolds/gt/lang-rue/src/cg3'
make[2]: *** No rule to make target 'dependency.cg3', needed by 'dependency.bin'.  Stop.
make[2]: Leaving directory '/Users/robertreynolds/gt/lang-rue/src/cg3'
make[1]: *** [Makefile:1187: all-recursive] Error 1
make[1]: Leaving directory '/Users/robertreynolds/gt/lang-rue/src'
make: *** [Makefile:538: all-recursive] Error 1

reynoldsnlp commented 3 years ago

dependency.cg3 and functions.cg3 are required by giella-core, but they are not present in this repository. To complicate matters further, .gitignore contains the following:

/src/cg3/dependency.cg3
/src/cg3/functions.cg3

Very strange that files that giella-core explicitly requires are in .gitignore.

snomos commented 3 years ago

The idea is that these files are automatically copied from giella-shared, since they tend to be rather language independent. @Trondtr knows more about these files.

Trondtr commented 3 years ago

This setup goes back to a presentation we had in 2010:

Antonsen, L., Wiechetek, L. and T. Trosterud 2010: Reusing Grammatical Resources for New Languages. In Proceedings of the International conference on Language Resources and Evaluation LREC 2010. p. 2782–2789. ISBN 2-9517408-6-7. Stroudsburg: The Association for Computational Linguistics. http://www.lrec-conf.org/proceedings/lrec2010/pdf/254_Paper.pdf

where we showed that we were able to add functions and dependencies to North and Lule Sámi, Faroese and Greenlandic, with the same grammars. Thus we have included them.

What I now do for e.g. Baltic Finnish languages is that I do not use the common functions.cg3, but I do use the dependency.cg3. For the dependency this is actually quite clear: If you are an object pointing to a transitive verb to your right (@OBJ>), then all that is left for the dependency grammar is to pick the nearest transitive verb as your mother. Assigning the object tag in the first place (functions.cg3) is a bit less language independent.
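
To illustrate why the dependency step is nearly language independent once function tags exist, here is a minimal Python sketch of the "pick the nearest transitive verb to your right" idea. It is not the actual dependency.cg3 logic. The token representation and the tag name `TV` for transitive verbs are assumptions made for this example only.

```python
def attach_objects(tokens):
    """Map each @OBJ> token to the nearest transitive verb on its right.

    tokens: list of (wordform, set_of_tags) pairs.
    Returns {object_index: verb_index}. The tag "TV" is a hypothetical
    marker for transitive verbs, not necessarily the tag used in giellalt.
    """
    heads = {}
    for i, (_, tags) in enumerate(tokens):
        if "@OBJ>" not in tags:
            continue
        # Scan rightwards and stop at the first transitive verb.
        for j in range(i + 1, len(tokens)):
            if "TV" in tokens[j][1]:
                heads[i] = j
                break
    return heads


# Toy input: an object followed by an intransitive and a transitive verb.
sentence = [
    ("book", {"N", "Acc", "@OBJ>"}),
    ("sleeps", {"V", "IV"}),
    ("reads", {"V", "TV"}),
]
heads = attach_objects(sentence)
```

Here the object at index 0 skips the intransitive verb and attaches to index 2. Nothing in the function depends on the language, only on the tags already assigned.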

Another issue is of course that dependency analysis of running text is not a topic for the first year of work on language X anyway. But still, when one gets there one would wish for a nice entrance.

A possibility could be to:

* have a script for converting the fst tags to cg3 preamble, i.e. from
  +N +Nom +Sg +Pl
  to
  LIST N = N ;
  LIST Nom = Nom ;
  etc.

TinoDidriksen commented 3 years ago

As per https://visl.sdu.dk/cg3/chunked/tags.html#list-tags this could be simplified to a single line: LIST-TAGS += N Nom Sg Pl etc ;
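
The conversion discussed in this thread could be sketched in Python roughly as follows. This is an illustration, not an existing giellalt script. It assumes the fst tags arrive as a whitespace-separated string with a leading `+` on each tag; extracting them from the lexc sources is left out. The function names are hypothetical.

```python
def fst_tags_to_names(tag_line):
    """Strip the leading '+' from each fst tag, dropping duplicates in order."""
    names = []
    for tag in tag_line.split():
        name = tag[1:] if tag.startswith("+") else tag
        if name and name not in names:
            names.append(name)
    return names


def to_list_lines(names):
    """One 'LIST X = X ;' line per tag, as in the original proposal."""
    return [f"LIST {name} = {name} ;" for name in names]


def to_list_tags(names):
    """The single-line LIST-TAGS form suggested in the thread."""
    return "LIST-TAGS += " + " ".join(names) + " ;"


names = fst_tags_to_names("+N +Nom +Sg +Pl")
list_lines = to_list_lines(names)
list_tags_line = to_list_tags(names)
```

With the example tags from the thread, `list_lines` starts with `LIST N = N ;` and `list_tags_line` is `LIST-TAGS += N Nom Sg Pl ;`. The single-line form keeps the generated preamble short, while the one-line-per-tag form may be easier to edit by hand later.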