Open reynoldsnlp opened 3 years ago
dependency.cg3 and functions.cg3 are required by giella-core, but they are not present in this repository. To complicate matters further, .gitignore contains the following:
/src/cg3/dependency.cg3
/src/cg3/functions.cg3
Very strange that files that giella-core explicitly requires are in .gitignore.
The idea is that these files are automatically copied from giella-shared, since they tend to be rather language independent. @Trondtr knows more about these files.
This setup goes back to a presentation we gave in 2010 (Antonsen, L., Wiechetek, L. and T. Trosterud 2010: Reusing Grammatical Resources for New Languages. In Proceedings of the International conference on Language Resources and Evaluation LREC 2010. p. 2782–2789. ISBN 2-9517408-6-7. Stroudsburg: The Association for Computational Linguistics. http://www.lrec-conf.org/proceedings/lrec2010/pdf/254_Paper.pdf), where we showed that we were able to add functions and dependencies to North and Lule Sámi, Faroese and Greenlandic with the same grammars. Thus we have included them. What I now do for e.g. Baltic Finnish languages is that I do not use the common functions.cg3, but I do use the dependency.cg3. For the dependency this is actually quite clear: if you are an object pointing to a transitive verb to your right (@OBJ>), then all that is left for the dependency grammar is to pick the nearest transitive verb as your mother. Assigning the object tag in the first place (functions.cg3) is a bit less language independent.
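To make the "nearest transitive verb to the right" idea concrete, here is a minimal Python sketch of that attachment logic. It is only an illustration, not how dependency.cg3 is implemented: the token representation, the function names, and the use of a `TV` tag to mark transitive verbs are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Set

# Hypothetical token representation: one tag set per word, in sentence order.
Token = Set[str]


def nearest_to_right(tokens: List[Token], start: int, wanted: str) -> Optional[int]:
    """Return the index of the first token after `start` carrying `wanted`."""
    for i in range(start + 1, len(tokens)):
        if wanted in tokens[i]:
            return i
    return None


def attach_objects(tokens: List[Token], verb_tag: str = "TV") -> Dict[int, int]:
    """Map each @OBJ> token to the nearest transitive verb to its right.

    `verb_tag` is an assumed tag name for transitive verbs.
    """
    heads: Dict[int, int] = {}
    for i, tags in enumerate(tokens):
        if "@OBJ>" in tags:
            mother = nearest_to_right(tokens, i, verb_tag)
            if mother is not None:
                heads[i] = mother
    return heads


# Toy sentence: object, adverb, transitive verb, intransitive verb.
sentence = [
    {"N", "Acc", "@OBJ>"},
    {"Adv"},
    {"V", "TV"},
    {"V", "IV"},
]
heads = attach_objects(sentence)
```

The point of the sketch is that once the function tag already encodes the direction (the `>` in `@OBJ>`), the attachment step needs no language-specific knowledge beyond knowing which tokens are transitive verbs.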
Another issue is of course that dependency analysis of running text is not a topic for the first year of work on language X anyway. But still, when one gets there one would wish for a nice entrance.
A possibility could be to:
* have a script for converting the fst tags to a cg3 preamble, i.e. from +N +Nom +Sg +Pl to LIST N = N ; LIST Nom = Nom ; etc.
* set up a dummy disambiguator.cg3 file with two rules for case disambiguation, two for number, for person, ...
* and a dummy functions.cg3 file with (some more) rules for mapping major functions (SUBJ, OBJ, ADVL, N>, ...)
* and a setup for using the script to get all the fst tags installed at the beginning of the cg3 files. One might perhaps even use the INCLUDE command (and include the tags from a generated tag file at runtime).
The dependency file could then be held as-is.
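As a rough illustration of the conversion script proposed above, the following Python sketch turns a whitespace-separated string of fst tags into cg3 LIST declarations. The function name is hypothetical, and the sketch assumes every tag is a plain `+Tag` token that is also valid as a cg3 set name; tags containing characters that need special handling in cg3 are not dealt with here.

```python
def fst_tags_to_cg3_preamble(fst_tags: str) -> str:
    """Turn '+N +Nom +Sg' into one 'LIST X = X ;' line per distinct tag."""
    seen = []
    for raw in fst_tags.split():
        tag = raw.lstrip("+")  # drop the fst-style leading plus
        if tag and tag not in seen:  # keep first-seen order, skip duplicates
            seen.append(tag)
    return "\n".join(f"LIST {tag} = {tag} ;" for tag in seen)


preamble = fst_tags_to_cg3_preamble("+N +Nom +Sg +Pl")
print(preamble)
```

The resulting text could be written to a generated tag file, which fits the INCLUDE idea in the list above.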
* have a script for converting the fst tags to cg3 preamble, i.e. from +N +Nom +Sg +Pl to LIST N = N ; LIST Nom = Nom ; etc.
As per https://visl.sdu.dk/cg3/chunked/tags.html#list-tags this could be simplified to a single line:
LIST-TAGS += N Nom Sg Pl etc ;
This issue may actually belong to gut. I am able to run ./autogen.sh, ./configure, and make in giellalt/template-lang-und without errors, but when I tried to do it right off the bat with lang-rue and other fresh repositories, I got errors. Somehow the new repositories are initialized in a broken state.