This project implements a subset of the syntax of Attempto Controlled English (ACE) version 6.7 in Grammatical Framework (GF) and ports it to ~20 natural languages (see the Makefile for the currently supported languages). Note that this project does not implement the mapping of ACE sentences to discourse representation structures.
The grammar should cover a reasonably large and interesting subset of ACE, specifically the OWL-compatible subset that is supported by AceWiki. Ideally, the grammar should cover the full ACE.
The grammar should not overgenerate, i.e. it should be possible to use the grammar in a look-ahead editor (which explicitly exposes the coverage of the grammar to the user).
The grammar should allow for bidirectional translations between ACE and a number of other controlled natural languages, i.e. ACE-like German, Italian, Finnish, etc.
Tested with:
The ACE-in-GF grammar can be turned into a PGF file in various ways depending on
Several targets in the Makefile start with the pgf_ prefix. To build the grammar, execute e.g.
make pgf_acewiki_aceowl # AceWiki grammar, ACE only, tiny test vocabulary
make pgf_ontograph_40 # AceWiki grammar, all languages, small test vocabulary
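To see which pgf_ targets exist before picking one, the Makefile can be searched directly. The following is a minimal sketch, not part of the project: the function name list_targets is hypothetical, and it assumes that each target is declared at the start of a line as "name:" (pattern rules or generated targets would not be found).

```shell
# List the Makefile targets that start with a given prefix (default: pgf_).
# Assumption: targets are declared at the start of a line as "name:".
list_targets() {
    makefile="${1:-Makefile}"
    prefix="${2:-pgf_}"
    grep -E "^${prefix}[A-Za-z0-9_]*:" "$makefile" | cut -d: -f1 | sort -u
}

# Usage (from the repository root):
#   list_targets Makefile pgf_
```

The same function works for the other target families mentioned in this document by passing a different prefix as the second argument.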
To build the full ACE grammar (ACE-only) with a lexicon of ~1000 general words, run
bash make-pgf.bash
(Note that it is important that you use bash.)
The build should not take more than a couple of minutes. The GF libraries are expected to be found in a system-wide location, e.g.:
Example of translating an ACE sentence to other languages.
$ make pgf_Words300
$ gf Words300.pgf
Words300> p -lang=Ace -cat=Text "it is false that X is read by nothing but computers that Y does not see ." | l -treebank
Words300: sText (falseS ... see_V2))))))
Words300Ace: it is false that X is read by nothing but computers that Y doesn't see .
Words300Cat: és fals que X està llegit per nomÈs ordinadors que Y no veu .
Words300Dan: det er falsk at X bliver læst af kun datamaskiner som Y ikke ser .
Words300Dut: 't is onwaar dat X door slechts computers die Y niet ziet gelezen wordt .
Words300Fin: on epätotta että X luetaan vain tietokoneiden joita Y ei näe toimesta .
Words300Fre: il est faux que X est lu par seulement des ordinateurs qu' Y ne voit pas .
Words300Ger: es ist falsch dass X durch nur Rechner die Y nicht sieht gelesen wird .
Words300Ita: è falso che X viene letto da soltanto computer che Y non vede .
Words300Lav: nav tiesa , ka X lasa tikai datori , ko Y neredz .
Words300Nor: det er falsk at X blir lest av kun datamaskiner som Y ikke ser .
Words300Pol: jest źle , że X przez tylko komputery , których nie widzi Y jest czytane .
Words300Ron: este fals cã X este citit de către doar nişte calculatoare pe care nu le vede Y .
Words300Rus: неверно , что X читается с помощью единственных компьютеров , которые Y не видит .
Words300Spa: es falso que X está leído por solamente computadoras que Y no ve .
Words300Swe: det är falskt att X blir läst av bara datorer som Y inte ser .
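The interactive session above can also be run non-interactively by piping the same command into gf --run, as done elsewhere in this document. The helper below is a minimal sketch: the function name ace_treebank_command is hypothetical, and it only builds the command string, so the sentence must be one that the grammar covers and must not contain double quotes.

```shell
# Print the GF shell command that parses an ACE sentence and linearizes the
# resulting tree in all languages in treebank form.
# Assumption: the sentence contains no double quotes.
ace_treebank_command() {
    printf 'p -lang=Ace -cat=Text "%s" | l -treebank\n' "$1"
}

# Usage (requires gf on the PATH and a built Words300.pgf):
#   s="it is false that X is read by nothing but computers that Y does not see ."
#   ace_treebank_command "$s" | gf --run Words300.pgf
```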
Some tools need to be installed and available on the PATH:
The commands
ghc --make -o Parser Parser.hs
bash make-pgf.bash grammars/acewiki_aceowl/ "words/acewiki_aceowl/TestAttempto{Ace,}.gf"
bash run-test.bash tests/acewiki_aceowl/sentences.txt
do the following:
The test script creates two output files:
test_out.txt: all the sentences classified as OK or FAIL, with ambiguity shown in case of OK, and the successfully parsed prefix in case of FAIL
test_out_fail.txt: frequency ranking of failed sentences
To run a test with the full ACE grammar and the 1000-word vocabulary on all the test cases in the tests directory, execute:
bash make-pgf.bash
bash run-all-tests.bash > tests/run-all-tests.out 2> tests/run-all-tests.err
The output files are created in the subdirectories of the tests directory.
Additional test-targets are provided by the Makefile.
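After a test run, a quick pass/fail count can be useful. The following is a minimal sketch under a stated assumption: it treats every line of the result file that begins with OK or FAIL as one classified sentence. The exact layout of test_out.txt is not specified here, so the patterns may need adjusting, and the function name summarise_results is hypothetical.

```shell
# Count the sentences classified as OK and FAIL in a test result file.
# Assumption: each classified sentence is a line beginning with OK or FAIL.
summarise_results() {
    file="${1:-test_out.txt}"
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    ok=$(grep -c '^OK' "$file" || true)
    fail=$(grep -c '^FAIL' "$file" || true)
    echo "OK=$ok FAIL=$fail"
}

# Usage:
#   summarise_results test_out.txt
```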
Use the following Makefile targets to generate and store the linearizations.
lin_ontograph_40_save
lin_ontograph_ext_save
lin_Geography_save
lin_Words300_save
lin_acewiki_aceowl_save
All the linearizations except for acewiki_aceowl are also under version control.
See the Makefile targets that have the prefix test_precision.
echo "gr -number=10" | gf --run TestAttempto.pgf | Roundtripper -f TestAttempto.pgf -l TestAttemptoAce | grep DIFF
echo "rf -lines -file=tests/ontograph_ext/sentences.txt | p -lang=Ace -cat=ACEText" |\
gf --run TestAttempto.pgf | Roundtripper -f TestAttempto.pgf -l TestAttemptoAce | grep DIFF
Currently this results in (TODO: fix these):
Tom buys a picture -> Tom buys the picture (article changes, in Fin)
Mary sees no man -> Mary doesn't see no man (negation added, several languages)
if X sees somebody who sees Y ... -> if X sees somebody who Y sees ... (word order changes, in Ger and Dut)
what does Tom buy ? -> what buys Tom ? (word order changes, all langs but ACE and Eng)
coverage_acewiki_aceowl_save
coverage_ontograph_ext_save
Changes to the ACE-in-GF grammar can be made in three directories.
ACE resource grammar. Based on the English resource grammar. Describes deviations from the English grammar, e.g. ACE uses "who" instead of "whom".
Implementation of different ACE subsets and their ports to other languages. Contains the common interface AttemptoI.gf.
Different domain vocabularies, most of which are automatically generated from existing external terminologies.
To add a new language that is supported by the GF Resource Grammar Library (RGL), add these files to the grammars directory (Xyz is the ISO code of the new language):
grammars/acewiki_aceowl/AttemptoXyz.gf
grammars/acewiki_aceowl/LexAttemptoXyz.gf
grammars/acewiki_aceowl/NumeralXyz.gf (this is only needed if you get a "cannot unify the information" error; in this case copy this file from the RGL and change the first line to contain CatXyz [Numeral,Digits])
Then add various lexicon files to be able to run the tests:
words/Words300/Words300Xyz.gf
words/acewiki_aceowl/TestAttemptoXyz.gf
words/ontograph_40/TestAttemptoXyz.gf
Finally, register the new language in the Makefile and run the linearization tests.
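Before running the tests for a new language, it can help to check that the grammar and lexicon files listed above are in place. The following is a minimal sketch: the function name missing_language_files is hypothetical, the paths are the ones given above, and NumeralXyz.gf is deliberately left out because it is only needed in some cases.

```shell
# Print the required files that are still missing for a language code.
# Arguments: the language code (e.g. Xyz) and, optionally, the repository root.
# NumeralXyz.gf is not checked because it is only needed in some cases.
missing_language_files() {
    code="$1"
    root="${2:-.}"
    for f in \
        "grammars/acewiki_aceowl/Attempto${code}.gf" \
        "grammars/acewiki_aceowl/LexAttempto${code}.gf" \
        "words/Words300/Words300${code}.gf" \
        "words/acewiki_aceowl/TestAttempto${code}.gf" \
        "words/ontograph_40/TestAttempto${code}.gf"
    do
        [ -e "$root/$f" ] || echo "$f"
    done
}

# Usage (from the repository root; prints nothing when all files exist):
#   missing_language_files Xyz
```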
We measure the status of this project in various ways, see more: