Open GoogleCodeExporter opened 8 years ago
You need to use flookup for more fine-grained printing. Also, flookup in
conjunction with a diff tool can be used for debugging: you can store some
known pairs in a file manually using flookup's output format, e.g.
rege+Posss2+Plur+Gen+Abl regédekétől
...
and then use this file as a reference file to diff against. For example the
UNIX command:
{{{
cat reference.txt | cut -f1 | flookup -i mygrammar.foma | diff -y reference.txt -
}}}
would pass all the words in the left column from reference.txt through flookup,
and compare the output to those given in the right column of reference.txt, in
effect giving a listing of all words not generated correctly by the grammar.
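The same comparison can be sketched in a small script when diff's side-by-side output gets hard to read. This is only an illustration, not part of foma: it assumes each line holds an upper form and a lower form separated by a tab or spaces, that flookup separates results with blank lines, and that it prints `+?` for an input it cannot generate. The function names are made up for this example.

```python
def parse_pairs(text):
    """Return a set of (upper, lower) pairs from flookup-style lines."""
    pairs = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:  # skip blank separator lines and malformed rows
            pairs.add((fields[0], fields[1]))
    return pairs


def missing_pairs(reference_text, generated_text):
    """Reference pairs that do not appear in the grammar's output."""
    return sorted(parse_pairs(reference_text) - parse_pairs(generated_text))


# Inline sample data; in practice these would be read from reference.txt
# and from the saved flookup output.
reference = (
    "rege+Posss2+Plur+Gen+Abl\tregédekétől\n"
    "rege+Posss1p+Gen+Acc\tregéimét\n"
)
generated = (
    "rege+Posss2+Plur+Gen+Abl\tregédekétől\n"
    "\n"
    "rege+Posss1p+Gen+Acc\t+?\n"  # assumed flookup marker for "no result"
    "\n"
)

for upper, lower in missing_pairs(reference, generated):
    print(f"{upper}\t{lower}")
```

Comparing sets rather than line positions means the blank separator lines and the order of the output do not matter.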
Original comment by mans.hul...@gmail.com
on 2 Jan 2012 at 10:15
Thanks for the idea; I can start testing with this.
The drawback is that I have to generate lists like
wd+Gram1+Gram2+Gram3...+Gramn programmatically, since nobody can write out the
other side (rege-regék-regéim... etc.) by hand for a minimum of 769 cases in a
realistic time, and it is not a job for a human being anyway. Just try it
once, and you will feel giddy after the first 10 words, no matter whether
you do it in your mother tongue or not, I can assure you. And relying on
another similar tool, like sfst, as a generator is not that clever either:
the more tools, the more possibilities for errors. Foma knows everything, so why
does it not tell us what it knows?
I cannot see any good reason to limit the word list output. sfst, for example,
lists everything, endlessly if the list is endless, and that helps quite a bit
in diagnosis.
The present arbitrary limit is not very nice; I had to search around for a long
time to understand what was happening here.
At least a count argument could be added as a limit, for example:
lower-words 100000, which should list 100000 words, or all the available
words if fewer than 100000 are available.
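The requested behavior ("at most N words, or everything if there are fewer") can be shown in a few lines. This is a sketch of the semantics only, not foma's implementation; `take_words` and `endless_forms` are hypothetical names.

```python
from itertools import islice


def take_words(words, limit):
    """Return at most `limit` words; fewer if the source runs out first."""
    return list(islice(words, limit))


def endless_forms():
    """Stand-in for a grammar with infinitely many words (hypothetical)."""
    n = 0
    while True:
        yield f"form{n}"
        n += 1


# A finite list shorter than the limit is returned in full.
print(take_words(["rege", "regék", "regéim"], 100000))

# An endless source is cut off at the limit instead of running forever.
print(len(take_words(endless_forms(), 5)))
```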
If you make a new version, you could consider this.
Also, the limit and the command behavior should be documented.
Anyway, thanks for your help so far.
Original comment by eleonor...@gmx.net
on 2 Jan 2012 at 7:10
I'd like to add one more wish to my wish list. Since it is not easy to match
word forms to their grammatical forms, I always use lists like:
...
rege+Possp3+Genpl+Sup regéjükéin
rege+Possp3+Genpl+Ter regéjükéiig
rege+Possp3+Genpl+Nom regéjükéi
rege+Posss1p+Gen+Abl regéimétől
rege+Posss1p+Gen+Acc regéimét
rege+Posss1p+Gen+Ade regéiménél
rege+Posss1p+Gen+All regéiméhez
...
for diagnostics and corrections.
It would therefore be very good if foma had a third command besides
lower-words and upper-words: both-words. It would list both words
(upper and lower) side by side in one list. That would eliminate the need to
use any external tool when setting up lexc/foma tools for new languages or new
word classes in an existing language.
Thank you in advance for considering this in a new version.
Original comment by eleonor...@gmx.net
on 3 Jan 2012 at 9:06
This deficit is especially annoying because, if I use flookup for
checking, I cannot see whether undesirable word forms are still there.
Hungarian nouns have a minimum of 769 word forms, verbs 450, adjectives over
1200.
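Checking for undesirable forms needs the opposite comparison: every form the grammar lists is looked up in the reference, and anything not found there is reported. The sketch below assumes a complete lower-side listing has been saved with one word per line, and a reference file of "upper lower" pairs as shown earlier in this thread. The function name and the misspelled sample form are invented for illustration.

```python
def unexpected_forms(listed_text, reference_text):
    """Listed word forms that do not occur in the reference's right column."""
    allowed = set()
    for line in reference_text.splitlines():
        fields = line.split()
        if len(fields) == 2:  # expect exactly "upper lower"
            allowed.add(fields[1])
    listed = {w.strip() for w in listed_text.splitlines() if w.strip()}
    return sorted(listed - allowed)


reference = (
    "rege+Posss1p+Gen+Abl\tregéimétől\n"
    "rege+Posss1p+Gen+Acc\tregéimét\n"
)
# "regéimétt" is a deliberately wrong, made-up form standing in for
# something the grammar should not generate.
listed = "regéimétől\nregéimét\nregéimétt\n"

for form in unexpected_forms(listed, reference):
    print(form)
```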
Original comment by eleonor...@gmx.net
on 27 Mar 2012 at 8:54
We are working on a project to create spell checkers for Quechua, Aymara and
Guaraní, which are indigenous languages in Bolivia. We would greatly
appreciate it if an option were added to view all possible combinations with
the "print upper" and "print lower" commands. In Quechua and Aymara, root words
can have up to 14 suffixes and the number of possible combinations of suffixes
is probably more than a thousand. We need to see all the combinations to
eliminate any errors.
Best regards and thanks for all the work on Foma,
Amos Batto
Original comment by amosba...@gmail.com
on 19 Sep 2012 at 11:25
I decided to change the source code so that "print upper-words" and "print
lower-words" print an unlimited number of words.
I changed lines 663 and 979 of iface.c from:
for (i = limit; i > 0; i--) {
To:
while (1) {
After a recompile, Foma printed an unlimited number of the upper and lower
words.
However, I discovered by reading the source code in the file interface.l that
it isn't necessary to change the source code because Foma already has an
undocumented option to specify a different limit for the "print upper-words"
and "print lower-words" commands.
For example, to print up to a thousand upper words, use the command:
foma[1]: print upper-words 1000
The documentation for Foma needs to be changed to inform the user about this
option. To do this, change line 138 in iface.c from:
{"print lower-words","prints words on the lower-side of top FSM",""},
to:
{"print lower-words <limit>","prints words on the lower-side of top FSM","By default the limit is 100"},
There is currently no documentation for the "print upper-words" command, so
also add this line to iface.c in the same array:
{"print upper-words <limit>","prints words on the upper-side of top FSM","By default the limit is 100"},
By the way, Foma also needs documentation for its comment syntax, so also add a
line like this:
{"#...","comment","All text following # will be ignored"},
Original comment by amosba...@gmail.com
on 19 Sep 2012 at 4:35
Thanks a lot for your valuable input. print upper-words 10000 works fine for
me, and it solved the problem of too few output lines.
Original comment by eleonor...@gmx.net
on 28 Sep 2012 at 2:04
Original issue reported on code.google.com by
eleonor...@gmx.net
on 1 Jan 2012 at 4:21