odanoburu closed this issue 6 years ago.
@aarneranta could you help us here? We are trying hard to debug the code to understand what is blocking the compilation of the Portuguese grammar. Even with verbosity set to 3, we don't have any clue about the cause of the problem. Any tips?
$ gf -v=3 --batch portuguese/LangPor.gf
...
generating PMCFG
+ AdvS 1 (1,1)
+ AdvSlash 8 (8,1)
+ EmbedQS 1 (1,1)
+ EmbedS 1 (1,1)
+ EmbedVP 16 (2,2)
+ ExtAdvS 1 (1,1)
+ ImpVP 16 (4,4)
+ PredSCVP 16 (8,8)
+ PredVP 1536 (6264,6264)
+ RelS 4 (1,1)
+ SSubjS 2 (2,2)
+ SlashPrep 8 (4,1)
+ SlashVP 12288
@odanoburu @arademaker It just seems that SentenceRomance is slow; see all the comments here: https://github.com/GrammaticalFramework/GF/blob/master/lib/src/romance/SentenceRomance.gf#L18-L140
(The number SlashVP 12288 means that the single category SlashVP is expanded into 12288 concrete categories, one for each combination of parameters that it can possibly get: anything like the number, gender and definiteness of its arguments; is it a pronoun or not; object case, such stuff. Anything that is a parameter in some category, that is even remotely used by SlashVP.)
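To make the arithmetic behind that number concrete, here is a small Python sketch of the counting rule described above: one concrete category per combination of parameter values, so the total is the product of the domain sizes. The parameter names and sizes are hypothetical, chosen only for illustration; they are not the actual parameters SlashVP depends on. For scale, 12288 = 3 × 2^12, which is what one three-valued and twelve two-valued parameters would give (one possible factorization, not a claim about SlashVP).

```python
from math import prod


def count_concrete_categories(param_domains: dict[str, list[str]]) -> int:
    """One concrete category per combination of parameter values, so the
    total is the product of the sizes of the parameter domains."""
    return prod(len(values) for values in param_domains.values())


# Hypothetical parameters chosen for illustration; these are NOT the
# actual parameters that SlashVP depends on in the Romance RGL.
domains = {
    "Number": ["Sg", "Pl"],
    "Gender": ["Masc", "Fem"],
    "Person": ["P1", "P2", "P3"],
    "IsPron": ["True", "False"],
}
print(count_concrete_categories(domains))  # 2 * 2 * 3 * 2 = 24

# Each extra two-valued parameter doubles the count.
domains["Definite"] = ["Def", "Indef"]
print(count_concrete_categories(domains))  # 48
```

The multiplicative growth is the point: a parameter that is "even remotely used" by a function doubles or triples its concrete categories, which is why a single function can dominate PMCFG generation.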
I suggest you just comment out SlashVP and SlashVS from SentenceRomance and compile your Portuguese grammar, so that you can continue the development. For instance, the grammar used here http://cloud.grammaticalframework.org/wc.html does not even use SlashVP and SlashVS; see all the functions that are excluded here: https://github.com/GrammaticalFramework/GF/blob/master/examples/app/App.gf#L4-L24
When I commented out SlashVP and SlashVS from SentenceRomance, I get this error message:
- parsing IdiomPor.gf
renaming IdiomPor.gf:30-39:
Happened in the renaming of ProgrVP
constant not found: estar_2
given P, ParamX, Prelude, BeschPor, ParadigmsPor, MorphoPor,
CommonX, CatPor, IdiomPor
IdiomPor.gf:21-23:
Happened in the renaming of ExistNP
constant not found: hay_3
given P, ParamX, Prelude, BeschPor, ParadigmsPor, MorphoPor,
CommonX, CatPor, IdiomPor
IdiomPor.gf:24-28:
Happened in the renaming of ExistIP
constant not found: hay_3
given P, ParamX, Prelude, BeschPor, ParadigmsPor, MorphoPor,
CommonX, CatPor, IdiomPor
Thanks for the work so far, and sorry for reacting slowly to the pull requests!
Two more things:
1) Compile without commenting out Slash* from SentenceRomance
I don't actually know what the reason is, but if I try to compile any of the RGL languages that use the Romance functor straight from lib/src/, … If I run cabal install from the root directory, it compiles all the languages in this list and puts the .gfo files into $GF_LIB_PATH. The resulting LangSpa/Fre/Ita grammars have the Slash* functions, and they work fine. I don't know what magic causes this.
Here's what I've done:
cabal install
At this point, I'm getting a whole lot of errors from IrregPor, such as the following:
lib/src/portuguese/IrregPor.gf:
lib/src/portuguese/IrregPor.gf:63372-63442:
Happened in linearization of sentar_V
wrong number of values in table table
VFB
["sentar"; "sentando"; "sentado"; "siento"; "sientas";
"sienta"; "sentamos"; "sentáis"; "sientan"; "siente";
"sientes"; "siente"; "sentemos"; "sentéis"; "sienten";
"sentaba"; "sentabas"; "sentaba"; "sentábamos";
"sentabais"; "sentaban"; "sentara"; "sentaras"; "sentara";
"sentáramos"; "sentarais"; "sentaran"; "sentase";
"sentases"; "sentase"; "sentásemos"; "sentaseis";
"sentasen"; "senté"; "sentaste"; "sentó"; "sentamos";
"sentasteis"; "sentaron"; "sentaré"; "sentarás"; "sentará";
"sentaremos"; "sentaréis"; "sentarán"; "sentare";
"sentares"; "sentare"; "sentáremos"; "sentareis";
"sentaren"; "sentaría"; "sentarías"; "sentaría";
"sentaríamos"; "sentaríais"; "sentarían"; variants {};
"sienta"; "siente"; "sentemos"; "sentad"; "sienten";
"sentado"; "sentada"; "sentados"; "sentadas"]
If I use the old files in portuguese, it compiles, but all the sentences it generates seem to be just Spanish.
So if you add Portuguese to both lists (languages and incomplete languages) in Setup.hs, you should be able to compile it yourselves!
2) Files in wrong place
The files CombinatorsPor.gf, ConstructorsPor.gf, SymbolicPor.gf, SyntaxPor.gf and TryPor.gf should be in the directory api, not portuguese. If you put them in the right place, then you don't need to have Portuguese in the list of incomplete languages.
"When I commented out SlashVP and SlashVS from SentenceRomance, I get this error message:"
"At this point, I'm getting a whole lot of errors from IrregPor, such as following:"
These were corrected in my repo; I'm now including them in my fork of this repo!
"2) Files in wrong place"
I'm correcting this, thanks!
"1) Compile without commenting out Slash* from SentenceRomance"
It now works!! :smile: Thank you very much, @inariksit!
I'll update the PR now.
Hello @inariksit, are you able to import all Portuguese tenses? I can compile GF, and I can import and use the Portuguese present tense, but not all tenses...
@odanoburu It's really slow to link the Portuguese grammar; I stopped it after 5 minutes. I can try overnight, or some other time when I don't have to do something else. But here's another hack, if you only want to test the linearisations and not parsing.
1) Import the grammar with the flag -retain
> i -retain LangPor.gfo
157 msec
2) Test any tree you like with cc (compute_concrete). You can see all the options for cc if you type help cc into the GF shell.
> cc -table -unqual PredVP (UsePron i_Pron) (ComplSlash (SlashV2a drink_V2) (MassNP (UseN beer_N)))
s . DDir => RPres => Simul => RPos => Indic => eu bebo cerveja
s . DDir => RPres => Simul => RPos => Conjunct => eu beba cerveja
s . DDir => RPres => Simul => RNeg False => Indic => eu no bebo cerveja
…
s . DDir => RPast => Simul => RNeg True => Indic => eu no bebia cerveja
s . DDir => RPast => Simul => RNeg True => Conjunct => eu no bebesse cerveja
s . DDir => RPast => Anter => RPos => Indic => eu havia bebido cerveja
s . DDir => RPast => Anter => RPos => Conjunct => eu houvesse bebido cerveja
…
s . DInv => RCond => Anter => RNeg True => Indic => no haveria bebido cerveja eu
s . DInv => RCond => Anter => RNeg True => Conjunct => no haveria bebido cerveja eu
I see all tenses are formed in the output.
Some of the parameters are redundant, like the Boolean in RNeg, but I see that comes from the Romance functor. I can import the Spanish grammar in 2 minutes or so; you could have a look at whether they have excluded something from the Romance functor to make it faster. Or have you added any parameters to the Portuguese grammar that could explain why it's slower?
Hi @inariksit, I left it running overnight. It consumed 48 GB of RAM and it didn't finish. Something is wrong...
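Whether a parameter distinction is redundant can be checked mechanically on a `cc -table` dump: if swapping one value for another never changes the string, the distinction makes no difference. Below is a minimal Python sketch of such a check, assuming the rows have been copied out of the shell by hand into a dict keyed by parameter tuples; `interchangeable` is a hypothetical helper, not a GF feature. Because the excerpt above is truncated, the example uses the mood column from four of the rows shown rather than RNeg.

```python
def interchangeable(table, position, a, b):
    """Return True if, for every row with value `a` at `position` whose
    counterpart with `b` is also present, both rows carry the same string,
    i.e. the a/b distinction makes no difference in this table."""
    for key, string in table.items():
        if key[position] != a:
            continue
        other = key[:position] + (b,) + key[position + 1:]
        if other in table and table[other] != string:
            return False
    return True


# Four rows copied from the cc -table output above.
rows = {
    ("DDir", "RPres", "Simul", "RPos", "Indic"): "eu bebo cerveja",
    ("DDir", "RPres", "Simul", "RPos", "Conjunct"): "eu beba cerveja",
    ("DInv", "RCond", "Anter", "RNeg True", "Indic"): "no haveria bebido cerveja eu",
    ("DInv", "RCond", "Anter", "RNeg True", "Conjunct"): "no haveria bebido cerveja eu",
}
MOOD = 4  # index of the mood parameter in each key

print(interchangeable(rows, MOOD, "Indic", "Conjunct"))  # False
cond_rows = {k: v for k, v in rows.items() if k[1] == "RCond"}
print(interchangeable(cond_rows, MOOD, "Indic", "Conjunct"))  # True
```

A True result only means something on a complete table; on a partial one it just says no counterexample was found among the rows present.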
@odanoburu Yeah same for me, it didn't finish overnight. I'm not all that surprised that someone has managed to write a GF grammar that doesn't finish compiling or linking (or whatever it is that makes it parse in addition to linearisation) on my computer :-P but the fact that Spanish does finish, makes it strange.
Does it work when you comment out SlashVP and SlashVS in the Romance functor?
@inariksit as discussed on IRC, commenting out SlashVP and SlashVS creates several errors that would require more commenting out...
@odanoburu I commented out Slash* and everything else that needed commenting out; here are the PGFs: http://old-darcs.grammaticalframework.org/~inari/portuguese/ Both of them are the same grammar; I just compiled the second with the flag --optimize-pgf. I'm including the first one just out of curiosity. (The grammar testing tool runs the smaller one much faster too!)
hey @inariksit, thanks!! Do these work for all tenses then??
I had no idea the optimized PGF could be this much smaller; I guess I must read the PGF paper...
can you push the commented romance to a branch on your fork, please?
@odanoburu Sorry about the late answer (I've turned off notifications on pretty much everything; if you ever want a quick answer from me, come to IRC! :-D) The commented out grammar works for all tenses. But even better, we've got a proper solution to your problem now! \o/ Aarne and I had a look at the grammar, and turns out it was only about the variants in BeschPor.gf. That's a good cautionary tale to not use variants in the resource grammar :-P (And also a reminder that we should really do something about the handling of the variants, so it doesn't blow up.)
Aarne shared another hack for getting the same behaviour as with variants: use the pre construction with an empty string and a wildcard otherwise branch. It always generates the first form, but also parses the second one. It's all in this commit, which I pushed to the master repo. If you want to change the order of the variants, I suggest just flipping x and y in the function vars.
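The behaviour described above (generate one form, accept both when parsing) can be modelled in a few lines of Python. This is a toy model of the observable behaviour only, not GF's implementation of pre; `ParseOnlyVariant` and `toy_vars` are hypothetical names, with `toy_vars` standing in for the vars function mentioned above. The spelling pair fato/facto (Brazilian vs. European) is just an illustration, not taken from BeschPor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParseOnlyVariant:
    """Toy model: linearization always yields `preferred`, while parsing
    also accepts any of the `alternatives`."""
    preferred: str
    alternatives: tuple[str, ...] = ()

    def linearize(self) -> str:
        return self.preferred

    def parses(self, token: str) -> bool:
        return token == self.preferred or token in self.alternatives


def toy_vars(x: str, y: str) -> ParseOnlyVariant:
    # Hypothetical stand-in for vars: x is generated, y is only
    # recognized. Swapping x and y flips which form is generated.
    return ParseOnlyVariant(preferred=x, alternatives=(y,))


spelling = toy_vars("fato", "facto")  # Brazilian vs. European spelling
print(spelling.linearize())           # fato
print(spelling.parses("facto"))       # True
```

Flipping the arguments, as in `toy_vars("facto", "fato")`, corresponds to the suggestion of flipping x and y.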
@inariksit oh, I'm so glad you've found a solution! I didn't know that the variants were not well supported... but they do work, just not on big grammars...
thank you very much!
@odanoburu Yeah, variants work, but they cause an explosion of possibilities in the tables, which in this case leads to a total freeze. The hack with pre ensures that the variants stay inside the tables.
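A simplified counting model shows why keeping variants inside the tables matters. If variants are lifted out of a table, every combination of choices becomes its own complete table, so the count multiplies; if they stay inside each cell, the number of strings only adds up. This is a back-of-the-envelope sketch under that assumption, not a description of GF's actual compilation algorithm, and the 60-cell table is hypothetical.

```python
from math import prod


def lifted_tables(cells: list[list[str]]) -> int:
    """Variants lifted out of the table: every combination of choices
    becomes its own complete table."""
    return prod(len(options) for options in cells)


def inline_strings(cells: list[list[str]]) -> int:
    """Variants kept inside each cell: one table, strings add up."""
    return sum(len(options) for options in cells)


# Hypothetical verb table: 50 cells with one form, 10 cells with two.
cells = [["form"]] * 50 + [["form_a", "form_b"]] * 10
print(lifted_tables(cells))   # 1024
print(inline_strings(cells))  # 70
```

In the multiplicative case, each further argument that carries its own lifted variants multiplies the total again, which is consistent with a compile that never finishes.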
Just as a curiosity, this is how pre works even for English: it parses even the wrong forms (an car, a animal), but only linearises the correct forms.
Languages: LangEng
Lang> p "an car" | l
a car
Lang> p "a animal" | l
an animal
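The p | l round trip above can be mimicked with a toy Python model: parsing accepts either article whatever follows, and linearization chooses the article from the next word. The list of vowel-initial letters is an assumption and a simplification (real English depends on pronunciation, as in "an hour" or "a unicorn"); this does not reproduce the actual pre definition in the English RGL.

```python
# Simplification: this list of initial letters is an assumption; real
# English article choice depends on pronunciation, not spelling.
VOWEL_INITIAL = ("a", "e", "i", "o", "u")


def linearize_article(next_word: str) -> str:
    """Generation picks the form that fits the following word."""
    return "an" if next_word.lower().startswith(VOWEL_INITIAL) else "a"


def parse_article(token: str) -> bool:
    """Parsing accepts either form, whatever follows."""
    return token in ("a", "an")


def normalize(phrase: str) -> str:
    """Parse 'a/an NOUN' leniently, then linearize it correctly."""
    article, noun = phrase.split(maxsplit=1)
    if not parse_article(article):
        raise ValueError(f"not an indefinite article: {article!r}")
    return f"{linearize_article(noun)} {noun}"


print(normalize("an car"))    # a car
print(normalize("a animal"))  # an animal
```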
@inariksit I see! Are there other constructions with restrictions like the ones on variants? (nonExist, for instance?)
"it parses even the wrong forms (an car, a animal), but only linearises the correct forms."
That's nice for this use case, but in the case of BeschPor we'd like to parse both Brazilian Portuguese and European Portuguese forms, and also linearize them (although there wouldn't be a clear way of selecting between them, so I guess it's not that big of a loss!)
@odanoburu In that case, it definitely makes sense to have two different functions for each form, or two different files. You could have two folders, brazilian and european, in the GF/lib/src/portuguese folder, and put a BeschPor.gf in each of them, identical except for the verb forms. Then in your LangPor.gf, put either brazilian or european in the path, e.g. --# -path=.:../romance:../abstract:../common:../api:brazilian, and changing brazilian to european changes the standard.
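If you switch between the two standards often, the edit to the path pragma can be scripted instead of done by hand. Here is a minimal Python sketch; `switch_variety` is a hypothetical helper, and it assumes the pragma sits on its own line in the `--# -path=` form shown above. It swaps only a whole path component, so a directory that merely contains the word is left alone.

```python
import re

# Matches the GF path pragma, e.g. "--# -path=.:../romance:brazilian".
PATH_PRAGMA = re.compile(r"^(--# -path=)(.*)$", re.MULTILINE)


def switch_variety(source: str, old: str, new: str) -> str:
    """Swap the directory `old` for `new` in the first path pragma."""
    def replace(match: re.Match) -> str:
        dirs = [new if d == old else d for d in match.group(2).split(":")]
        return match.group(1) + ":".join(dirs)
    return PATH_PRAGMA.sub(replace, source, count=1)


header = (
    "--# -path=.:../romance:../abstract:../common:../api:brazilian\n"
    "-- (rest of LangPor.gf unchanged)\n"
)
print(switch_variety(header, "brazilian", "european").splitlines()[0])
# --# -path=.:../romance:../abstract:../common:../api:european
```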
@inariksit that's a nice idea for the verbs and the lexicon! (which is what I intended to implement anyway, because I don't know the other differences well enough)
This is a work in progress, but it already has some useful stuff.
… LangPor.gf on any of the machines I have available, so I'd love some tips regarding this!
… LexiconPor.gf, though. The nouns there have been revised, but everything else remains to be tested.
The main repo I'm working on is this one, which has the project's actual history.