ParallelMeaningBank / easyccg

http://homepages.inf.ed.ac.uk/s1049478/easyccg.html
MIT License

boxer printer throws "Unknown rule type: NOISE" #5

Closed kovvalsky closed 5 years ago

kovvalsky commented 5 years ago

The parser throws the error "Unknown rule type: NOISE" when parsing:

echo "Colombian authorities say leftist rebels have attacked a military convoy in southern Colombia with explosives , killing 10 servicemen ."  | java -jar ext/easyccg/easyccg.jar --model model_rebank --outputFormat boxer

The reason seems to be the noisy rules listed in the binaryRules file in the model directory.

Without --outputFormat boxer, the derivation is printed as:

(<T S[dcl] 1 2> (<T NP 0 1> (<T N 1 2> (<L N/N POS POS Colombian N/N>) (<L N POS POS authorities N>) ) ) (<T S[dcl]\NP 0 2> (<L (S[dcl]\NP)/S[dcl] POS POS say (S[dcl]\NP)/S[dcl]>) (<T S[dcl] 1 2> (<T NP 0 1> (<T N 1 2> (<L N/N POS POS leftist N/N>) (<L N POS POS rebels N>) ) ) (<T S[dcl]\NP 0 2> (<T S[dcl]\NP 0 2> (<T S[dcl]\NP 0 2> (<L (S[dcl]\NP)/(S[pt]\NP) POS POS have (S[dcl]\NP)/(S[pt]\NP)>) (<T S[pt]\NP 0 2> (<L (S[pt]\NP)/NP POS POS attacked (S[pt]\NP)/NP>) (<T NP 0 2> (<L NP/N POS POS a NP/N>) (<T N 1 2> (<L N/N POS POS military N/N>) (<T N 0 2> (<L N POS POS convoy N>) (<T N\N 0 2> (<L (N\N)/NP POS POS in (N\N)/NP>) (<T NP 0 1> (<T N 1 2> (<L N/N POS POS southern N/N>) (<L N POS POS Colombia N>) ) ) ) ) ) ) ) ) (<T (S\NP)\(S\NP) 0 2> (<L ((S\NP)\(S\NP))/NP POS POS with ((S\NP)\(S\NP))/NP>) (<T NP 0 2> (<T NP 0 1> (<L N POS POS explosives N>) ) (<T NP\NP 1 2> (<L , POS POS , ,>) (<T S[ng]\NP 0 2> (<L (S[ng]\NP)/NP POS POS killing (S[ng]\NP)/NP>) (<T NP 0 1> (<T N 1 2> (<L N/N POS POS 10 N/N>) (<L N POS POS servicemen N>) ) ) ) ) ) ) ) (<L . POS POS . .>) ) ) ) )

In this output, all rules look like valid CCG rules except the one that combines a comma with S[ng]\NP into NP\NP (, S[ng]\NP --> NP\NP):

(<T NP\NP 1 2> (<L , POS POS , ,>) (<T S[ng]\NP 0 2>) )

This rule is indeed listed in the binaryRules file.
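Spotting such a rule by eye in a long derivation is tedious, so here is a minimal Python sketch that lists every binary rule instance in a derivation string. It assumes the bracketed format shown above, where tree nodes start with (<T CATEGORY ...> and leaves with (<L CATEGORY ...>), and that no word contains ">". The function name binary_rules is made up for this example and is not part of easyccg.

```python
import re

# Matches either a node opener such as "(<T NP\NP 1 2>" or "(<L N POS POS word N>",
# or a closing ")". Parentheses inside categories are swallowed by the opener match.
TOKEN = re.compile(r"\(<([TL]) ([^>]*)>|\)")


def binary_rules(derivation):
    """Return (left, right, parent) category triples for every binary node."""
    stack = []  # each entry: [category, list of child categories]
    rules = []
    for match in TOKEN.finditer(derivation):
        if match.group(1) is None:
            # Closing ")": the node on top of the stack is complete.
            category, children = stack.pop()
            if len(children) == 2:
                rules.append((children[0], children[1], category))
            if stack:
                stack[-1][1].append(category)
        else:
            # Opening of a tree node or leaf: the category is the first field.
            category = match.group(2).split()[0]
            stack.append([category, []])
    return rules


# The fragment quoted above (raw string, because categories contain backslashes).
fragment = r"(<T NP\NP 1 2> (<L , POS POS , ,>) (<T S[ng]\NP 0 2>) )"
for left, right, parent in binary_rules(fragment):
    print(f"{left} {right} --> {parent}")
# prints: , S[ng]\NP --> NP\NP
```

Running it on the full derivation gives one line per binary node, which can then be checked against the standard CCG combinators.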

texttheater commented 5 years ago

This makes sense, because AFAIK Boxer does not support the NOISE rule as input. A solution could be to use a model without the NOISE rule (it could just be removed from the binaryRules file).
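A minimal Python sketch of that removal, for one specific rule. It rests on an unverified assumption about the binaryRules format: that each line holds the left and right child categories as its first two whitespace-separated fields. Check the actual file before relying on this. The helper name drop_rule is hypothetical.

```python
def drop_rule(lines, left, right):
    """Return the lines whose first two fields are not (left, right).

    Assumption (not verified against easyccg): each line of binaryRules
    starts with the left and right child categories, separated by whitespace.
    """
    kept = []
    for line in lines:
        fields = line.split()
        if fields[:2] == [left, right]:
            continue  # skip the unwanted rule
        kept.append(line)
    return kept


# In-memory example; raw strings because categories contain backslashes.
rules = [r"N/N N", r", S[ng]\NP", r"NP S[dcl]\NP"]
print(drop_rule(rules, ",", r"S[ng]\NP"))
```

It is probably safer to write the filtered lines to a copy of the model directory than to overwrite the original, since the model may contain other noisy rules that would need the same treatment.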

kovvalsky commented 5 years ago

You are right, from Boxer's perspective these rules don't make much sense. One could convert such a rule into forward application by giving the comma the category (NP\NP)/VP (with VP standing for S[ng]\NP here), but it is not worth the effort.