CQCL / lambeq

A high-level Python library for Quantum Natural Language Processing
https://cqcl.github.io/lambeq-docs
Apache License 2.0

Bobcat fails to parse a sentence after adding "the" #121

Closed: abuzomol closed this 8 months ago

abuzomol commented 1 year ago

I have been trying to run this code:

import os
import warnings

# Hide library warnings and the Hugging Face tokenizers fork warning
warnings.filterwarnings("ignore")
os.environ["TOKENIZERS_PARALLELISM"] = "false"

from lambeq import TreeReader, TreeReaderMode

# TreeReader parses with Bobcat by default; RULE_ONLY labels each box with its CCG rule
reader = TreeReader(mode=TreeReaderMode.RULE_ONLY)

sent = "Avoiding processed and sugary foods is important for reducing the risk of chronic diseases such as diabetes , obesity , and heart disease ."

# suppress_exceptions=False raises the parsing error instead of returning None
tree_diagram = reader.sentence2diagram(sent, suppress_exceptions=False)
print(tree_diagram)

which gives the error: Illegal use of UNK: unknown CCG rule.

However, if we run the same program on the following sentence, with "the" removed, it runs successfully:

...
sent = "Avoiding processed and sugary foods is important for reducing risk of chronic diseases such as diabetes , obesity , and heart disease ."
...

While debugging, I traced the error to the conjunction in "and sugary foods": it produces a mismatched CCGRule type in the __call__ method of the CCGRule class. I'm not sure why this is happening or how to solve it. Any ideas?

Thanks!

dimkart commented 12 months ago

Hi @abuzomol, and thanks for spotting this. There is indeed an issue in Bobcat: it seems to be missing a translation for a special case of conjunction. We will look into this and fix it soon.
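To illustrate what a "missing translation" for a conjunction case can look like, here is a toy sketch in plain Python. It is not Bobcat's or lambeq's code: the `Fn` class, the `infer_rule` and `check` helpers, and the failing pattern are all hypothetical. Rule names are loosely modelled on lambeq's CCG rules, coordination follows one common treatment (conjunction plus right conjunct gives X\X), and the error message is copied from the report above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union

CONJ = "conj"  # hypothetical category for conjunctions such as "and"


@dataclass(frozen=True)
class Fn:
    """Complex category: slash "/" takes `arg` on the right, "\\" on the left."""
    slash: str
    res: Cat
    arg: Cat


# A category is either atomic ("N", "NP", "S", "conj") or a function Fn
Cat = Union[str, Fn]


def infer_rule(left: Cat, right: Cat, result: Cat) -> str:
    """Toy: name the rule that combines `left` and `right` into `result`."""
    if left == Fn("/", result, right):  # X/Y  Y  =>  X
        return "FORWARD_APPLICATION"
    if right == Fn("\\", result, left):  # Y  X\Y  =>  X
        return "BACKWARD_APPLICATION"
    if left == CONJ and result == Fn("\\", right, right):  # conj  X  =>  X\X
        return "CONJUNCTION"
    # Any combination without a translation falls through to the unknown rule
    return "UNK"


def check(left: Cat, right: Cat, result: Cat) -> str:
    """Return the inferred rule name, or raise if it is unknown."""
    rule = infer_rule(left, right, result)
    if rule == "UNK":
        raise ValueError("Illegal use of UNK: unknown CCG rule.")
    return rule


ADJ = Fn("/", "N", "N")  # adjective category N/N, e.g. "processed", "sugary"
print(check(CONJ, ADJ, Fn("\\", ADJ, ADJ)))  # "and sugary" -> prints CONJUNCTION
print(check(ADJ, Fn("\\", ADJ, ADJ), ADJ))   # prints BACKWARD_APPLICATION
try:
    # Hypothetical special case with no translation: conj  N/N  =>  N/N
    check(CONJ, ADJ, ADJ)
except ValueError as err:
    print(err)  # prints Illegal use of UNK: unknown CCG rule.
```

In this sketch, one extra branch for the uncovered pattern would make the last call succeed, which is the toy analogue of adding the missing translation.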

dimkart commented 8 months ago

This is now fixed in version 0.4. The issue will be closed.
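To confirm that an environment already includes the fix, a small check along these lines can compare the installed lambeq version against 0.4 (upgrading is `pip install --upgrade lambeq`). This is a sketch that assumes, per the comment above, that 0.4.0 is the first release with the fix; the helper names are hypothetical, and `packaging.version.Version` would give full PEP 440 handling (this sketch treats pre-releases such as 0.4.0rc1 like the final release).

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, version

FIXED_IN = (0, 4)  # release stated in this thread as containing the fix


def release_tuple(v: str) -> tuple[int, ...]:
    """Leading numeric part of a version string, e.g. '0.4.1' -> (0, 4, 1)."""
    m = re.match(r"\d+(?:\.\d+)*", v)
    if m is None:
        raise ValueError(f"unrecognised version: {v!r}")
    return tuple(int(p) for p in m.group(0).split("."))


def has_conjunction_fix(v: str) -> bool:
    """True if version `v` is at or above the release with the fix."""
    return release_tuple(v) >= FIXED_IN


def installed_lambeq_version() -> str | None:
    """Installed lambeq version string, or None if lambeq is not installed."""
    try:
        return version("lambeq")
    except PackageNotFoundError:
        return None


v = installed_lambeq_version()
if v is None:
    print("lambeq is not installed")
else:
    print(f"lambeq {v}: conjunction fix included = {has_conjunction_fix(v)}")
```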