Error with Japanese Sentence Parsing Using DepCCG in Lambeq

sora9suzuki commented 1 year ago

Hello,

I apologize in advance for any mistakes in my English, as it is not my native language.

I am using lambeq with the DepCCG parser to process Japanese sentences, and I encountered an error similar to the one reported in issue #99.

While attempting to resolve this, I noticed a discrepancy in the rule symbols between ccg_rule.py in lambeq and ja.py in DepCCG. For instance, the backward_composition rule in ccg_rule.py is denoted as BC or <B, whereas in ja.py it's represented as bx or <b1. To address this, I adjusted ja.py to match the notation in ccg_rule.py. As a result, I was able to process the sentence "ボブはおいしくないカレーが嫌いではない" (Bob does not dislike curry that is not delicious).
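For readers who would rather not edit ja.py in place, the same renaming could in principle be done with a small alias table applied to the parser output. The sketch below is a minimal illustration only: the table name and the function are hypothetical, and the only symbol pairs in it are the ones quoted above, paired in the order they were mentioned (an assumption). It does not call lambeq or DepCCG.

```python
# Hypothetical alias table. Only the symbols quoted in this thread appear here,
# and pairing them in this order is an assumption.
JA_TO_LAMBEQ_SYMBOLS = {
    'bx': 'BC',
    '<b1': '<B',
}


def translate_rule_symbol(symbol: str) -> str:
    """Return the lambeq-style symbol for a ja.py symbol.

    Symbols without an entry in the table are passed through unchanged.
    """
    return JA_TO_LAMBEQ_SYMBOLS.get(symbol, symbol)


if __name__ == '__main__':
    for sym in ['<b1', 'bx', '>']:
        print(sym, '->', translate_rule_symbol(sym))
```

Keeping the aliases in one table means a DepCCG upgrade would not overwrite the change, although whether this fits into lambeq's reader depends on where the rule symbols are consumed.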

However, when I tried to process another sentence, "親切な男性がいる" (There is a kind man), I encountered a CCGRuleUseError with the message 'unknown CCG rule'.

Upon investigating this issue further by inputting the sentence directly into DepCCG, I believe the problem lies with the ADNint rule, which is unique to Japanese and maps S\NP to NP/NP.
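To make the rule concrete, here is a toy sketch of ADNint as a unary type change followed by ordinary forward application. Categories are plain strings and the helper names are hypothetical; real DepCCG categories carry features and brackets that this naive string handling ignores.

```python
def adnint(category: str) -> str:
    """Unary type change described above: S\\NP becomes NP/NP."""
    if category != 'S\\NP':
        raise ValueError(f'ADNint does not apply to {category!r}')
    return 'NP/NP'


def forward_apply(left: str, right: str) -> str:
    """Forward application, X/Y Y -> X, for flat categories without brackets."""
    result, slash, argument = left.rpartition('/')
    if not slash or argument != right:
        raise ValueError(f'cannot apply {left!r} to {right!r}')
    return result


if __name__ == '__main__':
    modifier = adnint('S\\NP')            # adnominal predicate becomes a noun modifier
    print(forward_apply(modifier, 'NP'))  # the modified noun phrase is again an NP
```

Seen this way, the rule only changes the category of one child, after which the usual application rule combines it with the noun.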

I would like to know whether it is possible to add this rule to lambeq and, if so, where I should add it. Any guidance on this would be greatly appreciated.

Thank you in advance for your help.

ACE07-Sev commented 1 year ago

Greetings there mate,

Hope you are well. I'm not an admin, just a user of the package like yourself. I think you should refer to the UML diagram for the text2diagram package. I assume what you're trying to fix is parsing Japanese sentences into diagrams, so I think that would be a helpful starting point.

You already seem to have a good understanding of the grammar of it in terms of Lambek's formalism, so you just need to code it out and test it. I'd love to help you, but I don't have any familiarity with Japanese grammar hehe. Codewise, if you need any help let me know.

I'm terribly sorry if this wasn't helpful, just hoping to help a bit whilst the admins are busy with other work.

sora9suzuki commented 1 year ago

Hello,

Thank you for your response. I appreciate your suggestion to revisit the UML diagram for the text2diagram. I will look into it and see if I can find a solution to my problem. Your help is much appreciated, even though you are not familiar with Japanese grammar. I will reach out if I need any further assistance with the code.

Best regards.

dimkart commented 1 year ago

@sora9suzuki Hi, I noticed you have closed the issue. Did you find a solution to your problem? The rule you mention above looks like what is called a "unary rule" in English CCG, which means there may be a way to fix this problem. We'll try to have a better look soon.

sora9suzuki commented 1 year ago

Hi,

Thank you for your attention to this issue. Yes, I've managed to find a solution. Upon examining the output from depccg, I noticed it produced S/NP NP -> NP. As a result, I decided to rewrite S as NP/NP, effectively making it similar to NP/NP NP -> NP.

Admittedly, it's a bit of a forced workaround. I've added a new rule to CCGRule:


ADNINT = 'OTHER', 'SSEQ'

In rule identification, I've inserted the following:


elif self == CCGRule.ADNINT:
    return Diagram.fa(cod, cod)

Though a bit brute force, it enables the processing of sentences like "親切な男性がいる". Thanks again for your assistance!
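The general shape of this workaround can be illustrated with a self-contained toy that does not use lambeq at all. Everything here is a stand-in: `ToyRule` is not lambeq's CCGRule, the 'ADNint' symbol string is assumed from the rule name used in this thread, and categories are flat strings. The point is only that registering the rule stops the lookup from failing, and that resolving it treats the left child as a modifier of the right child.

```python
from enum import Enum


class ToyRule(Enum):
    """Stand-in rule enum for illustration; NOT lambeq's CCGRule."""
    FORWARD_APPLICATION = '>'
    BACKWARD_APPLICATION = '<'
    ADNINT = 'ADNint'  # assumed symbol; without this member the lookup below fails

    @classmethod
    def from_symbol(cls, symbol: str) -> 'ToyRule':
        for rule in cls:
            if rule.value == symbol:
                return rule
        raise ValueError(f'unknown CCG rule: {symbol!r}')


def resolve(rule: ToyRule, left: str, right: str) -> str:
    """Return the output category for two flat child categories."""
    if rule is ToyRule.FORWARD_APPLICATION:      # X/Y Y -> X
        result, slash, argument = left.rpartition('/')
        if slash and argument == right:
            return result
    elif rule is ToyRule.BACKWARD_APPLICATION:   # Y X\Y -> X
        result, slash, argument = right.rpartition('\\')
        if slash and argument == left:
            return result
    elif rule is ToyRule.ADNINT:
        # Forced workaround: ignore the left child's real category and
        # treat it as right/right, then apply it forwards.
        return resolve(ToyRule.FORWARD_APPLICATION, f'{right}/{right}', right)
    raise ValueError(f'{rule.name} cannot combine {left!r} and {right!r}')


if __name__ == '__main__':
    rule = ToyRule.from_symbol('ADNint')
    print(resolve(rule, 'S\\NP', 'NP'))
```

Because the left child's category is discarded rather than checked, this mirrors the "brute force" nature of the fix: it produces a usable result for adnominal modifiers but would not catch a malformed derivation.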

CQCL / lambeq, issue #100