CQCL / lambeq

A high-level Python library for Quantum Natural Language Processing
https://cqcl.github.io/lambeq/
Apache License 2.0

Adjectival verbs in Japanese lambeq #99

Closed dimkart closed 1 year ago

dimkart commented 1 year ago

Originally posted by @masakiowari in https://github.com/CQCL/lambeq/issues/24#issuecomment-1579955455

Hello! We are now working on Japanese QNLP with Lambeq, following the installation steps in the description of PR #24. We found that the present versions of depccg_jp and Lambeq cannot handle sentences in which an adjectival verb (Keiyo-Do-Shi) modifies a noun. Below is a list of sentences for which Lambeq + depccg_jp cannot create any string diagram, e.g.:

感動的な映画を見る
曖昧な表現をする
静かな海を見る
健康な男性が歩く
親切な男性がいる
元気な男性が歩く
上品な表現をする
きれいな海を見る
健やかな男性が歩く
和やかな雰囲気を感じる
穏やかな笑顔を浮かべる
正直な男性がいる
有名な男性がいる
にぎやかな雰囲気を感じる
特別な表現をする
複雑な表現をする
まじめな男性がいる
下手な表現をする
便利な本を買う
朗らかな笑顔を浮かべる
幸せな笑顔を浮かべる
好きなスープを食べる
無理な計画を立てる
暇な男性がいる
必要な計画を立てる
邪魔なものをどかす
変な表現をする
自由な表現をする

We would like to know whether anyone knows how to solve this problem.

By the way, this problem occurs with both Lambeq ver. 0.2.6 and 0.3.1. We installed depccg_jp following the instructions mentioned above. Except for sentences that include adjectival verbs, depccg_jp + Lambeq works very well.

dimkart commented 1 year ago

Hi @masakiowari, thanks for this. The first thing to check is whether the parser creates a valid CCG tree. You can do this with the following code:

tree = parser.sentence2tree("your sentence") 
if tree is None:
    print("Failure")
else:
    print(tree.deriv())

If the sentence has a CCG derivation, then this is probably a lambeq problem. However, if the sentence fails to parse, then this is a problem of the DepCCG parser. Let us know the result.
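The check above can be extended to separate the two failure modes in one pass. The following is only a minimal sketch: `diagnose` is a hypothetical helper name, and it assumes the parser object exposes `sentence2tree` and that the returned tree exposes `to_diagram`, as lambeq's CCG trees do.

```python
def diagnose(parser, sentence):
    """Classify one sentence as 'no-parse', 'diagram-error' or 'ok'.

    `parser` is assumed to offer sentence2tree(sentence), and the tree it
    returns is assumed to offer to_diagram().
    """
    try:
        tree = parser.sentence2tree(sentence)
    except Exception:
        # The parser (or the conversion of its output to a tree) failed.
        return "no-parse"
    if tree is None:
        # No CCG derivation was found: a parser-side problem.
        return "no-parse"
    try:
        tree.to_diagram()
    except Exception:
        # A derivation exists but cannot be turned into a diagram.
        return "diagram-error"
    return "ok"
```

With a parser created as elsewhere in this thread (`DepCCGParser(lang='ja')`), calling `diagnose(parser, sentence)` for each sentence in the list would show which of them fail before a tree exists and which fail only at the drawing step.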

masakiowari commented 1 year ago

Hello! Thank you very much. For a "bad" sentence, we ran the following code:

import sys
sys.path.append("/lambeq")
sys.path.append("/depccg")
from lambeq import DepCCGParser
from discopy import grammar
tree = DepCCGParser.sentence2tree("上品な 表現 を する")
if tree is None:
    print("Failure")
else:
    print(tree.deriv())

It gives the following output:

Traceback (most recent call last):
  File "c:\Users\bi21008\Downloads\depccg-master\import sys.py", line 16, in <module>
    parser = DepCCGParser(verbose='suppress')
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\bi21008\Downloads\depccg-master\lambeq\text2diagram\depccg_parser.py", line 155, in __init__
    raise ValueError('DepCCGParser only supports
ValueError: DepCCGParser only supports "progress" level of verbosity. suppress was given.

For a "good" sentence such as "これはテストです", similar code gives the following output:
これ >> Id(n) @ は >> Id(n @ n.r) @ テスト >> Id(n @ n.r) @ です >> Id(n @ n.r @ n.r @ s) @ Cup (n.l, n) >> Id(n @ n.r @ s) @ Cup(n.l, n) >> Cup(n, n.r) @ Id(s)

As you say, this result may suggest that the problem is with the parser. However, when we use the depccg parser directly, as described on the following page: https://github.com/masashi-y/depccg, a bad sentence like "上品な 表現 を する" gets a proper output such as:

ID=2, Prob=-53.02713191278883 {< S[mod=nm,form=base,fin=t] {< S[mod=nm,form=base,fin=f] {< NP[case=nc,mod=nm,fin=f] {NP[case=nc,mod=nm,fin=f] 上品な/上品な/} {NP[case=nc,mod=nm,fin=f]\NP[case=nc,mod=nm,fin=f] 表現/表現/}} {< S[mod=nm,form=base,fin=f]\NP[case=nc,mod=nm,fin=f] {< NP[case=nc,mod=nm,fin=f] {< NP[case=nc,mod=nm,fin=f] {NP[case=nc,mod=nm,fin=f] を/を/} {NP[case=nc,mod=nm,fin=f]\NP[case=nc,mod=nm,fin=f] する/する/}} {S[mod=nm,form=base,fin=t]\S[mod=nm,form=base,fin=f] 。/。/**}}

Here, we should emphasize that a "bad sentence" means a sentence for which depccg + Lambeq does not give an output. These bad sentences are grammatically correct in Japanese.

masakiowari commented 1 year ago

By the way, we have modified "ja.py" of depccg so that depccg + Lambeq works well. The above error is the one we get even after this modification. The modification of ja.py is as follows.

For example,

def generalized_backward_composition1(x: Category, y: Category) -> Optional[CombinatorResult]:
    uni = Unification("b\c", "a\b")
    if uni(x, y):
        result = x if _is_modifier(y) else uni['a'] | uni['c']
        return CombinatorResult(
            cat=result,
            op_string="bx",
            op_symbol="<B1",
            head_is_left=False,
        )
    return None

is modified as

def generalized_backward_composition1(x: Category, y: Category) -> Optional[CombinatorResult]:
    uni = Unification("b\c", "a\b")
    if uni(x, y):
        result = x if _is_modifier(y) else uni['a'] | uni['c']
        return CombinatorResult(
            cat=result,
            op_string="bc",
            op_symbol="<B",
            head_is_left=False,
        )
    return None

So, here we applied the change: op_string bx -> bc, op_symbol <B1 -> <BC.

Similarly, we applied the following modifications to the other functions in ja.py:

generalized_backward_composition2: op_string bx -> gbc, op_symbol <B2 -> <Bⁿ
generalized_backward_composition3: op_string bx -> gbc, op_symbol <B3 -> <Bⁿ
generalized_backward_composition4: op_string bx -> gbc, op_symbol <B4 -> <Bⁿ
generalized_forward_composition1: op_string fx -> gfc, op_symbol >Bx1 -> >Bⁿ
generalized_forward_composition2: op_string fx -> gfc, op_symbol >Bx2 -> >Bⁿ
generalized_forward_composition3: op_string fx -> gfc, op_symbol >Bx3 -> >Bⁿ

That is all.

Before this modification, many more errors occurred.
The problem with adjectival verbs is the one that remains even after this modification.

ianyfan commented 1 year ago

@masakiowari Hi, I can't seem to replicate this issue. Could you run this code fragment on your system and show us the full output please?

from lambeq import DepCCGParser
parser = DepCCGParser(lang='ja')

sentences = [
    '感動的な映画を見る',
    '曖昧な表現をする',
    '静かな海を見る',
    '健康な男性が歩く',
    '親切な男性がいる',
    '元気な男性が歩く',
    '上品な表現をする',
    'きれいな海を見る',
    '健やかな男性が歩く',
    '和やかな雰囲気を感じる',
    '穏やかな笑顔を浮かべる',
    '正直な男性がいる',
    '有名な男性がいる',
    'にぎやかな雰囲気を感じる',
    '特別な表現をする',
    '複雑な表現をする',
    'まじめな男性がいる',
    '下手な表現をする',
    '便利な本を買う',
    '朗らかな笑顔を浮かべる',
    '幸せな笑顔を浮かべる',
    '好きなスープを食べる',
    '無理な計画を立てる',
    '暇な男性がいる',
    '必要な計画を立てる',
    '邪魔なものをどかす',
    '変な表現をする',
    '自由な表現をする'
]

for sentence in sentences:
    print(parser.sentence2tree(sentence))

Thank you.

masakiowari commented 1 year ago

@ianyfan Thank you very much for your great suggestion! We have now rebuilt our lambeq and depccg environment without using the modified ja.py file. We can now process sentences like '感動的な映画を見る', '曖昧な表現をする', '静かな海を見る' without errors.

[Screenshots: 2023-06-09 (1), 2023-06-09 (2), 2023-06-09 (3)]

Now, we can treat Adjectival verbs.

Unfortunately, however, it seems that there still exist sentences which cannot be handled, e.g. "ボブはおいしくないカレーが嫌いではない"

[Screenshots: 2023-06-09 (4), 2023-06-09 (5), 2023-06-09 (6)]

This sentence can be treated in the old environment with the modified ja.py file.

ianyfan commented 1 year ago

Hi, I've had a look and the issue seems to be due to depccg returning a parse that cannot be drawn under standard CCG rules.

From your initial list of sentences, there are 4 that lambeq cannot draw:

They all have the same issue. For example, for the first sentence, depccg returns a parse that contains this problematic sub-parse:

 親切   な
----- -----
  S    S\S       男性
-----------(BA) -----
     S            N
---------------------(UNK)
           N

depccg tells us which rule it uses at each step, e.g. BA for backwards application. For the bottom rule, depccg provides the rule "other" which clearly isn't a standard CCG rule. Therefore, we cannot draw this tree as a diagram.
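As an illustration of how such a step can be spotted, here is a minimal sketch that walks a toy derivation tree and reports every step whose rule label is not in a set of standard rules. `Node`, `STANDARD_RULES` and `nonstandard_steps` are hypothetical names for this sketch, not lambeq or depccg API, and the rule set is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Rule labels treated as standard in this sketch (an assumption, not lambeq's list).
STANDARD_RULES = {"LEX", "FA", "BA", "FC", "BC", "FX", "BX"}


@dataclass
class Node:
    """Toy derivation node: a category, the rule that produced it, and its children."""
    cat: str
    rule: str = "LEX"
    children: List["Node"] = field(default_factory=list)


def nonstandard_steps(node: Node) -> List[Tuple[str, str]]:
    """Return (rule, category) for every step whose rule is not standard."""
    found: List[Tuple[str, str]] = []
    for child in node.children:
        found.extend(nonstandard_steps(child))
    if node.rule not in STANDARD_RULES:
        found.append((node.rule, node.cat))
    return found


# The sub-parse shown above: 親切 (S) and な (S\S) combine by BA into S,
# which is then joined with 男性 (N) by an unknown rule to give N.
adjective = Node("S", "BA", [Node("S"), Node("S\\S")])
noun_phrase = Node("N", "UNK", [adjective, Node("N")])

print(nonstandard_steps(noun_phrase))
```

Only the bottom step is reported, which matches the explanation above: the BA step is fine, and the step that turns S and N into N is the one with no standard rule behind it.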

The example in your comment, "ボブはおいしくないカレーが嫌いではない", has a different issue: depccg tries to perform backwards cross composition (BX) on the types S\N + S\S -> S\N, which are not valid types for backwards cross composition. This results in an error when trying to draw the diagram.
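To make the type mismatch concrete, here is a minimal sketch of the textbook rule schemata on a toy category representation. `Atom`, `Func` and the two functions are hypothetical names for illustration; this is not how lambeq or depccg implement the rules. Under these schemata, backward crossed composition needs a forward-slash category on the left, so S\N + S\S does not qualify, whereas plain backward composition on the same inputs would give S\N.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Func:
    result: "Cat"
    slash: str  # "/" (argument to the right) or "\\" (argument to the left)
    arg: "Cat"


Cat = Union[Atom, Func]


def backward_composition(left: Cat, right: Cat) -> Optional[Cat]:
    # Schema:  Y\Z   X\Y   =>   X\Z
    if (isinstance(left, Func) and isinstance(right, Func)
            and left.slash == "\\" and right.slash == "\\"
            and right.arg == left.result):
        return Func(right.result, "\\", left.arg)
    return None


def backward_cross_composition(left: Cat, right: Cat) -> Optional[Cat]:
    # Schema:  Y/Z   X\Y   =>   X/Z
    if (isinstance(left, Func) and isinstance(right, Func)
            and left.slash == "/" and right.slash == "\\"
            and right.arg == left.result):
        return Func(right.result, "/", left.arg)
    return None


S, N = Atom("S"), Atom("N")
s_back_n = Func(S, "\\", N)  # S\N
s_back_s = Func(S, "\\", S)  # S\S

print(backward_cross_composition(s_back_n, s_back_s))  # no result: left slash is not "/"
print(backward_composition(s_back_n, s_back_s))        # the category S\N
```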

So I'm afraid I'm not sure if we can help you on the lambeq side; this seems to be an issue with how depccg parses these sentences.

I hope that helps. Let me know if you have any more questions.

masakiowari commented 1 year ago

@ianyfan , Thank you very much for your detailed explanation. Now, I perfectly understand the reason for this problem.
So, Lambeq can only understand standard CCG, and depccg-ja sometimes outputs something which does not obey standard CCG. A possible solution may be to modify depccg so that it only outputs standard CCG. I will try to solve the problem along these lines.

dimkart commented 1 year ago

We'll convert this to a Discussion since it might be useful for other users as well.