CQCL / lambeq

A high-level Python library for Quantum Natural Language Processing
https://docs.quantinuum.com/lambeq/
Apache License 2.0

Add Japanese support to DepCCGParser #24

Closed KentaroAOKI closed 2 years ago

KentaroAOKI commented 2 years ago

Updated DepCCGParser to support Japanese. The sample code is as follows.

1. Prepare depccg.

pip install cython numpy depccg
depccg_en download
depccg_ja download

2. Install Japanese fonts on Ubuntu.

apt install -y fonts-migmix
rm ~/.cache/matplotlib/fontlist-v330.json

3. Set the matplotlib Japanese font in the jupyter notebook python code.

import matplotlib
from matplotlib.font_manager import FontProperties

font_path = "/usr/share/fonts/truetype/migmix/migmix-1p-regular.ttf"
font_prop = FontProperties(fname=font_path)
matplotlib.rcParams["font.family"] = font_prop.get_name()

4. Use sentence2diagram in the jupyter notebook python code.

from lambeq import DepCCGParser
from discopy import grammar

parser = DepCCGParser(lang='ja')
diagram = parser.sentence2diagram('これはテストの文です。')
grammar.draw(diagram, figsize=(14,3), fontsize=12)

5. Use ansatz in the jupyter notebook python code.

from lambeq import AtomicType, IQPAnsatz

# Define atomic types
N = AtomicType.NOUN
S = AtomicType.SENTENCE

# Convert string diagram to quantum circuit
ansatz = IQPAnsatz({N: 1, S: 1}, n_layers=2)
discopy_circuit = ansatz(diagram)
discopy_circuit.draw(figsize=(15,10))

6. Use pytket in the jupyter notebook python code.

from pytket.circuit.display import render_circuit_jupyter

tket_circuit = discopy_circuit.to_tk()
render_circuit_jupyter(tket_circuit)

ianyfan commented 2 years ago

Hello, thank you for opening this PR. This is a feature we have wanted to add, so it will be very useful if we can get it merged. It looks like the right code paths are present; however, it would be great if we could work out whether any heavy workloads can be cached, for example by saving the supertagger as an instance variable instead of loading it each time. Very happy to discuss things if you want any help.

KentaroAOKI commented 2 years ago

Hi, I agree. I'll make the change on Saturday or Sunday.

KentaroAOKI commented 2 years ago

Hi, DepCCGParser has been fixed. Please review the code.

https://github.com/CQCL/lambeq/pull/24/commits/d3a5b4fac3999cd5c877cd31689c3796b7e61be3

ianyfan commented 2 years ago

@KentaroAOKI I have added some changes to the code, mainly formatting and documentation. Otherwise, the implementation seemed great. I just made a small change in how tokenising is handled, could you have a look and let me know if it works for you? Thanks.

dimkart commented 2 years ago

@ianyfan Some of the previous edits seem to have been overwritten by your last commit (e.g. lower-casing the language strings), please fix.

ianyfan commented 2 years ago

Thanks!

masakiowari commented 1 year ago

Hello! We are now working on Japanese QNLP with Lambeq, following the installation steps on this page. We found that the present versions of depccg_jp and Lambeq cannot handle sentences in which an adjectival verb (Keiyo-Do-Shi) modifies a noun. Below is a list of sentences for which Lambeq + depccg_jp cannot create any string diagram, e.g.:

感動的な映画を見る
曖昧な表現をする
静かな海を見る
健康な男性が歩く
親切な男性がいる
元気な男性が歩く
上品な表現をする
きれいな海を見る
健やかな男性が歩く
和やかな雰囲気を感じる
穏やかな笑顔を浮かべる
正直な男性がいる
有名な男性がいる
にぎやかな雰囲気を感じる
特別な表現をする
複雑な表現をする
まじめな男性がいる
下手な表現をする
便利な本を買う
朗らかな笑顔を浮かべる
幸せな笑顔を浮かべる
好きなスープを食べる
無理な計画を立てる
暇な男性がいる
必要な計画を立てる
邪魔なものをどかす
変な表現をする
自由な表現をする

We would like to hear from anyone who knows how to solve this problem.

By the way, this problem occurs with Lambeq versions 0.2.6 and 0.3.1. We installed depccg_jp following the instructions above. Except for sentences that include adjectival verbs, depccg_jp + Lambeq works very well.
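
For anyone wanting to reproduce a report like this, a small helper along the following lines could collect the sentences that fail. It is only a sketch: `find_unparsable` is a hypothetical name, and it assumes the parse function either raises an exception or returns `None` when no diagram can be built. The lambeq usage is left in comments because it needs depccg and its Japanese model installed.

```python
from typing import Any, Callable, Iterable, List, Tuple


def find_unparsable(
    sentences: Iterable[str],
    parse: Callable[[str], Any],
) -> List[Tuple[str, str]]:
    """Return (sentence, reason) pairs for sentences that could not be parsed.

    `parse` is any callable that takes a sentence and returns a diagram.
    A raised exception or a None result is treated as a failure (assumption).
    """
    failures: List[Tuple[str, str]] = []
    for sentence in sentences:
        try:
            result = parse(sentence)
        except Exception as exc:
            # Record the exception type so failures can be grouped later.
            failures.append((sentence, type(exc).__name__))
            continue
        if result is None:
            failures.append((sentence, "no diagram returned"))
    return failures


# Possible usage with the parser from this PR (untested here):
#
#     from lambeq import DepCCGParser
#     parser = DepCCGParser(lang='ja')
#     sentences = ['静かな海を見る', 'これはテストの文です。']
#     for sentence, reason in find_unparsable(sentences, parser.sentence2diagram):
#         print(sentence, reason)
```

Passing the parse function in as an argument keeps the helper independent of any particular parser, so the same loop can be used to compare library versions.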

dimkart commented 1 year ago

@masakiowari Hi, I have created a top-level issue (#99) for this, in order to give it more visibility. For any follow-up, please use Issue #99.