CQCL / lambeq

A high-level Python library for Quantum Natural Language Processing
https://cqcl.github.io/lambeq/
Apache License 2.0

IQPAnsatz: shape error as changing number of qubits for atomic types #123

Closed shinyoung3 closed 8 months ago

shinyoung3 commented 9 months ago

Dear Lambeq team,

Hello, I'm new to QNLP and am currently trying to reproduce the MC dataset experiment from this paper. I'm having difficulty understanding the data structure and the way QuantumTrainer handles the dataset. I want to reproduce Figure 14, which varies the number of layers and the number of qubits assigned to noun wires using IQPAnsatz, shown below.

image

A shape error arose during training when I changed these parameters from the default setting (ansatz = IQPAnsatz({AtomicType.NOUN: 1, AtomicType.SENTENCE: 1}, n_layers=1, n_single_qubit_params=3)).

For example, if I set the number of qubits for nouns to 2, ansatz = IQPAnsatz({AtomicType.NOUN: 2, AtomicType.SENTENCE: 1}, n_layers=1, n_single_qubit_params=3), to reproduce (1,2,1), the error looks like:

ValueError: could not broadcast input array from shape (2,2) into shape (2,)

I'm not sure how I should modify the loss function and accuracy to correctly reflect the change of the parameters. The current setup for the loss function and accuracy are:

EPSILON = 1e-9
loss = lambda y_hat, y: -np.sum(y * np.log(y_hat + EPSILON)) / len(y)
accuracy = lambda y_hat, y: np.sum(np.round(y_hat) == y) / len(y) / 2

Thank you!

dimkart commented 9 months ago

Hi, and sorry for the late reply. The problem might be that not all sentences in the dataset are parsed with type s as the root category; some of them might be parsed as n. Unfortunately, statistical models sometimes make mistakes. In lambeq, you can use the UnifyCodomainRewriter to ensure that all sentences are parsed with s at the root. Here is an example:

from lambeq import AtomicType, UnifyCodomainRewriter

rewriter = UnifyCodomainRewriter(AtomicType.SENTENCE)  # The codomain should always be S

new_diagrams = [rewriter(d) for d in original_diagrams]
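
To illustrate why a mixed set of root types only fails once the qubit counts differ, here is a minimal sketch in plain Python. The helper `output_shape` is hypothetical and not part of lambeq, and atomic types are represented as plain strings for illustration. It assumes each output qubit contributes one axis of size 2 to the result, which is consistent with the shapes in the error message.

```python
def output_shape(codomain, qubits_per_type):
    """Shape of the output tensor for a diagram with the given codomain.

    `codomain` is a list of atomic type names (hypothetical string labels),
    and `qubits_per_type` maps each name to its number of qubits.
    Assumption: every output qubit adds one axis of size 2.
    """
    n_qubits = sum(qubits_per_type[atom] for atom in codomain)
    return (2,) * n_qubits


default_map = {"n": 1, "s": 1}  # mirrors the default ansatz setting
varied_map = {"n": 2, "s": 1}   # mirrors NOUN: 2, SENTENCE: 1

for label, qubit_map in [("default", default_map), ("varied", varied_map)]:
    s_shape = output_shape(["s"], qubit_map)
    n_shape = output_shape(["n"], qubit_map)
    # With the default map both shapes agree, so a diagram rooted at n
    # goes unnoticed; with the varied map the shapes differ.
    print(label, "s-rooted:", s_shape, "n-rooted:", n_shape)
```

Under these assumptions, a diagram rooted at n and a diagram rooted at s produce the same shape with the default setting, so the mismatch only surfaces after the noun wire gets a different number of qubits.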

Hope this helps.

shinyoung3 commented 9 months ago

Hi, thank you for the reply! Redefining the diagrams with UnifyCodomainRewriter worked well with the sentence-type dataset (e.g. she likes ice cream). But is there any way to perform the same experiment (varying the ansatz parameters) for a noun-type dataset (e.g. telescope that observes objects)? I have a noun-type dataset with the following ansatz setting:

ansatz = IQPAnsatz({AtomicType.NOUN: 3, AtomicType.SENTENCE: 0}, n_layers=1, n_single_qubit_params=3)

A ValueError appears after setting rewriter = UnifyCodomainRewriter(AtomicType.NOUN), as shown below.

ValueError: Provided arrays must be of equal shape. Got arrays of shape (30, 2, 2, 2) and (30, 2, 2).

dimkart commented 9 months ago

The problem is that your targets are not prepared correctly. Since you use 3 qubits for the output wire (and 30 is your batch size), your targets must have shape (30, 2, 2, 2); that is, each data instance has $2^3 = 8$ possible outcomes.
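
As a rough sketch of what such targets could look like, the NumPy snippet below one-hot encodes integer class labels into tensors of shape (2,) * n_qubits. The helper names `one_hot_targets` and `accuracy` are hypothetical, and mapping class k to the k-th basis state is only an assumption made for illustration; the right encoding depends on how your labels should correspond to measurement outcomes. The accuracy function divides by `y.size` instead of `len(y) / 2`, which reduces to the original formula when there is one output qubit.

```python
import numpy as np


def one_hot_targets(labels, n_qubits):
    """Encode integer labels as one-hot tensors of shape (2,) * n_qubits.

    Assumption: class k is mapped to the k-th of the 2**n_qubits outcomes.
    """
    labels = np.asarray(labels)
    n_outcomes = 2 ** n_qubits
    if labels.min() < 0 or labels.max() >= n_outcomes:
        raise ValueError(f"labels must lie in [0, {n_outcomes - 1}]")
    flat = np.zeros((len(labels), n_outcomes))
    flat[np.arange(len(labels)), labels] = 1.0
    # Unfold the outcome axis into one axis of size 2 per qubit.
    return flat.reshape((len(labels),) + (2,) * n_qubits)


def accuracy(y_hat, y):
    # Fraction of tensor entries that match after rounding the predictions.
    return np.sum(np.round(y_hat) == y) / y.size


# Hypothetical batch of 30 binary labels with a 3-qubit output wire.
targets = one_hot_targets(np.arange(30) % 2, n_qubits=3)
print(targets.shape)
```

The cross-entropy loss from the original post sums over all entries and divides by `len(y)`, so under these assumptions it should work unchanged once the targets and predictions have the same shape.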

shinyoung3 commented 8 months ago

Hi, sorry for the late reply. I couldn't fix the problem yet but I'll try to refine my dataset and apply a different approach. I appreciate your help!