HKUST-KnowComp / GEIA

Code for Findings-ACL 2023 paper: Sentence Embedding Leaks More Information than You Expect: Generative Embedding Inversion Attack to Recover the Whole Sentence
MIT License

A question about the code, thank you! #2

Closed: Kissacat closed this issue 8 months ago

Kissacat commented 10 months ago

Hello! Thank you for your work! I really appreciate your contribution.

While reading attacker.py, I got confused about why you concatenate the victim embeddings and the attacking embeddings in the training phase, but only compute the attacking embedding of the attacker model in the testing phase. Will there be a dimension mismatch? I would appreciate a reply whenever you have time.

In the training phase (lines 236-240 in attacker.py):

    batch_X = batch_X.to(device)
    batch_X_unsqueeze = torch.unsqueeze(batch_X, 1)
    inputs_embeds = torch.cat((batch_X_unsqueeze, input_emb), dim=1)  # [batch, max_length+1, emb_dim (1024)]

In the test phase (line 165 in attacker.py):

    sent_list, gt_list = eval_on_batch(batch_X=embeddings,
                                       batch_D=batch_text,
                                       model=config['model'],
                                       tokenizer=config['tokenizer'],
                                       device=device,
                                       config=config)
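
For reference, the shapes in the training snippet can be checked with a small standalone sketch. The sizes below (batch of 2, max_length of 5) and the random tensors are placeholders chosen for illustration, not values taken from attacker.py; only the emb_dim of 1024 comes from the comment in the quoted code.

```python
import torch

batch, max_length, emb_dim = 2, 5, 1024  # placeholder sizes for illustration

# Stand-in for the victim model's sentence embeddings: one vector per sentence.
batch_X = torch.randn(batch, emb_dim)
# Stand-in for the attacker model's token embeddings of the target text.
input_emb = torch.randn(batch, max_length, emb_dim)

# Add a sequence axis so the sentence embedding behaves like one extra "token".
batch_X_unsqueeze = torch.unsqueeze(batch_X, 1)  # [batch, 1, emb_dim]
# Prepend it to the token embeddings along the sequence axis.
inputs_embeds = torch.cat((batch_X_unsqueeze, input_emb), dim=1)

print(inputs_embeds.shape)  # torch.Size([2, 6, 1024])
```

Because the concatenation is along dim=1, torch.cat requires the batch size and emb_dim of the two tensors to agree, while their sequence lengths may differ.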

Thank you!!

teapotliid commented 10 months ago

Hi Kissacat, sorry for the late reply. To handle mismatched dimensions, we apply a 1-layer neural network (class: linear_projection) so that the embedding dimensions match.
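
A minimal sketch of what such a projection could look like is given below. The class name ProjectionSketch, the single nn.Linear layer, and the victim dimension of 768 are assumptions made for illustration; this is not the repository's linear_projection class, whose exact definition is not shown in this thread.

```python
import torch
import torch.nn as nn


class ProjectionSketch(nn.Module):
    """Hypothetical 1-layer projection; not the repository's linear_projection."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        # A single linear layer maps victim embeddings to the attacker's width.
        self.fc = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(x)


victim_dim, attacker_dim = 768, 1024  # assumed sizes, for illustration only
projection = ProjectionSketch(victim_dim, attacker_dim)

victim_emb = torch.randn(4, victim_dim)  # stand-in sentence embeddings
projected = projection(victim_emb)       # [4, attacker_dim]

print(projected.shape)  # torch.Size([4, 1024])
```

After a projection of this kind, the victim embedding has the same last dimension as the attacker model's token embeddings, so it can be unsqueezed and concatenated along the sequence axis as in the training snippet.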

Kissacat commented 10 months ago

Thank you!