Closed DenglinGo closed 1 year ago
Yes, z_y actually contains information about the source sequence. Since all negatives have to attend to the same source-sequence output H_X (the i-th candidate y_i has z_{y_i} = g(y_i, H_X)), we think this leakage may not affect the optimization of the contrastive loss too much. We did not try removing it in our paper, because extracting the feature from the decoder without cross attention would require modifying the base model, which makes it inconvenient to swap a mainstream MLE model for CoNT.
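To make the setup above concrete, here is a minimal numpy sketch of the idea being discussed: each candidate's feature z_{y_i} = g(y_i, H_X) is obtained by letting the decoder states cross-attend to the same encoder output H_X and then pooling, so every positive and negative sees the identical source representation. The function names (`mean_pool`, `cross_attention`, `cosine`), single-head attention, and random toy tensors are illustrative assumptions, not the actual CoNT implementation.

```python
import numpy as np

def mean_pool(h):
    # h: (seq_len, d) -> (d,) pooled feature representation
    return h.mean(axis=0)

def cross_attention(q, kv):
    # simplified single-head cross attention: decoder states q attend to kv (= H_X)
    d = q.shape[-1]
    scores = q @ kv.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ kv

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
d = 16
H_X = rng.normal(size=(5, d))                              # encoder output of the source
candidates = [rng.normal(size=(7, d)) for _ in range(3)]   # decoder states of 3 candidates

z_x = mean_pool(H_X)
# every candidate feature z_{y_i} = g(y_i, H_X): all cross-attend to the SAME H_X,
# so the "leaked" source information is shared across positives and negatives alike
z_ys = [mean_pool(cross_attention(y, H_X)) for y in candidates]
sims = [cosine(z_x, z_y) for z_y in z_ys]
```

Because the leaked component of each z_{y_i} comes from the one shared H_X, it shifts all candidate similarities in the same way rather than favoring any particular negative, which is the intuition behind the answer above.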
As described in the paper (arXiv:2205.14690): "The feature representations come from pooling the output of the encoder (source sequence) or decoder (target sequence)." However, Transformer decoders contain cross-attention modules; wouldn't this leak source information into the target-sequence feature representations?