google-research / pix2seq

Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)
Apache License 2.0

Inconsistency between paper and code #10

Closed zyc573823770 closed 2 years ago

zyc573823770 commented 2 years ago

It seems that the labels in the output target are either random or fake, with no ground truth among them. This differs from the paper, which says at the end of Section 2.3: "After noise objects are synthesized and discretized, we then append them in the end of the original input sequence."

https://github.com/google-research/pix2seq/blob/6d45f77fcbb1905aca3e42678a2a079907ad17d0/tasks/object_detection.py#L399

zyc573823770 commented 2 years ago

(image attachment)

chentingpc commented 2 years ago

Apologies for the late reply. `response_seq_class_m` is used for the input sequence (what the transformer decoder sees); the label/target sequence (what the decoder is asked to predict) comes from `response_seq`, which contains either real tokens or the fake class token. Hope that clarifies it.
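To make the distinction concrete, here is a minimal sketch of the idea in this scheme: the decoder's *input* sequence carries a randomized class token for synthesized noise objects, while the *target* sequence labels those same objects with the fake/noise class token. Names like `build_sequences` and the token values (`FAKE_CLASS_TOKEN`, `NUM_CLASSES`) are hypothetical for illustration and do not match the actual pix2seq vocabulary or code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative constants; the real pix2seq vocabulary differs.
NUM_CLASSES = 5        # number of real object classes
FAKE_CLASS_TOKEN = 99  # "noise" class the decoder learns to predict

def build_sequences(real_objects, noise_objects):
    """Build (input_seq, target_seq) for the augmented-sequence scheme.

    Each object is a list [y1, x1, y2, x2, class_token].
    Input sequence: noise objects get a *randomized* class token, so the
    decoder cannot trivially tell real from noise at input time.
    Target sequence: noise objects are labeled with FAKE_CLASS_TOKEN, so
    the model learns to flag them. (In the actual code, coordinate
    targets of noise objects are also masked out of the loss.)
    """
    input_seq, target_seq = [], []
    for obj in real_objects:
        input_seq.extend(obj)   # real objects: input == target ==
        target_seq.extend(obj)  # ground-truth tokens
    for obj in noise_objects:
        rand_cls = int(rng.integers(NUM_CLASSES))        # randomized class
        input_seq.extend(obj[:4] + [rand_cls])           # seen by decoder
        target_seq.extend(obj[:4] + [FAKE_CLASS_TOKEN])  # predicted label
    return input_seq, target_seq

real = [[12, 14, 40, 44, 2]]   # one ground-truth box with class 2
noise = [[5, 5, 20, 20, 3]]    # one synthesized noise box
inp, tgt = build_sequences(real, noise)
print(inp)
print(tgt)
```

So the ground truth is in the target sequence for the real objects, while only the synthesized noise objects get the fake class token there, which is what the maintainer's reply describes.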

huimlight commented 1 year ago

> Apologies for the late reply. `response_seq_class_m` is used for the input sequence (what the transformer decoder sees); the label/target sequence (what the decoder is asked to predict) comes from `response_seq`, which contains either real tokens or the fake class token. Hope that clarifies it.

Why is the class label in the input sequence randomly changed? I don't really understand.