google-research / pix2seq

Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)
Apache License 2.0

Question about inference #14

Closed ejlee95 closed 2 years ago

ejlee95 commented 2 years ago

I have a question about inference code.

In tasks/task_utils.py, in the function 'decode_object_seq_to_bbox', nothing seems to act as an 'end-of-sequence' check.


I think PADDING_TOKEN (which is denoted as 0 in vocab.py) is the token that indicates end-of-seq, but at inference time the code does not use that token.

Am I right?

chentingpc commented 2 years ago

Yes, the padding token is also the ending token. With sequence augmentation, we don't use the ending token at inference; instead we predict up to the maximum length and use the likelihood to score each instance/prediction.
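For readers following along, here is a minimal sketch of what "predict to maximum length and score by likelihood" could look like when decoding. It is not the repository's `decode_object_seq_to_bbox`. The function name `decode_fixed_length`, the 5-token layout (ymin, xmin, ymax, xmax, class), the `coord_bins` parameter, and the use of the class token's probability as the score are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed layout of one object in the sequence: ymin, xmin, ymax, xmax, class.
TOKENS_PER_OBJECT = 5

Box = Tuple[float, float, float, float]


def decode_fixed_length(
    tokens: List[int],
    class_probs: List[float],
    coord_bins: int = 1000,
) -> List[Tuple[Box, int, float]]:
    """Split a fixed-length token sequence into (box, class_token, score).

    No end-of-sequence token is consulted: every group of five tokens is
    decoded, and the score attached to each group is what later separates
    confident predictions from low-likelihood ones.

    tokens: predicted token ids; length must be a multiple of 5.
    class_probs: one value per object, the likelihood of its class token.
    coord_bins: hypothetical number of quantization bins for coordinates.
    """
    if len(tokens) % TOKENS_PER_OBJECT != 0:
        raise ValueError("sequence length must be a multiple of 5")
    num_objects = len(tokens) // TOKENS_PER_OBJECT
    if len(class_probs) != num_objects:
        raise ValueError("need exactly one class probability per object")

    results = []
    for i in range(num_objects):
        start = i * TOKENS_PER_OBJECT
        ymin, xmin, ymax, xmax, cls = tokens[start:start + TOKENS_PER_OBJECT]
        # Map quantized coordinate tokens back to the [0, 1] range.
        box = tuple(c / (coord_bins - 1) for c in (ymin, xmin, ymax, xmax))
        results.append((box, cls, class_probs[i]))
    return results
```

With this shape, every image yields the same number of decoded objects, and the per-object score is what distinguishes real detections from filler predictions.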

thomas0809 commented 1 year ago

Hi Ting,

Thanks for the great research work and the code. In the paper, you mentioned the prediction of a maximum length at inference. I didn't find it in the code. Could you please point me to where it is implemented?

chentingpc commented 1 year ago

Please see https://github.com/google-research/pix2seq/blob/83089ca98e077dd0a6864c2a68583b474df7a2d4/models/ar_model.py#L171


thomas0809 commented 1 year ago

Thanks for the prompt reply! I misunderstood your paper: I thought you were predicting an actual length of the generated sequence, but actually it is "predict to a maximum length".

Did you do any filtering based on the likelihood scores for the evaluation? Otherwise, does the model predict the same number of objects for every image?

chentingpc commented 1 year ago

We rank the predictions using the likelihood score during evaluation (AP is computed from a ranked list).
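As an illustration of the ranking step, here is a small sketch, assuming each prediction is a `(label, score)` pair. The helper `rank_by_score` and its optional `min_score` cutoff are hypothetical. The thread only confirms ranking for AP; a threshold like this would be a separate choice, for example when visualizing results.

```python
from typing import List, Optional, Tuple

Prediction = Tuple[str, float]  # (label, likelihood score); assumed shape


def rank_by_score(
    predictions: List[Prediction],
    min_score: Optional[float] = None,
) -> List[Prediction]:
    """Return predictions sorted by score, highest first.

    min_score is an optional, hypothetical cutoff; when it is None every
    prediction is kept and only the ordering changes, which is all a
    ranked-list metric such as AP needs.
    """
    kept = [
        p for p in predictions
        if min_score is None or p[1] >= min_score
    ]
    return sorted(kept, key=lambda p: p[1], reverse=True)
```

Because AP is computed over the ranked list, low-scoring predictions at the tail have little effect on the metric even though they are never explicitly removed.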
