google-research / pix2seq

Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)
Apache License 2.0

Input multiple sequences per image #17

Open qihao067 opened 1 year ago

qihao067 commented 1 year ago

Hello, fantastic work on Pix2seq v1 and v2.

I have a question about handling multiple sequences for one image. From the following code, it seems we could input multiple sequences by passing a tensor of size (bsz, instances, seqlen), whereas the current version uses a seq of size (bsz, seqlen).

https://github.com/google-research/pix2seq/blob/6d45f77fcbb1905aca3e42678a2a079907ad17d0/models/ar_model.py#L84-L85

I tried this but it failed:

```
ValueError: Exception encountered when calling layer "ar_decoder" (type AutoregressiveDecoder).

in user code:

    File "/pix2seq/architectures/transformers.py", line 684, in call  *
        _, seqlen = get_shape(tokens)

    ValueError: too many values to unpack (expected 2)

Call arguments received by layer "ar_decoder" (type AutoregressiveDecoder):
  • tokens=tf.Tensor(shape=(32, 3, 500), dtype=int64)
  • encoded=tf.Tensor(shape=(32, 1600, 512), dtype=float32)
  • training=True

Call arguments received by layer "model" (type Model):
  • images=tf.Tensor(shape=(32, 640, 640, 3), dtype=float32)
  • seq=tf.Tensor(shape=(32, 500), dtype=int64)
  • training=True
```
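As far as I can tell, the failure comes from the decoder unpacking exactly two dimensions from the token tensor, so a rank-3 (bsz, instances, seqlen) input breaks it. A minimal sketch of what I think happens, using `.shape.as_list()` as a stand-in for `get_shape` (an assumption on my part):

```python
import tensorflow as tf

tokens_2d = tf.zeros([32, 500], dtype=tf.int64)     # (bsz, seqlen): what the decoder expects
tokens_3d = tf.zeros([32, 3, 500], dtype=tf.int64)  # (bsz, instances, seqlen): what I passed

_, seqlen = tokens_2d.shape.as_list()               # fine: exactly two values to unpack
# _, seqlen = tokens_3d.shape.as_list()             # ValueError: too many values to unpack (expected 2)
```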

Do you have any idea what is going wrong? Have you tried using multiple sequences, and if so, how should it be done?

Thank you!!!

chentingpc commented 1 year ago

One workaround is to reshape the tensor from (bsz, instances, seqlen) into (bsz * instances, seqlen) for the model, and then reshape the output back afterwards. Hope this helps.
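A minimal sketch of that workaround, assuming a model that takes (images, seq) as in the traceback above; `model` and its call signature are hypothetical placeholders, not the exact pix2seq API:

```python
import tensorflow as tf

bsz, instances, seqlen = 32, 3, 500
images = tf.zeros([bsz, 640, 640, 3], dtype=tf.float32)
seqs = tf.zeros([bsz, instances, seqlen], dtype=tf.int64)    # (bsz, instances, seqlen)

# Fold the instance axis into the batch axis so the decoder sees rank-2 tokens.
flat_seqs = tf.reshape(seqs, [bsz * instances, seqlen])      # (bsz * instances, seqlen)

# Repeat each image `instances` times so image i lines up with its sequences.
flat_images = tf.repeat(images, repeats=instances, axis=0)   # (bsz * instances, 640, 640, 3)

# logits = model(flat_images, flat_seqs, training=True)      # hypothetical call
# logits = tf.reshape(logits, [bsz, instances, seqlen, -1])  # unfold the batch afterwards
```

Note that repeating the images this way encodes each image `instances` times; if that is too costly, the same fold/unfold trick could presumably be applied only to the decoder inputs, with the encoded image features tiled instead.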
