FoundationVision / OmniTokenizer

[NeurIPS 2024] OmniTokenizer: one model and one weight for joint image-video tokenization.
https://www.wangjunke.info/OmniTokenizer/
MIT License

Wrong reshape order in PEG still exists #12

Closed dreamofuture closed 4 months ago

dreamofuture commented 4 months ago

Thanks for your attention to this problem, but it does not seem to be fully solved by your correction. In PEG.forward, the final return x.reshape(orig_shape) is applied to an x of shape [B, THW, C], while orig_shape is [B*H*W, T, C] or [B*T, H*W, C]; the element counts match, so the reshape runs without error but scrambles the token order. I fixed this locally and the provided models work almost fine, but the reconstructed video is temporally incoherent. Do these models need to be retrained?
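A minimal sketch of the kind of fix involved (the module layout, argument names, and the (B, T, H, W) bookkeeping are assumptions for illustration, not the repository's exact code): invert the exact permutation instead of reshaping [B, THW, C] straight into orig_shape.

```python
import torch
import torch.nn as nn

class PEG(nn.Module):
    """Sketch of a PEG layer (positional encoding via a depthwise 3D conv)
    that handles both token layouts explicitly instead of relying on
    x.reshape(orig_shape)."""

    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Conv3d(dim, dim, 3, padding=1, groups=dim)

    def forward(self, x, shape):
        # x is either [B*T, H*W, C] (spatial blocks) or [B*H*W, T, C]
        # (temporal blocks); `shape` carries (B, T, H, W).
        B, T, H, W = shape
        C = x.shape[-1]
        is_spatial = x.shape[0] == B * T

        # Fold back to a dense [B, C, T, H, W] video tensor.
        if is_spatial:
            x = x.reshape(B, T, H, W, C)
        else:
            x = x.reshape(B, H, W, T, C).permute(0, 3, 1, 2, 4)
        x = x.permute(0, 4, 1, 2, 3)  # -> [B, C, T, H, W]

        x = self.proj(x) + x  # conv positional encoding, residual

        # Invert the exact permutation: reshaping [B, C, T, H, W]
        # (i.e. [B, THW, C] after flattening) directly into orig_shape
        # has the right element count, so it runs silently, but it
        # scrambles the token order in the temporal layout.
        x = x.permute(0, 2, 3, 4, 1)  # -> [B, T, H, W, C]
        if is_spatial:
            return x.reshape(B * T, H * W, C)
        return x.permute(0, 2, 3, 1, 4).reshape(B * H * W, T, C)
```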

And two other problems (see the sketch after the list):

  1. q_stride in do_pool and Attention seems intended for spatial downsampling, but I can't understand how the code works: pooling a [B, HW, ...] tensor via .view(B, q_stride, -1, ...).max(dim=1).values takes the max over tokens that are HW/q_stride positions apart rather than over adjacent tokens. Since q_stride is always 1 in your code this never causes an error, but it is confusing.
  2. When scaled_dot_product_attention is used for temporal attention, no temporal positional encoding seems to be added to q/k.
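A small sketch illustrating both points (the tensor shapes and the temporal_attention/temporal_pos names are assumptions for illustration, not the repository's code):

```python
import torch
import torch.nn.functional as F

B, T, HW, C, q_stride = 2, 8, 16, 4, 2
x = torch.randn(B, HW, C)

# Point 1: view(B, q_stride, -1, C) splits the HW tokens into q_stride
# contiguous chunks and maxes ACROSS chunks, i.e. it pools token i with
# token i + HW // q_stride -- distant positions, not spatial neighbours.
pooled_across_chunks = x.view(B, q_stride, -1, C).max(dim=1).values

# A conventional strided pool would max over ADJACENT tokens instead:
pooled_adjacent = x.view(B, -1, q_stride, C).max(dim=2).values  # [B, HW//2, C]

# Point 2: scaled_dot_product_attention is position-agnostic, so a
# temporal positional embedding must be added to q/k explicitly.
def temporal_attention(q, k, v, temporal_pos):
    # q, k, v: [B*H*W, heads, T, head_dim]; temporal_pos: [T, head_dim]
    q = q + temporal_pos  # broadcasts over the batch and head dims
    k = k + temporal_pos
    return F.scaled_dot_product_attention(q, k, v)

heads, head_dim = 4, C
q = k = v = torch.randn(B * HW, heads, T, head_dim)
pos = torch.randn(T, head_dim)
out = temporal_attention(q, k, v, pos)  # [B*H*W, heads, T, head_dim]
```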
wdrink commented 4 months ago

Really appreciate your findings. We did mis-process the tensor shape in PEG; the code has been updated, and the checkpoints will be updated afterwards. As you mention, q_stride is not used in our code, so I have removed it to avoid misunderstanding.

wdrink commented 4 months ago

Sorry for the confusion; we will temporarily roll back the training code to obtain better reconstruction results.