Open PKU-DataLab opened 3 weeks ago
In the file Oryx/oryx/model/oryx_arch.py
for idx in range(len(modalities)):
    img_feat_highres, img_size_highres = self.get_model().vision_resampler(highres_img_features[idx], modalities[idx], highres_img_sizes[idx])
    img_feat_lowres, img_size_lowres = self.get_model().vision_resampler(lowres_img_features[idx], modalities[idx], lowres_img_sizes[idx])
    img_feat = self.get_model().mm_projector(img_feat_lowres, img_size_lowres, img_feat_highres, img_size_highres, modalities[idx])
    image_features.append(img_feat.flatten(0, 1))
Encountering the same issue.
Hi, we have checked the code and found that the shape of the video tokens is normal. Could you provide more information about this issue?