VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Potential bug in mm_utils.py process_image function #54
Open
hubenjm opened 4 months ago
When data_args.image_aspect_ratio = 'resize', it seems that mm_utils.process_image returns the image as a PIL.Image.Image, which has no shape attribute. See https://github.com/Efficient-Large-Model/VILA/blob/main/llava/mm_utils.py#L168

When doing stage 1 alignment training, we use the datasets.LazySupervisedDataset class, whose get_item function tries to access image.shape here: https://github.com/Efficient-Large-Model/VILA/blob/main/llava/data/dataset.py#L834

This crashes the training. So should we simply add the line
image = processor.preprocess(image, return_tensors="pt")["pixel_values"][0]
below line 168 of mm_utils.py: https://github.com/Efficient-Large-Model/VILA/blob/main/llava/mm_utils.py#L168 ?
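
To make the failure mode and the proposed fix concrete, here is a minimal sketch of the 'resize' branch with and without the extra line. It is not VILA's code: StubImage, StubTensor, StubProcessor and process_image_resize are hypothetical stand-ins written so the example runs without Pillow, PyTorch or transformers, and the idea that the branch resizes to the processor's crop_size is an assumption. Only the processor.preprocess(...)["pixel_values"][0] call pattern is taken from the line quoted above.

```python
# Hypothetical stand-ins; none of these classes are VILA or Hugging Face code.

class StubImage:
    """Mimics a PIL image: exposes .size and .resize(), but no .shape."""

    def __init__(self, width, height):
        self.size = (width, height)

    def resize(self, size):
        return StubImage(*size)


class StubTensor:
    """Mimics a tensor only to the extent of exposing .shape."""

    def __init__(self, shape):
        self.shape = shape


class StubProcessor:
    """Mimics the processor call pattern quoted in the issue."""

    crop_size = {"height": 8, "width": 8}  # assumed attribute layout

    def preprocess(self, image, return_tensors="pt"):
        width, height = image.size
        # A batch holding one (channels, height, width) tensor-like object.
        return {"pixel_values": [StubTensor((3, height, width))]}


def process_image_resize(image, processor, apply_fix):
    """Sketch of the 'resize' branch, with and without the proposed line."""
    target = (processor.crop_size["width"], processor.crop_size["height"])
    image = image.resize(target)  # still an image object, so no .shape
    if apply_fix:
        # The line proposed in this issue: convert the image to a tensor.
        image = processor.preprocess(image, return_tensors="pt")["pixel_values"][0]
    return image


before = process_image_resize(StubImage(20, 10), StubProcessor(), apply_fix=False)
after = process_image_resize(StubImage(20, 10), StubProcessor(), apply_fix=True)

print(hasattr(before, "shape"))  # the dataset code would fail on this object
print(after.shape)               # the dataset code can read this
```

Without the extra line, the caller gets back an image object and any access to .shape raises AttributeError. With it, the caller gets a tensor-like object, which is what the dataset code appears to expect.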