FoundationVision / Groma

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
https://groma-mllm.github.io/
Apache License 2.0

Batch size setting in the evaluation process #14

Closed rongfu-dsb closed 3 months ago

rongfu-dsb commented 4 months ago

Hello, thank you for your excellent work! May I ask whether the batch_size during evaluation can only be set to 1? When evaluating the REC task with batch_size greater than 1, the predicted bounding boxes were noticeably worse. If batch sizes greater than 1 are supported, is this the code I should modify?

[screenshot of the evaluation code]
machuofan commented 4 months ago

Yes, the evaluation code requires batch_size=1 by default. You need to modify the collate_fn to support batch_size > 1.

[screenshot attachment, 2024-06-07]
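As a rough illustration (not the actual Groma collate_fn), such a collate function would need to pad the token sequences to a common length and stack the image tensors. The field names 'input_ids' and 'image', and the pad_token_id default, are assumptions here:

import torch

def collate_fn(samples, pad_token_id=0):
    # Hypothetical sketch: left-pad prompts so generated tokens start at a shared index,
    # assuming each sample is a dict with 'input_ids' (1-D LongTensor) and 'image' (CxHxW tensor).
    input_ids = [s['input_ids'] for s in samples]
    images = [s['image'] for s in samples]
    max_len = max(ids.shape[0] for ids in input_ids)
    batched_ids = torch.full((len(samples), max_len), pad_token_id, dtype=torch.long)
    attention_mask = torch.zeros(len(samples), max_len, dtype=torch.long)
    for i, ids in enumerate(input_ids):
        # Place real tokens at the right end; padding fills the left.
        batched_ids[i, max_len - ids.shape[0]:] = ids
        attention_mask[i, max_len - ids.shape[0]:] = 1
    return {
        'input_ids': batched_ids,
        'attention_mask': attention_mask,
        'images': torch.stack(images),
    }

Left padding is used here so that input_ids.shape[1] marks the start of the generated continuation for every sample in the batch.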
rongfu-dsb commented 4 months ago

I want to use Groma on a custom dataset, but evaluation on that dataset runs with batch_size > 1. Is it possible to adapt to bs > 1 by changing the model configuration directly, rather than changing the dataloader settings? I am using code similar to eval_rec.py, and the snippet below only seems to handle the bs=1 case. How should I modify it?

output_ids = outputs.sequences
pred_boxes = outputs.hidden_states[0][-1]['pred_boxes'][0].cpu()
input_token_len = input_ids.shape[1]
predicted_box_tokens = [id for id in output_ids[0, input_token_len:] if id in model.box_idx_token_ids]
selected_box_inds = [model.box_idx_token_ids.index(id) for id in predicted_box_tokens]
selected_box_inds = [id for id in selected_box_inds if id < len(pred_boxes)]
if len(selected_box_inds) == 0:
    invalid += 1
    continue
selected_boxes = pred_boxes[selected_box_inds]

machuofan commented 3 months ago

outputs.hidden_states[0][-1]['pred_boxes'] stores batched box predictions, so for bs > 1 you can simply iterate over it:

for pred_boxes in outputs.hidden_states[0][-1]['pred_boxes'].cpu():
    ...
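For reference, one way the bs=1 snippet quoted above might be unrolled over a batch (a sketch, not tested; it assumes prompts are left-padded so that input_ids.shape[1] is the prompt length for every sample, and it reuses the variable names from the eval_rec.py-style code):

output_ids = outputs.sequences                                          # (bs, prompt_len + new_tokens)
batched_pred_boxes = outputs.hidden_states[0][-1]['pred_boxes'].cpu()   # (bs, num_boxes, 4)
input_token_len = input_ids.shape[1]

for b, pred_boxes in enumerate(batched_pred_boxes):
    # Collect the box-index tokens generated for sample b.
    predicted_box_tokens = [tok for tok in output_ids[b, input_token_len:]
                            if tok in model.box_idx_token_ids]
    selected_box_inds = [model.box_idx_token_ids.index(tok) for tok in predicted_box_tokens]
    selected_box_inds = [i for i in selected_box_inds if i < len(pred_boxes)]
    if len(selected_box_inds) == 0:
        invalid += 1
        continue
    selected_boxes = pred_boxes[selected_box_inds]
    # ... compare selected_boxes against the ground-truth box for sample b ...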
rongfu-dsb commented 3 months ago

Got it, I have solved this problem. Wishing you continued success in your research!