[x] My code has the necessary comments and documentation(if needed).
[x] I have added relevant tests.
Description
When performing video inference, if a frame does not have a mask result, it cannot be merged to complete the final result:
File "./Projects/Grounded-SAM-2_wjh/utils/common_utils.py", line 52, in draw_masks_and_box_with_supervision
all_object_masks = np.concatenate(all_object_masks, axis=0)
ValueError: need at least one array to concatenate
Pre-Checklist
Please complete ALL items in the following checklist.
Description
When performing video inference, if a frame does not have a mask result, it cannot be merged to complete the final result:
Test