facebookresearch / sam2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

When we set reverse in propagate_in_video, a data type mismatch error occurs. #308

Open GoldenFishes opened 2 months ago

GoldenFishes commented 2 months ago

**Background**

`propagate_in_video` allows passing `start_frame_idx` to specify the starting frame for propagation, `max_frame_num_to_track` to set the maximum length of propagation, and `reverse` to control the propagation direction, where `True` means reverse propagation.
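For reference, a minimal sketch of how such a call might look with the video predictor from the README; the config/checkpoint paths, frame directory, frame index, and point prompt below are placeholders, not values from this issue:

```python
import numpy as np
from sam2.build_sam import build_sam2_video_predictor

# placeholder paths; substitute your own config / checkpoint / frame directory
predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_l.yaml", "checkpoints/sam2.1_hiera_large.pt"
)
state = predictor.init_state(video_path="./videos/example_frames")

# prompt an object on a late frame (index 90 is illustrative)
predictor.add_new_points_or_box(
    state,
    frame_idx=90,
    obj_id=1,
    points=np.array([[300, 200]], dtype=np.float32),
    labels=np.array([1], dtype=np.int32),
)

# propagate backwards from frame 90 for at most 30 frames
for frame_idx, obj_ids, masks in predictor.propagate_in_video(
    state, start_frame_idx=90, max_frame_num_to_track=30, reverse=True
):
    pass  # consume the per-frame masks here
```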

**Error**

When we set `reverse=True` to propagate from the most recent frame for a fixed length, we get: `RuntimeError: mat1 and mat2 must have the same dtype, but got BFloat16 and Float`

**Reason**

When setting reverse propagation from the most recent frame for a fixed length, the `_prepare_memory_conditioned_features` method of the `SAM2Base` class enters the `if not is_init_cond_frame:` branch. This branch is generally not entered during forward propagation, because the first frame in forward propagation is usually a conditioning frame. The purpose of this branch is to condition the visual features of the current frame on previous memory. In this branch, the value retrieved by `feats = prev["maskmem_features"].to(device, non_blocking=True)` is `torch.bfloat16`, which causes the subsequent `MemoryAttention` data type mismatch error.
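The mismatch itself is easy to reproduce in isolation: feeding a bfloat16 tensor into a float32 `nn.Linear` (which is what the memory-attention projections are outside of autocast) raises a dtype mismatch, while the same call under autocast does not. A standalone sketch (not SAM 2 code), assuming a recent PyTorch:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
linear = torch.nn.Linear(4, 8).to(device)                        # float32 weights
feats = torch.randn(2, 4, dtype=torch.bfloat16, device=device)   # stands in for prev["maskmem_features"]

try:
    linear(feats)   # bfloat16 input vs. float32 weights
except RuntimeError as e:
    print(e)        # dtype mismatch (on CUDA, the "mat1 and mat2" message quoted above)

with torch.autocast(device, dtype=torch.bfloat16):
    out = linear(feats)   # autocast handles the mixed dtypes
print(out.dtype)          # torch.bfloat16
```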

**Temporary solution**

Add `feats = feats.to(torch.float32)` right after the line quoted above, as sketched below.
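Concretely, the patch would sit inside `_prepare_memory_conditioned_features` (only the touched lines are shown; the placement is as described in this issue):

```python
# inside SAM2Base._prepare_memory_conditioned_features, in the `if not is_init_cond_frame:` branch
feats = prev["maskmem_features"].to(device, non_blocking=True)
feats = feats.to(torch.float32)  # workaround: match MemoryAttention's float32 weights
```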

**Question**

Why is `prev["maskmem_features"]` in the `SAM2Base` class of `torch.bfloat16` dtype? Where is it generated?

ronghanghu commented 1 month ago

Hi @GoldenFishes, the bfloat16 outputs are intended to be used together with Automatic Mixed Precision (AMP) in PyTorch.

In the predictor notebooks (https://github.com/facebookresearch/sam2/blob/main/notebooks/video_predictor_example.ipynb), we are using a line

`torch.autocast("cuda", dtype=torch.bfloat16).__enter__()`

to turn on bfloat16 AMP, and the vos_inference script has a similar line: https://github.com/facebookresearch/sam2/blob/52198ead0eb13ae8270bea6ca768ef175f5bf167/tools/vos_inference.py#L117. Turning on AMP should resolve this issue, since with autocast active the bfloat16 inputs and the float32 weights become compatible.
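Put differently, running the reverse propagation from this issue under autocast should avoid the mismatch; a sketch reusing the `predictor`/`state` names and placeholder frame indices from the snippet above (the scoped `with` form is equivalent to the notebooks' `__enter__()` call):

```python
import torch

# enable bfloat16 AMP for the inference calls, as the notebooks and vos_inference.py do
with torch.autocast("cuda", dtype=torch.bfloat16):
    for frame_idx, obj_ids, masks in predictor.propagate_in_video(
        state, start_frame_idx=90, max_frame_num_to_track=30, reverse=True
    ):
        pass  # no dtype mismatch: autocast reconciles bfloat16 activations with float32 weights
```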