dengxl0520 / MemSAM

[CVPR 2024 Oral] MemSAM: Taming Segment Anything Model for Echocardiography Video Segmentation.
MIT License

ValueError: Target size (torch.Size([1, 2, 256, 256])) must be the same as input size (torch.Size([1, 10, 256, 256])) #10

Open SUNNYDAY-4176 opened 4 months ago

SUNNYDAY-4176 commented 4 months ago

Hi! Thanks for your work! When I run your code, I encounter this error:

(screenshot of the error traceback)

Do you know why the target size (torch.Size([1, 2, 256, 256])) differs from the input size (torch.Size([1, 10, 256, 256]))?

SUNNYDAY-4176 commented 4 months ago

I printed the image size and the mask size from EchoVideoDataset() in /MemSAM-main/utils/data_us.py, and found that the sizes are different:

(screenshot of the printed image and mask sizes)
dengxl0520 commented 4 months ago

You need to set the semi-supervision and frame-length parameters correctly. The EchoNet-Dynamic dataset annotates only two frames per video (ED and ES), so you need to discard the unlabeled frames from the model output and keep only the labeled frames for the loss calculation.
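
In other words, something like the following. This is a minimal sketch, not the repo's exact code: the shapes match the error message, and the assumption that the two labeled frames are the first and last of the sampled clip is hypothetical; check how your dataset indexes the annotated frames.

```python
import torch
import torch.nn.functional as F

# Shapes taken from the error message: the model predicts all 10 sampled
# frames, but ground-truth masks exist for only 2 of them.
pred = torch.randn(1, 10, 256, 256)                       # logits for every frame
target = torch.randint(0, 2, (1, 2, 256, 256)).float()    # masks for labeled frames only

# Assumption (hypothetical): the two annotated frames (ED and ES) are the
# first and last frames of the sampled clip. Adjust to your dataset.
labeled_idx = torch.tensor([0, pred.shape[1] - 1])

# Keep only the labeled frames so prediction and target shapes match.
pred_labeled = pred[:, labeled_idx]                       # -> [1, 2, 256, 256]

loss = F.binary_cross_entropy_with_logits(pred_labeled, target)
print(pred_labeled.shape, loss.item())
```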