facebookresearch / sam2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Not able to use model with Fine Tuned Model weights #246

Open owenip opened 3 months ago

owenip commented 3 months ago

Hi, I am having a problem loading fine-tuned model weights. Perhaps someone with SAM2 fine-tuning experience can shed some light on this.

The fine-tuned model works fine right after the training loop, but it is not able to segment anything when the same weights are loaded back from a torch file. I have rerun this and double-checked the loaded weights.

Below is an example of the model output. Only a single bounding-box prompt is used. The fine-tuned model straight from the training loop is able to segment the target, but the model loaded from the saved state_dict() is not.

Model 1 is the fine-tuned model from the training loop. Model 2 is the model loaded from the fine-tuned state_dict().
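
To make the setup concrete, here is a minimal sketch of the save/load flow described above, assuming the repo's build_sam2 / SAM2ImagePredictor entry points are used; the config name, checkpoint file names, and the fine-tuning loop itself are placeholders, not the reporter's actual code:

```python
# Minimal sketch of the save/load flow described above.
# Config and checkpoint names are placeholders; the fine-tuning loop is omitted.
import torch
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Build the model from a config + base checkpoint
sam2_model = build_sam2("sam2_hiera_s.yaml", ckpt_path="sam2_hiera_small.pt", device="cuda")

# ... fine-tuning loop goes here (omitted) ...

# Save only the fine-tuned weights
torch.save(sam2_model.state_dict(), "fine_tuned_sam2.pt")

# Later, in a fresh process: rebuild from the same config and load the saved weights
sam2_model = build_sam2("sam2_hiera_s.yaml", ckpt_path=None, device="cuda")
sam2_model.load_state_dict(torch.load("fine_tuned_sam2.pt", map_location="cuda"))
sam2_model.eval()

predictor = SAM2ImagePredictor(sam2_model)
```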

lzl2040 commented 2 months ago

Have you solved this problem? I have run into the same issue.

arawxx commented 2 months ago

OH GOD. I've been dealing with this problem for a whole week now and it's driving me insane. I have not been able to solve it yet. I hypothesize it has something to do with the memory block...

owenip commented 2 months ago

It's driving me crazy as well. I have no idea where or what went wrong.

heyoeyo commented 2 months ago

I might be misunderstanding the setup, but if they're supposed to be the same, then it seems something is going wrong with the hi-res embeddings. As a sanity check, it may be worth disabling the hi-res embeddings (by setting the use_high_res_features_in_sam config to False) to see if that avoids the broken mask output. It would also be interesting to turn off the +/- 32 clamping to see whether the output mask forms a reasonable pattern but with overly negative values, or whether it's going off to negative infinity. If it's outputting infinity, then it may be a data type/numerical issue, in which case switching to float32 could help if it's not already being used.

I hypothesize it has something to do with the memory block...

At least for image segmentation, you can disable the use of memory features on the image encoder by turning off the directly_add_no_mem_embed config setting.
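
For reference, a rough sketch of how both of those config settings could be switched off when rebuilding the model, via build_sam2's hydra_overrides_extra argument; the config name, file names, and exact override keys are assumptions (based on the model-level keys in the shipped sam2_hiera_*.yaml configs), so verify them against the config used for training:

```python
# Sketch only: disabling the hi-res embeddings and the no-memory embedding
# via Hydra overrides when rebuilding the model. Override keys and file
# names are assumptions; check them against your training config.
import torch
from sam2.build_sam import build_sam2

sam2_model = build_sam2(
    "sam2_hiera_s.yaml",
    ckpt_path=None,
    device="cuda",
    hydra_overrides_extra=[
        "++model.use_high_res_features_in_sam=false",  # skip hi-res embeddings in the mask decoder
        "++model.directly_add_no_mem_embed=false",     # skip adding the no-memory embedding to image features
    ],
)
# strict=False because disabling the hi-res features may drop some decoder submodules
sam2_model.load_state_dict(torch.load("fine_tuned_sam2.pt", map_location="cuda"), strict=False)
sam2_model = sam2_model.float().eval()  # float32, as suggested above
```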

bidulgi123 commented 1 week ago

If you use torch.cuda.amp.autocast during training and prediction, try changing it to torch.cuda.amp.autocast(enabled=False). It is most likely a mixed precision issue.
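
For example (a sketch only; predictor, image, and box stand in for whatever objects the fine-tuning setup actually uses):

```python
# Sketch: force full precision at inference to rule out an autocast-related
# numerical issue. `predictor`, `image`, and `box` are placeholder objects.
import torch

with torch.cuda.amp.autocast(enabled=False):
    predictor.set_image(image)
    masks, scores, low_res_logits = predictor.predict(
        box=box,
        multimask_output=False,
    )
```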