facebookresearch / r3m

Pre-training Reusable Representations for Robotic Manipulation Using Diverse Human Video Data
https://sites.google.com/view/robot-r3m/
MIT License

Reproducing Moco345/ImNet results #15

Closed MathisClautrier closed 2 years ago

MathisClautrier commented 2 years ago

Hi,

I am trying to reproduce your results using backbones other than yours.

However, when using MoCo345 (weights taken from https://github.com/sparisi/pvr_habitat/releases/tag/models; moco_crop, moco_crop_l4, moco_crop_l3), I obtain an average success rate of 28% (using 25 demos), whereas with the pretrained PyTorch (ImageNet) model I get 34%.

Based on your paper, I expected the opposite ordering. Could you clarify this point? (I used exactly the same code in a fresh Conda environment and obtained consistent results with your model.)

Thanks

suraj-nair-1 commented 2 years ago

Hi Mathis,

Just to confirm: I'm assuming the numbers you are reporting are averaged over the 5 Franka Kitchen tasks x 3 viewpoints, using only 25 demos? Assuming this is correct, the ~34% for supervised ImageNet seems reasonable and matches Fig. 6 of the paper.

For MoCo345:

MathisClautrier commented 2 years ago

Thank you for your quick answer,

Yes, reported numbers are averaged over the 5 tasks and 3 viewpoints using only 25 demos.
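For anyone repeating this comparison, the averaging described above can be sketched roughly as follows. This is only an illustration: the function name `summarize`, the `task_i` / `view_j` labels, and the numbers are all hypothetical placeholders, not measured results. The seed dimension is included because run-to-run variance is one candidate explanation for a gap of this size.

```python
from collections import defaultdict
from statistics import mean


def summarize(results):
    """results maps (task, viewpoint, seed) -> success rate in [0, 1]."""
    per_seed = defaultdict(list)
    for (_task, _view, seed), rate in results.items():
        per_seed[seed].append(rate)
    # Average over the task x viewpoint grid within each seed first,
    # then across seeds, so the seed-to-seed spread stays visible.
    seed_means = {seed: mean(rates) for seed, rates in per_seed.items()}
    return mean(list(seed_means.values())), seed_means


# Placeholder numbers only (NOT measured results): every run of seed 0
# is set to 0.25 and every run of seed 1 to 0.35.
tasks = [f"task_{i}" for i in range(5)]   # stand-ins for the 5 tasks
views = [f"view_{j}" for j in range(3)]   # stand-ins for the 3 viewpoints
results = {
    (t, v, s): 0.25 + 0.10 * s
    for t in tasks
    for v in views
    for s in (0, 1)
}

overall, seed_means = summarize(results)
print(f"overall={overall:.2f}", {s: round(m, 2) for s, m in seed_means.items()})
```

Reporting the per-seed means next to the overall average makes it easier to judge whether a difference of a few points between two backbones is larger than the seed-to-seed spread.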

I will try multiple seeds for MoCo345, as seed variance might explain this gap. For preprocessing I used the PVR function:

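import torch
import torch.nn as nn
import torchvision.transforms as T

# embedding_name is defined by the caller (the model identifier, e.g. 'moco_crop')
transforms = nn.Sequential(
    T.Resize(256, interpolation=3) if 'mae' in embedding_name else T.Resize(256),  # 3 = bicubic
    T.CenterCrop(224),
    T.ConvertImageDtype(torch.float),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
)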

I just added T.ToTensor() after T.CenterCrop(224) as we aren't feeding tensors directly.

suraj-nair-1 commented 2 years ago

Hi @MathisClautrier, were you able to get the MoCo345 model working?

MathisClautrier commented 2 years ago

Hi,

Unfortunately, no. I double-checked the preprocessing and tried different seeds, but the gap remains. I guess the weights I used are not the right ones.

Thanks for helping me

suraj-nair-1 commented 2 years ago

Got it. In that case, the publicly released models are probably different from the internal checkpoints I used back in January. Apologies for the confusion.

Since there isn't anything to fix here I'll go ahead and close this issue for now, and perhaps add a note in the paper that the experiments use an earlier set of MoCo345 weights than the publicly released ones. Thanks for looking into this.