askforalfred / alfred

ALFRED - A Benchmark for Interpreting Grounded Instructions for Everyday Tasks

Get different feature vectors when loading from feat_conv.pt and from resnet.featurize() #52

Closed shuyanzhou closed 3 years ago

shuyanzhou commented 3 years ago

Hi,

I am working with this example: full_2.1.0/train/pick_and_place_with_movable_recep-ButterKnife-Cup-SinkBasin-2/trial_T20190908_233322_447979/raw_images

For image 000000000.jpg, running

feat = resnet.featurize([Image.open(fname)], batch=1)
print(feat[0][:5,:5])

I get

tensor([[[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0188, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.6659, 0.2334, 0.4881, 0.0220],
         [0.0000, 0.0278, 0.0000, 0.1306, 0.0000, 0.0177, 0.0000],
         [0.2511, 0.9387, 0.5719, 0.2475, 0.1024, 0.3862, 0.1884],
         [1.4310, 1.5767, 0.8272, 0.0000, 0.0000, 0.0000, 0.3523]],

        [[0.0000, 0.0549, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.1023, 0.2866, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.4572, 0.0000, 0.0000, 0.0000, 0.4733, 0.8688, 0.6622],
         [0.9684, 0.0573, 0.0000, 0.0000, 0.0489, 0.0000, 0.0655]],

        [[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]],

        [[1.8689, 2.2256, 0.6508, 1.0258, 0.5759, 0.9021, 0.6726],
         [2.5713, 3.0914, 1.0797, 1.3719, 0.9788, 1.8322, 1.6944],
         [3.7268, 3.6742, 1.5358, 1.2200, 0.9661, 2.6235, 2.2480],
         [4.2898, 4.0467, 1.9082, 0.6326, 0.2264, 1.4761, 1.8504],
         [4.6841, 4.2882, 1.2279, 0.0133, 0.0000, 0.8532, 1.6886]],

        [[0.0000, 0.0000, 0.0000, 0.1095, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.3814, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.7656, 0.0000, 0.4508, 0.0000],
         [0.0000, 0.0000, 0.0000, 1.1697, 0.3166, 0.8373, 0.0000],
         [0.0000, 0.0000, 0.0052, 1.3592, 0.6858, 1.1441, 0.0000]]])

When using

x = torch.load("feat_conv.pt")
print(x[0][:5,:5])

I get

tensor([[[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0987, 0.0000, 0.4179, 0.0000, 0.3932, 0.0267],
         [0.1942, 0.3280, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.4623, 1.1134, 0.7551, 0.0720, 0.0000, 0.2512, 0.1772],
         [1.6237, 1.7352, 1.0370, 0.0000, 0.0000, 0.0000, 0.2772]],

        [[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.1637, 0.0102, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.2434, 0.0000, 0.0000, 0.0000, 0.1376, 0.5788, 0.3524],
         [0.7411, 0.0000, 0.0000, 0.0000, 0.2674, 0.1301, 0.0218]],

        [[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]],

        [[1.7606, 1.9361, 0.6170, 1.0957, 0.7715, 1.0960, 0.9179],
         [2.4290, 2.7654, 0.9339, 1.2526, 1.0871, 1.8147, 1.7731],
         [3.6967, 3.4739, 1.3849, 1.0991, 1.0435, 2.3166, 1.9403],
         [4.4716, 4.3128, 2.0116, 0.7454, 0.2938, 1.2445, 1.4761],
         [5.1107, 4.7808, 1.3802, 0.2358, 0.0000, 1.0597, 1.6283]],

        [[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.1443, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.6003, 0.0000, 0.3137, 0.0000],
         [0.0000, 0.0000, 0.0000, 1.2218, 0.5109, 0.9440, 0.0120],
         [0.0000, 0.0000, 0.0878, 1.5063, 0.8553, 1.0575, 0.0000]]])
MohitShridhar commented 3 years ago

@shuyanzhou, have you tried normalizing the image tensor before passing it to the ResNet? See this.
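For reference, standard torchvision ImageNet preprocessing looks roughly like this (a sketch of typical preprocessing, not necessarily the exact transform inside ALFRED's extractor; the 224x224 resize is an assumption based on the 7x7 feature maps above):

from PIL import Image
import torch
from torchvision import transforms

# Standard ImageNet normalization used with torchvision's pre-trained models.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),  # assumption: resize raw frames to the ResNet input size
    transforms.ToTensor(),          # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("000000000.jpg").convert("RGB")
x = preprocess(img).unsqueeze(0)  # add a batch dimension: (1, 3, 224, 224)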

MohitShridhar commented 3 years ago

Also, make sure to use the ResNet in eval mode. Otherwise, the batch-norm layers will mess up the output.
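A minimal sketch of what that looks like with a truncated torchvision ResNet-18 (the extractor in this repo may differ in details):

import torch
import torchvision.models as models

resnet18 = models.resnet18(pretrained=True)
# Drop the final avgpool + fc layers to keep the 512x7x7 conv feature maps.
backbone = torch.nn.Sequential(*list(resnet18.children())[:-2])
backbone.eval()  # freeze batch-norm running statistics

with torch.no_grad():    # no gradients needed for feature extraction
    feat = backbone(x)   # x: the normalized (1, 3, 224, 224) tensor from above
print(feat.shape)        # torch.Size([1, 512, 7, 7])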

MohitShridhar commented 3 years ago

Oops, sorry, you are using the same function.

MohitShridhar commented 3 years ago

Double-check that it's in eval mode. It could also be the lossy PNG->JPG conversion that's causing minor differences. Originally, we saved PNG files, but these were later converted to JPGs to save space.
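One quick way to test the compression hypothesis is to featurize both versions of the same frame and measure the gap (a sketch; assumes you have both files side by side and resnet is the thread's extractor):

# Compare features from the PNG and JPG versions of the same frame.
feat_png = resnet.featurize([Image.open("000000000.png")], batch=1)
feat_jpg = resnet.featurize([Image.open("000000000.jpg")], batch=1)
print((feat_png[0] - feat_jpg[0]).abs().max())  # non-trivial if compression is the culprit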

shuyanzhou commented 3 years ago

I am using models/utils/extract_resnet.py and simply added:

x = Image.open("full_2.1.0/train/pick_and_place_with_movable_recep-ButterKnife-Cup-SinkBasin-2/trial_T20190908_233322_447979/raw_images/000000000.jpg")
feat = extractor.featurize([x], batch=1)
print(feat[0][:5,:5])

Eval mode is already enabled in the original code.

I tried both the .jpg and the .png (by simply changing the extension), but they give the same result.

MohitShridhar commented 3 years ago

I found the original PNG; can you try it with this: 000000000

MohitShridhar commented 3 years ago

I also wonder if it's somehow related to this issue: https://github.com/askforalfred/alfred/issues/27#issuecomment-637436775

It might be that the pre-trained ResNet model from torchvision changed recently.

shuyanzhou commented 3 years ago

Awesome, it was indeed due to the different file formats; the results are consistent now. BTW, would it be possible for you to share all the PNG images for this example? Or is there a way I can get them myself? Thanks!

MohitShridhar commented 3 years ago

Well, the dataset with raw PNG files was 2-3x bigger than the one with JPGs, so we couldn't figure out a proper way of hosting it.

The faster fix is to just run extract_resnet.py on your local PNGs and then re-train the model with those features.
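After re-extracting, you can quantify how far the shipped JPG-based features are from your PNG-based ones (a sketch; "feat_conv_png.pt" is a hypothetical name for your locally re-extracted file):

import torch

old = torch.load("feat_conv.pt")      # features shipped with the dataset (from JPGs)
new = torch.load("feat_conv_png.pt")  # hypothetical: your re-extraction from PNGs
print((old - new).abs().max())              # largest per-element discrepancy
print(torch.allclose(old, new, atol=1e-2))  # loose tolerance for compression noise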

lajanugen commented 2 years ago

I noticed a difference in the feature vectors produced by the PyTorch ResNet when run on a GPU vs. a CPU. The difference is non-negligible, in the 1e-2 range. Perhaps this is related to the reproducibility issues reported in many threads.
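For anyone who wants to reproduce this, a minimal check might look like the following (a sketch; assumes a CUDA device is available):

import torch
import torchvision.models as models

# Truncated ResNet-18 backbone, as in the sketches above.
net = torch.nn.Sequential(*list(models.resnet18(pretrained=True).children())[:-2]).eval()
x = torch.rand(1, 3, 224, 224)  # dummy input; any real preprocessed frame works too

with torch.no_grad():
    f_cpu = net(x)
    f_gpu = net.cuda()(x.cuda()).cpu()

print((f_cpu - f_gpu).abs().max())  # non-zero: GPU kernels aren't bitwise-identical to CPU ones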

MohitShridhar commented 2 years ago

Wow, very strange.