alibaba / easyrobust

EasyRobust: an Easy-to-use library for state-of-the-art Robust Computer Vision Research with PyTorch.
Apache License 2.0

Inference of DAT model #12

Closed qihao067 closed 1 year ago

qihao067 commented 1 year ago

Hi, thank you for your work on 'Discrete Adversarial Training'.

I was trying to use the pre-trained models for inference. The [ViT-B/16] works pretty well. However, the [MAE-ViT-H] doesn't work. Could you explain the difference between these two models when testing them on custom images?

For the [MAE-ViT-H], I used vit_huge_patch14_224 instead of mae_vit_huge_patch14 to initialize the model in timm (the latter cannot be found in the timm model zoo).

I normalize the input image with T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
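For reference, here is a sketch of my preprocessing pipeline. Only the Normalize values above are what I actually use; the resize/crop sizes and image path are placeholders I assume to match the usual 224-input timm setup:

```python
import torchvision.transforms as T
from PIL import Image

# Preprocessing sketch: only the Normalize statistics come from my setup above;
# the Resize/CenterCrop values are assumed ImageNet-style defaults for a 224 input.
transform = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])

img = Image.open('example.jpg').convert('RGB')  # placeholder image path
x = transform(img).unsqueeze(0)                 # add batch dimension
```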

Thank you !

vtddggg commented 1 year ago

I'm not sure if vit_huge_patch14_224 is fully compatible. As far as we know, mae_vit_huge_patch14 uses global pooling and an extra fc_norm. There may be other differences we haven't mentioned that are causing your problem. You can set strict=True to check whether all parameters are aligned.

A safer solution we advise is to do import easyrobust.models and then set --model mae_vit_huge_patch14, instead of using vit_huge_patch14_224.
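Roughly, the idea is the following sketch (the checkpoint path is a placeholder, and we assume you load the weights yourself with load_state_dict):

```python
import torch
import timm
import easyrobust.models  # registers mae_vit_huge_patch14 with timm's model registry

# Once easyrobust.models is imported, the MAE-style ViT-H is visible to timm.
model = timm.create_model('mae_vit_huge_patch14')

# Placeholder path; point this to the downloaded DAT checkpoint.
# If the checkpoint is a dict with a 'model'/'state_dict' key, unpack it first.
state_dict = torch.load('path/to/dat_mae_vit_huge_patch14.pth', map_location='cpu')

# strict=True raises an error if any parameter name or shape is misaligned.
model.load_state_dict(state_dict, strict=True)
model.eval()
```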

vtddggg commented 1 year ago

Insert import easyrobust.models here, then you can use mae_vit_huge_patch14.

vtddggg commented 1 year ago

Fixed in https://github.com/alibaba/easyrobust/commit/1b2217c811cc22e81c5d09ad146c013602966e49

qihao067 commented 1 year ago

Great! Thank you! It solved the problem.