TACJu / TransFG

This is the official PyTorch implementation of the paper "TransFG: A Transformer Architecture for Fine-grained Recognition" (Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang, Alan Yuille).
MIT License

About Stanford dogs accuracy #24

Open EdwinKuo1337 opened 2 years ago

EdwinKuo1337 commented 2 years ago

Hi, could you release your training settings for the Stanford Dogs dataset? I set the learning rate to 3e-3 and left all other settings unchanged, but the model is underfitting: I only get 1.7% accuracy after 200k steps.
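To put the 1.7% figure in context, it helps to compare it with chance level. The sketch below assumes the standard Stanford Dogs label set of 120 breeds and uniform random guessing; the function name is only illustrative.

```python
def chance_accuracy(num_classes: int) -> float:
    """Expected top-1 accuracy of uniform random guessing."""
    return 1.0 / num_classes


NUM_CLASSES = 120   # assumption: the standard Stanford Dogs label set
reported = 0.017    # accuracy reported in the comment above

chance = chance_accuracy(NUM_CLASSES)
print(f"chance: {chance:.2%}, reported: {reported:.2%}, "
      f"ratio: {reported / chance:.1f}x")
```

An accuracy only about twice chance after 200k steps suggests the model is barely learning at all, which may point to a diverging optimization (for example, a learning rate that is too high for this dataset) rather than ordinary underfitting.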

oliver8459 commented 2 years ago

I used 4 Tesla V100 (32 GB) GPUs with batch_size=14 (16 runs out of memory) to reproduce the Stanford Dogs result and kept the rest of the config the same as in your paper, but the accuracy is only 90.5%, well below the 92.3% reported in the paper.
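One difference worth checking in a reproduction like this is the effective batch size. The sketch below assumes that batch_size is a per-GPU value and that the reference run used 16 per GPU on 4 GPUs; neither is confirmed in this thread. The linear learning-rate scaling shown is a common heuristic, not something the authors are known to have used, and the base learning rate of 3e-3 is only a placeholder taken from the first comment.

```python
def effective_batch_size(per_gpu_batch: int, num_gpus: int, accum_steps: int = 1) -> int:
    """Number of samples contributing to one optimizer step."""
    return per_gpu_batch * num_gpus * accum_steps


def linearly_scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear scaling heuristic: keep lr / batch_size constant."""
    return base_lr * new_batch / base_batch


reference = effective_batch_size(16, 4)    # assumed reference setup
reproduced = effective_batch_size(14, 4)   # setup described in the comment
print(reference, reproduced)

# Placeholder base learning rate; substitute the value from the actual config.
print(linearly_scaled_lr(3e-3, reference, reproduced))
```

Under these assumptions the reproduction takes each optimizer step on 56 samples instead of 64. Whether that accounts for any of the 1.8-point gap is not something this arithmetic can show.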

oliver8459 commented 2 years ago

> I used 4 Tesla V100 (32 GB) GPUs with batch_size=14 (16 runs out of memory) to reproduce the Stanford Dogs result and kept the rest of the config the same as in your paper, but the accuracy is only 90.5%, well below the 92.3% reported in the paper.

> Hi Oliver, just wondering if you could share your pretrained model with me? Thanks in advance!

Thanks for your reply. I use the pretrained model "VIT_B16" downloaded from your link. By the way, I removed "part_select" and "part_layer" (leaving a pure ViT), and the performance is similar to the TransFG result I reproduced (90.5%).

```python
import copy

import torch.nn as nn
from torch.nn import LayerNorm

# Block and Part_Attention are defined in the TransFG repository.


class Encoder(nn.Module):
    def __init__(self, config):
        super(Encoder, self).__init__()
        self.layer = nn.ModuleList()
        for _ in range(config.transformer["num_layers"] - 1):
            layer = Block(config)
            self.layer.append(copy.deepcopy(layer))

        # Still constructed, but no longer used in forward().
        self.part_select = Part_Attention()
        # self.part_layer = Block(config)
        self.part_norm = LayerNorm(config.hidden_size, eps=1e-6)

    def forward(self, hidden_states):
        # attn_weights = []
        for layer in self.layer:
            hidden_states, _ = layer(hidden_states)
            # attn_weights.append(weights)
        # part_num, part_inx = self.part_select(attn_weights)
        # part_inx = part_inx + 1
        # parts = []
        # B, num = part_inx.shape
        # for i in range(B):
        #     parts.append(hidden_states[i, part_inx[i,:]])
        # parts = torch.stack(parts).squeeze(1)
        # concat = torch.cat((hidden_states[:,0].unsqueeze(1), parts), dim=1)
        # part_states, part_weights = self.part_layer(concat)
        # part_encoded = self.part_norm(part_states)

        # Part selection disabled: normalize all tokens, as in a plain ViT.
        part_encoded = self.part_norm(hidden_states)

        return part_encoded
```
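For readers comparing the two variants, here is a minimal sketch of what the commented-out gather step does: it keeps the class token and only the patch tokens picked by part selection. The function name `select_part_tokens` and the toy tensors are hypothetical, and the indices are supplied by hand rather than computed from attention weights. The sketch also uses vectorized indexing instead of the per-sample loop in the original.

```python
import torch


def select_part_tokens(hidden_states: torch.Tensor, patch_inx: torch.Tensor) -> torch.Tensor:
    """Keep the class token plus the selected patch tokens.

    hidden_states: (B, 1 + N, D), class token at position 0.
    patch_inx:     (B, K) integer indices into the N patch tokens.
    Returns a tensor of shape (B, 1 + K, D).
    """
    token_inx = patch_inx + 1  # shift past the class token
    batch_inx = torch.arange(hidden_states.size(0), device=hidden_states.device).unsqueeze(1)
    parts = hidden_states[batch_inx, token_inx]   # (B, K, D)
    cls_token = hidden_states[:, :1]              # (B, 1, D)
    return torch.cat((cls_token, parts), dim=1)


torch.manual_seed(0)
hidden_states = torch.randn(2, 5, 8)                  # B=2, 1 class + 4 patch tokens, D=8
patch_inx = torch.tensor([[0, 3, 1], [2, 2, 0]])      # hand-picked indices, K=3
concat = select_part_tokens(hidden_states, patch_inx)
print(concat.shape)  # torch.Size([2, 4, 8])
```

In the modified encoder above this step is skipped, so the final LayerNorm sees all tokens rather than the class token plus K selected ones.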