Alibaba-MIIL / ML_Decoder

Official PyTorch implementation of "ML-Decoder: Scalable and Versatile Classification Head" (2021)
MIT License

an import error #18

Closed fsted closed 2 years ago

fsted commented 2 years ago

It's a good paper, but I hit an import error from inplace_abn:

  File "/home/ML_Decoder/src_files/models/tresnet/tresnet.py", line 10, in <module>
    from inplace_abn import InPlaceABN, ABN
  File "/home/anaconda3/envs/ml/lib/python3.7/site-packages/inplace_abn/__init__.py", line 1, in <module>
    from .abn import ABN, InPlaceABN, InPlaceABNSync
  File "/home/yujialin/anaconda3/envs/ml/lib/python3.7/site-packages/inplace_abn/abn.py", line 8, in <module>
    from .functions import inplace_abn, inplace_abn_sync
  File "/home/yujialin/anaconda3/envs/ml/lib/python3.7/site-packages/inplace_abn/functions.py", line 8, in <module>
    from . import _backend
ImportError: /home/yujialin/anaconda3/envs/ml/lib/python3.7/site-packages/inplace_abn/_backend.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN2at4cuda6detail20canUse32BitIndexMathERKNS_6TensorEl

This really confuses me. What should I do about it? Or is there any way I can replace inplace_abn?

mrT23 commented 2 years ago

Make sure you installed the dependencies successfully. TResNet requires inplace_abn, and you need a GPU for that.

Or use a different model.
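
An `undefined symbol` error while loading a compiled extension commonly means the extension was built against a different PyTorch/CUDA build than the one currently installed. The following is a minimal diagnostic sketch, not part of the repository: `check_import` is a hypothetical helper name, and it only reports whether each package imports and which PyTorch build is active.

```python
import importlib


def check_import(module_name):
    """Try to import a module and return (ok, detail)."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # "undefined symbol" failures from compiled backends surface as ImportError.
        return False, f"{type(exc).__name__}: {exc}"
    return True, str(getattr(module, "__version__", "unknown version"))


if __name__ == "__main__":
    for name in ("torch", "inplace_abn"):
        ok, detail = check_import(name)
        print(f"{name}: {'OK' if ok else 'FAILED'} ({detail})")

    torch_ok, _ = check_import("torch")
    if torch_ok:
        import torch

        # torch.version.cuda is None for CPU-only builds.
        print("CUDA version PyTorch was built with:", torch.version.cuda)
        print("GPU visible to PyTorch:", torch.cuda.is_available())
```

If `torch` imports but `inplace_abn` fails, reinstalling inplace_abn in the same environment, so that it is rebuilt against the installed PyTorch, is a reasonable first thing to try.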

Q1ngS0ng commented 1 year ago

I ran into the same issue. How did you solve it? Could you help me?

XavierYu404 commented 8 months ago

For tresnet.py, add a new class named NewABN and use it in place of InPlaceABN in the conv2d_ABN function:

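import torch
import torch.nn as nn


class NewABN(torch.nn.Module):
    """BatchNorm2d followed by an activation, written in plain PyTorch."""

    # activation_param defaults to 1e-2 (same as conv2d_ABN) so that
    # LeakyReLU never receives negative_slope=None.
    def __init__(self, num_features, activation='leaky_relu', activation_param=1e-2):
        super(NewABN, self).__init__()
        self.num_features = num_features
        self.activation = activation
        self.activation_param = activation_param

        self.bn = nn.BatchNorm2d(num_features=self.num_features)
        if self.activation == 'leaky_relu':
            self.activation_func = nn.LeakyReLU(negative_slope=self.activation_param)
        elif self.activation == 'identity':
            self.activation_func = nn.Identity()
        else:
            # Fail early instead of leaving activation_func as None.
            raise ValueError("Unsupported activation: {!r}".format(self.activation))

    def forward(self, x):
        # Normalize first, then apply the activation.
        output = self.bn(x)
        output = self.activation_func(output)
        return output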

def conv2d_ABN(ni, nf, stride, activation="leaky_relu", kernel_size=3, activation_param=1e-2, groups=1):
    return nn.Sequential(
        nn.Conv2d(ni, nf, kernel_size=kernel_size, stride=stride, padding=kernel_size // 2, groups=groups,
                  bias=False),
        NewABN(num_features=nf, activation=activation, activation_param=activation_param)
    )
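
Before wiring the replacement into tresnet.py, it can help to confirm that an equivalent plain-PyTorch stack runs on CPU and produces the expected shape. This is a minimal sketch under stated assumptions: `conv_bn_act` is a hypothetical helper that mirrors the structure above (convolution, then BatchNorm2d, then LeakyReLU), and the input size is arbitrary.

```python
import torch
import torch.nn as nn


def conv_bn_act(ni, nf, stride, kernel_size=3, negative_slope=1e-2):
    """Hypothetical helper: Conv2d + BatchNorm2d + LeakyReLU, no inplace_abn needed."""
    return nn.Sequential(
        nn.Conv2d(ni, nf, kernel_size=kernel_size, stride=stride,
                  padding=kernel_size // 2, bias=False),
        nn.BatchNorm2d(nf),
        nn.LeakyReLU(negative_slope=negative_slope),
    )


block = conv_bn_act(ni=3, nf=16, stride=2).eval()
x = torch.randn(2, 3, 32, 32)  # small random batch on CPU

with torch.no_grad():
    y = block(x)

# Stride 2 with padding 1 halves the 32x32 spatial size.
print(tuple(y.shape))  # (2, 16, 16, 16)
```

Nothing here depends on a compiled extension, which is the point of the replacement.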

Then change the initialization code, at approximately line 180 of the file:

        # model initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='leaky_relu')
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, NewABN):
                nn.init.constant_(m.bn.weight, 1)
                nn.init.constant_(m.bn.bias, 0)

        # residual connections special initialization
        for m in self.modules():
            if isinstance(m, BasicBlock):
                m.conv2[1].bn.weight = nn.Parameter(torch.zeros_like(m.conv2[1].bn.weight))  # BN to zero
            if isinstance(m, Bottleneck):
                m.conv3[1].bn.weight = nn.Parameter(torch.zeros_like(m.conv3[1].bn.weight))  # BN to zero
            if isinstance(m, nn.Linear):
                m.weight.data.normal_(0, 0.01)

After doing this, you can train the model without the inplace_abn module.

For infer.py, you just need to comment out three lines of code, as follows:

########### eliminate BN for faster inference ###########
# model = model.cpu()
# model = InplacABN_to_ABN(model)
# model = fuse_bn_recursively(model)
model = model.cuda().half().eval()
#######################################################
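
As an alternative to commenting lines out by hand, the fusion step could be made conditional on whether inplace_abn loads at all. This is only a sketch of that pattern, not code from the repository: `HAS_INPLACE_ABN`, `prepare_for_inference`, and the `fuse_steps` parameter are all hypothetical names introduced for illustration.

```python
# Sketch: run the optional BN-fusion steps only when inplace_abn is importable.
try:
    import inplace_abn  # noqa: F401  (only checks that the compiled backend loads)
    HAS_INPLACE_ABN = True
except ImportError:
    # Covers both a missing package and an "undefined symbol" load failure.
    HAS_INPLACE_ABN = False


def prepare_for_inference(model, fuse_steps=()):
    """Apply each callable in fuse_steps to the model if inplace_abn is available.

    fuse_steps is a hypothetical parameter: a sequence of functions that each
    take a model and return a model (for example, the fusion helpers that the
    commented-out lines call).
    """
    if HAS_INPLACE_ABN:
        for step in fuse_steps:
            model = step(model)
    return model
```

With this in place, the same script would skip the fusion steps in an environment where inplace_abn fails to import, and run them where it works.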