WongKinYiu / ScaledYOLOv4

Scaled-YOLOv4: Scaling Cross Stage Partial Network
GNU General Public License v3.0

PyTorch 1.9 and CUDA 11.1 Support #324

Open edmuthiah opened 2 years ago

edmuthiah commented 2 years ago

Hello,

I'm looking to use this repository with PyTorch 1.9 and CUDA 11.1, as I'm trying to use an RTX 3090 for training.

I've tried the following combinations:

```shell
pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
pip3 install torch==1.8.2+cu111 torchvision==0.9.2+cu111 torchaudio==0.8.2 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html
```

But I get the following error when running train.py: image

I have reviewed the following issues:

- https://github.com/WongKinYiu/ScaledYOLOv4/issues/243
- https://github.com/WongKinYiu/ScaledYOLOv4/issues/196
- https://github.com/WongKinYiu/ScaledYOLOv4/issues/193
- https://github.com/WongKinYiu/ScaledYOLOv4/issues/191
- https://github.com/WongKinYiu/ScaledYOLOv4/issues/271

All of them suggest downgrading, which is not possible for an RTX 3090.

I'm happy to contribute, but I don't really understand what's causing the incompatibility with the above torch versions. Using the PyTorch 1.9 `Mish`, I've tried changing all mentions of `self.act = Mish()` to `self.act = nn.Mish(inplace=False)` in `models/common.py`. However, this still throws the same error.
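For context, that swap looks roughly like this: a minimal sketch in the style of `models/common.py`, not the repo's actual `Conv` definition (the class shape and argument names here are illustrative):

```python
import torch
import torch.nn as nn

# Hypothetical sketch of a conv block in the style of models/common.py,
# with the mish_cuda activation replaced by torch's built-in nn.Mish
# (available since PyTorch 1.9).
class Conv(nn.Module):
    def __init__(self, c1, c2, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.Mish()  # replaces: self.act = Mish() from mish_cuda

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

x = torch.randn(1, 3, 32, 32)
y = Conv(3, 16, 3, 1)(x)
print(y.shape)  # torch.Size([1, 16, 32, 32])
```

Note that this only swaps the activation; as the thread shows, the training error itself comes from elsewhere.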

Thanks! @WongKinYiu @digantamisra98

jackhu-bme commented 2 years ago

Well, this error is not hard to solve. I followed this (Chinese) blog post: https://blog.csdn.net/Xunuo1995/article/details/115454076 and it worked well after adding `with torch.no_grad():`. I successfully trained on an RTX 3090 weeks ago.

Another thing to mention: there's a second error you'll face if you need to use test.py. The `output` variable is a list of `torch.Tensor`, so this line in the script doesn't work:

```python
output.cpu().numpy()
```

I simply wrote this instead:

```python
output2 = []
for m in output:
    m = m.cpu()
    output2.append(m)
output = output2
```

And it worked successfully. That's all you need to do to train on an RTX 3090 with a recent PyTorch version. If you find this useful, please give it a like, thanks.

jackhu-bme commented 2 years ago

Well, if you can't read the Chinese blog I mentioned before, just do this:

```python
with torch.no_grad():
    b[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)
    b[:, 5:] += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum())  # cls
mi.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)
```

Add the `with` statement and indent the two lines that follow it.
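The failure mode and the fix can be reproduced in isolation. A minimal sketch: `b` here is a stand-in for the detection-layer bias, and the stride value `8` is a hypothetical example in place of `s`:

```python
import math
import torch

# On newer PyTorch versions, an in-place update of a leaf tensor that
# requires grad raises a RuntimeError; wrapping it in torch.no_grad()
# is the fix applied above.
b = torch.zeros(3, 10, requires_grad=True)  # stand-in for the detection bias

try:
    b[:, 4] += math.log(8 / (640 / 8) ** 2)  # fails: leaf variable, in-place op
except RuntimeError as e:
    print("without no_grad:", type(e).__name__)

with torch.no_grad():
    b[:, 4] += math.log(8 / (640 / 8) ** 2)  # succeeds inside no_grad
print("with no_grad: ok")
```

Inside `torch.no_grad()` autograd stops tracking the operation, so the in-place update on the leaf tensor is allowed.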

jackhu-bme commented 2 years ago

The two tabs go before the two lines that follow the `with` statement; the web page doesn't show the indentation.

jackhu-bme commented 2 years ago

@ed-muthiah

edmuthiah commented 2 years ago

Thanks @Jack-Hu-2001

This works for me 🙂 I also want to note that this alternative solution using `.data` works too: image

The reason `.data` works is explained in this Stack Overflow answer:

> If `y = x.data` then `y` will be a Tensor that shares the same data with `x`, is unrelated to the computation history of `x`, and has `requires_grad=False`.

I think your solution is probably the more recent way to do it. Are you using the supplied `from mish_cuda import MishCuda as Mish`, or are you using the new PyTorch 1.9 Mish (`m = nn.Mish()`)?
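The quoted `.data` behaviour is easy to verify with a small standalone sketch (independent of the repo's code):

```python
import torch

# y = x.data shares storage with x, is detached from x's computation
# history, and has requires_grad=False -- which is why in-place edits
# through it do not trip the autograd leaf-variable check.
x = torch.ones(3, requires_grad=True)
y = x.data
y += 1.0                 # in-place change through y also changes x's values

print(y.requires_grad)   # False
print(x.tolist())        # [2.0, 2.0, 2.0]
```

Note that `torch.no_grad()` (or `Tensor.detach()`) is generally preferred over `.data` in modern PyTorch, since `.data` silently bypasses autograd's correctness checks.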

jackhu-bme commented 2 years ago

That's a pretty good solution, thanks for sharing! I didn't use it because my environment is on PyTorch 1.8.1, the highest version supported on the remote server, so I'm simply using mish_cuda. If there's any problem with `m = nn.Mish()`, you can share it here.

I'm sorry, the problem I mentioned is not in test.py itself; it's in utils/general.py, around line 1084:

```python
def output_to_target(output, width, height):
    # Convert model output to target format [batch_id, class_id, x, y, w, h, conf]
    if isinstance(output, torch.Tensor):
        output = output.cpu().numpy()
```

Another possible solution: https://github.com/WongKinYiu/ScaledYOLOv4/issues/318. Sorry for my bad memory; it actually appears when you use test.py on a higher PyTorch version. @ed-muthiah
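For reference, the element-wise fix for that spot can be sketched like this (the function name `outputs_to_numpy` is illustrative, not the repo's code; only the type check mirrors `output_to_target`):

```python
import torch

# On newer PyTorch versions, `output` is a list of per-image detection
# tensors rather than a single tensor, so .cpu() must be applied to
# each element instead of to the list itself.
def outputs_to_numpy(output):
    if isinstance(output, torch.Tensor):
        return output.cpu().numpy()
    # list of per-image detections; entries may be None when nothing was detected
    return [o.cpu().numpy() if o is not None else None for o in output]

dets = [torch.randn(2, 6), None]   # two images: 2 detections, then none
arrays = outputs_to_numpy(dets)
```

The `None` guard is an assumption for images without detections; drop it if the NMS in your version always returns a tensor per image.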