After making the changes described below, train.py loads the model and starts training normally (using gelan.yaml), but then fails with this error:
File "train.py", line 314, in train
scaler.scale(loss).backward()
......
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument weight in method wrapper_CUDA__native_batch_norm_backward)
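One way to narrow this down is to list which parameters and buffers are not on the GPU just before the backward call. The helper below is only a sketch: the function names `group_by_device` and `report_device_mismatch` are made up for illustration and are not part of YOLOv9 or PyTorch. It only assumes that each tensor exposes a `.device` attribute whose string form looks like `cpu` or `cuda:0`, as PyTorch tensors do.

```python
from collections import defaultdict


def group_by_device(named_tensors):
    """Group tensor names by the string form of their .device attribute.

    `named_tensors` is any iterable of (name, tensor) pairs, for example
    model.named_parameters() or model.named_buffers() in PyTorch.
    """
    groups = defaultdict(list)
    for name, tensor in named_tensors:
        groups[str(tensor.device)].append(name)
    return dict(groups)


def report_device_mismatch(named_tensors, expected="cuda:0"):
    """Return the sorted names of tensors that are not on `expected`.

    `expected` is an assumption taken from the error message above;
    change it if training runs on another device.
    """
    groups = group_by_device(named_tensors)
    return sorted(
        name
        for device, names in groups.items()
        if device != expected
        for name in names
    )
```

In train.py this could be called right before `scaler.scale(loss).backward()`, for example as `report_device_mismatch(itertools.chain(model.named_parameters(), model.named_buffers()))`. Any name it returns is a tensor still on the CPU. It does not see tensors stored as plain attributes (for example state kept inside a custom neuron module), so an empty result does not rule those out.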
I am using the original YOLOv9 and modified the following places: the model-loading part of the train function in train.py
Model
......
EMA
The ModelEMA class in torch_utils.py:
class ModelEMA:
    """ Updated Exponential Moving Average (EMA) from https://github.com/rwightman/pytorch-image-models
    Keeps a moving average of everything in the model state_dict (parameters and buffers)
    For EMA details see https://www.tensorflow.org/api_docs/python/tf/train/ExponentialMovingAverage
    """
The Conv module in common.py, where I replaced the default activation function with an IF (integrate-and-fire) neuron:
class Conv(nn.Module):
    # Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)
It looks like someone ran into a similar problem before in #210. Was the cause ever found? Thanks!