No Kernel Image or GIoU is nan

kevinfang2 commented 3 years ago

With torch==1.6.0a0+9907a3e from the Docker container, training fails with RuntimeError: CUDA error: no kernel image is available for execution on the device
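This error commonly means the installed PyTorch binary contains no GPU code that the device can run, so one useful check is to compare the GPU's compute capability with the architectures the build was compiled for. The sketch below is a minimal illustration, not a confirmed diagnosis of this report. The helper names `parse_arch` and `build_supports_device` are hypothetical, and the matching rule is the general CUDA compatibility rule (an `sm_XY` binary runs on devices with the same major version and an equal or higher minor version, while `compute_XY` PTX can be JIT-compiled for any equal or newer device). It ignores architecture-specific variants such as `sm_90a`. The live check assumes `torch.cuda.get_arch_list()` is present, which may not be the case on older builds, so it is guarded.

```python
import re


def parse_arch(tag):
    """Split a tag such as 'sm_75' or 'compute_80' into (kind, major, minor)."""
    m = re.fullmatch(r"(sm|compute)_(\d+)(\d)[a-z]?", tag)
    if m is None:
        raise ValueError(f"unrecognized arch tag: {tag!r}")
    return m.group(1), int(m.group(2)), int(m.group(3))


def build_supports_device(capability, arch_list):
    """Return True if any compiled architecture can run on the device.

    capability: (major, minor) tuple of the GPU, e.g. (8, 6).
    arch_list:  tags the binary was built for, e.g. ['sm_70', 'sm_75'].
    """
    dev_major, dev_minor = capability
    for tag in arch_list:
        kind, major, minor = parse_arch(tag)
        # Native binaries: same major version, minor not newer than the device.
        if kind == "sm" and major == dev_major and minor <= dev_minor:
            return True
        # PTX: can be JIT-compiled for any device at least as new.
        if kind == "compute" and (major, minor) <= (dev_major, dev_minor):
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    # Live check, only if PyTorch, a GPU, and get_arch_list are all available.
    if (torch is not None and torch.cuda.is_available()
            and hasattr(torch.cuda, "get_arch_list")):
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print("device capability:", cap)
        print("compiled archs:   ", archs)
        print("supported:        ", build_supports_device(cap, archs))
```

If the check reports no match, installing a PyTorch build compiled for the GPU's architecture is the usual remedy.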

If I upgrade to torch==1.7.0, it runs, but everything is NaN from the very first prediction. I printed p from compute_loss() in utils/general.py and it is full of NaNs on the first run; as a result, GIoU, obj, and total are all NaN and do not change during training. I've added the

The command is: python train.py --device 0 --single-cls --weights weights/yolov4-csp.weights --data data/dataset.yaml --epochs 50 --cfg yolov4.cfg --batch-size 12

output is

WongKinYiu commented 3 years ago

Could you show your train_batch0.jpg?

AnhDai1997 commented 3 years ago

I have the same problem; this is my train_batch0.jpg. My command is: python train.py --batch-size 2 --img 416 416 --data Chess_Pieces_data/data.yaml --cfg yolov4-p5.yaml --weights '' --name yolov4-p5
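When NaN values show up from the first iteration, as reported in this thread, it can help to find which values are already non-finite before the loss is computed. The following is a minimal sketch in plain Python. The helper `find_non_finite` and the names `p0`, `p1` are hypothetical and not part of the repository; it assumes each entry is a flat list of floats.

```python
import math


def find_non_finite(named_values):
    """Return {name: (nan_count, inf_count)} for entries with NaN or inf."""
    report = {}
    for name, values in named_values.items():
        nan_count = sum(1 for v in values if math.isnan(v))
        inf_count = sum(1 for v in values if math.isinf(v))
        if nan_count or inf_count:
            report[name] = (nan_count, inf_count)
    return report


if __name__ == "__main__":
    sample = {
        "p0": [0.1, float("nan")],  # contains one NaN
        "p1": [1.0, 2.0],           # fully finite, not reported
    }
    print(find_non_finite(sample))
```

With PyTorch tensors, one option (an assumption about usage, not something taken from the thread) is to pass `t.detach().flatten().tolist()` for each tensor, or to test each one directly with `torch.isfinite(t).all()`.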

WongKinYiu / ScaledYOLOv4

No Kernel Image or GIoU is nan #169