WongKinYiu / ScaledYOLOv4

Scaled-YOLOv4: Scaling Cross Stage Partial Network
GNU General Public License v3.0

No Kernel Image or GIoU is nan #169

Open kevinfang2 opened 3 years ago

kevinfang2 commented 3 years ago

If I use torch==1.6.0a0+9907a3e from the Docker container, I get RuntimeError: CUDA error: no kernel image is available for execution on the device.
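The "no kernel image" error usually means the installed PyTorch binary was not compiled for the GPU's compute capability. The helper below is a minimal sketch, hypothetical and not part of PyTorch or ScaledYOLOv4. It compares an architecture list such as ["sm_70", "sm_75", "compute_75"] against a (major, minor) compute capability using CUDA's compatibility rules. A cubin built for sm_XY runs only on devices with the same major version X and a minor version of at least Y. PTX (compute_XY) can be JIT-compiled for any device of capability X.Y or newer.

```python
import re
from typing import Iterable, Tuple

# Matches plain entries like "sm_75", "compute_70", "sm_100".
# Arch-specific variants such as "sm_90a" are deliberately not matched.
_ARCH_RE = re.compile(r"^(sm|compute)_(\d+)(\d)$")


def supports_device(arch_list: Iterable[str], capability: Tuple[int, int]) -> bool:
    """Return True if any compiled architecture can run on a device of `capability`.

    Hypothetical helper: `arch_list` is a list of strings in the style
    "sm_XY" / "compute_XY", and `capability` is (major, minor).
    """
    major, minor = capability
    for arch in arch_list:
        m = _ARCH_RE.match(arch)
        if m is None:
            continue  # unknown format or arch-specific variant: skip it
        kind, a_major, a_minor = m.group(1), int(m.group(2)), int(m.group(3))
        if kind == "sm" and a_major == major and a_minor <= minor:
            # SASS/cubin: same major version, equal or newer minor version
            return True
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            # PTX: can be JIT-compiled for any equal-or-newer device
            return True
    return False
```

On a machine with PyTorch installed, the inputs could come from torch.cuda.get_arch_list() and torch.cuda.get_device_capability(0); older builds may not provide get_arch_list(). If the function returns False, the installed wheel or container build simply has no kernels for that GPU.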

If I upgrade to torch==1.7.0, training runs, but everything is NaN from the very first prediction. I printed p in compute_loss() in utils/general.py and it is full of NaNs on the first iteration; as a result, the GIoU, obj, and total losses are all NaN and stay NaN throughout training. I've added the
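Since p is already NaN on the first forward pass, one way to narrow the problem down is to find the first layer whose output becomes non-finite. The class below is a hypothetical debugging helper, not something from the repository. hook(name) returns a function with the (module, inputs, output) signature that torch.nn.Module.register_forward_hook expects, and it records the name of the first module whose output contains NaN or inf. To keep the sketch independent of a particular PyTorch version, the tensor-to-float conversion is passed in as a parameter (an assumption of this sketch).

```python
import math
from typing import Any, Callable, Iterable, Iterator, Optional


class FirstNonFiniteTracker:
    """Record the first module whose forward output contains NaN or inf.

    Hypothetical helper. Forward hooks fire when a module's forward() returns,
    so a child module is checked before the parent that contains it.
    """

    def __init__(self, to_floats: Callable[[Any], Iterable[float]]) -> None:
        # Converts one output leaf to Python floats; for a torch tensor this
        # could be: lambda t: t.detach().flatten().tolist()
        self.to_floats = to_floats
        self.first: Optional[str] = None

    def _leaves(self, output: Any) -> Iterator[Any]:
        # Detection heads often return (nested) lists/tuples of tensors.
        if isinstance(output, (list, tuple)):
            for item in output:
                yield from self._leaves(item)
        elif output is not None:
            yield output

    def hook(self, name: str) -> Callable[[Any, Any, Any], None]:
        def _hook(module: Any, inputs: Any, output: Any) -> None:
            if self.first is not None:
                return  # keep only the earliest offender
            for leaf in self._leaves(output):
                if any(not math.isfinite(v) for v in self.to_floats(leaf)):
                    self.first = name
                    return
        return _hook
```

With PyTorch, one could create tracker = FirstNonFiniteTracker(lambda t: t.detach().flatten().tolist()), call module.register_forward_hook(tracker.hook(name)) for every (name, module) in model.named_modules(), run a single batch through the model, and print tracker.first. Copying every activation to Python floats is slow, so this is only meant for one diagnostic batch.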

The command is:

python train.py --device 0 --single-cls --weights weights/yolov4-csp.weights --data data/dataset.yaml --epochs 50 --cfg yolov4.cfg --batch-size 12

The output is:

[Screenshot: Screen Shot 2021-02-12 at 1 21 34 AM]

WongKinYiu commented 3 years ago

Could you show your train_batch0.jpg?
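train_batch0.jpg is typically the first training batch with its ground-truth boxes drawn on top, so it shows whether the labels line up with the objects. A complementary text-only check is sketched below. It is a hypothetical helper and assumes the YOLOv5-style label layout: one object per line, "class x_center y_center width height", with coordinates normalized to [0, 1].

```python
from typing import List


def check_label_lines(lines: List[str], num_classes: int) -> List[str]:
    """Return human-readable problems found in one label file's lines.

    Assumes 'class x_center y_center width height' per line, normalized to [0, 1].
    """
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        parts = raw.split()
        if not parts:
            continue  # blank lines are ignored
        if len(parts) != 5:
            problems.append(f"line {lineno}: expected 5 values, got {len(parts)}")
            continue
        try:
            cls = float(parts[0])  # some tools write class ids as "0.0"
            x, y, w, h = (float(v) for v in parts[1:])
        except ValueError:
            problems.append(f"line {lineno}: non-numeric value")
            continue
        if not (cls.is_integer() and 0 <= cls < num_classes):
            problems.append(f"line {lineno}: invalid class id {parts[0]}")
        # Written as `not all(...)` so that NaN values are flagged too.
        if not all(0.0 <= v <= 1.0 for v in (x, y, w, h)):
            problems.append(f"line {lineno}: coordinates not normalized to [0, 1]")
        if w <= 0.0 or h <= 0.0:
            problems.append(f"line {lineno}: zero or negative box size")
    return problems
```

For a label file at some path, it could be called as check_label_lines(open(path).read().splitlines(), nc), where nc is the class count from the dataset YAML; an empty list means no problems were found in that file.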

AnhDai1997 commented 3 years ago

I have the same problem. This is my train_batch0.jpg. My command is:

python train.py --batch-size 2 --img 416 416 --data Chess_Pieces_data/data.yaml --cfg yolov4-p5.yaml --weights '' --name yolov4-p5

[Image: train_batch0]