Open kevinfang2 opened 3 years ago
Could you show your train_batch0.jpg?
I have the same problem. This is my train_batch0.jpg. My command is: python train.py --batch-size 2 --img 416 416 --data Chess_Pieces_data/data.yaml --cfg yolov4-p5.yaml --weights '' --name yolov4-p5
If I use torch==1.6.0a0+9907a3e from the Docker container, it fails with:
RuntimeError: CUDA error: no kernel image is available for execution on the device
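In case it helps anyone hitting the same error: this message usually means the installed PyTorch build was not compiled for the GPU's compute capability. Below is a rough sketch of how one might check that. The helper names `parse_arch` and `build_supports_device` are made up for illustration, and the compatibility rule (an `sm_XY` binary runs on a device with the same major version and an equal or higher minor version, while `compute_XY` PTX can be JIT-compiled for any equal or newer device) is my reading of the CUDA documentation, simplified. `torch.cuda.get_arch_list()` may not exist in older builds, so the sketch falls back to an empty list in that case.

```python
import re


def parse_arch(tag):
    """Split a tag such as 'sm_75' or 'compute_70' into (kind, major, minor)."""
    m = re.fullmatch(r"(sm|compute)_(\d+)[a-z]?", tag)
    if m is None:
        return None
    major, minor = divmod(int(m.group(2)), 10)
    return m.group(1), major, minor


def build_supports_device(capability, arch_list):
    """Hypothetical helper: guess whether a build with `arch_list` can run on
    a device with compute capability `capability` = (major, minor)."""
    dev_major, dev_minor = capability
    for tag in arch_list:
        parsed = parse_arch(tag)
        if parsed is None:
            continue  # ignore tags this sketch does not understand
        kind, major, minor = parsed
        # Compiled binaries: same major version, minor not above the device's.
        if kind == "sm" and major == dev_major and minor <= dev_minor:
            return True
        # PTX: can be JIT-compiled for any equal or newer capability.
        if kind == "compute" and (major, minor) <= (dev_major, dev_minor):
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        capability = torch.cuda.get_device_capability(0)
        # get_arch_list may be missing in older PyTorch builds (assumption).
        get_arch_list = getattr(torch.cuda, "get_arch_list", None)
        arch_list = get_arch_list() if get_arch_list is not None else []
        print("device capability:", capability)
        print("build arch list:", arch_list)
        print("looks supported:", build_supports_device(capability, arch_list))
    else:
        print("PyTorch with a visible CUDA device is required for the live check.")
```

If the check reports that the device is not covered, installing a PyTorch build that targets your GPU architecture is the usual fix, which would be consistent with the error going away after upgrading.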
If I upgrade to torch==1.7.0, it runs, but everything is NaN from the very first prediction. I printed p from compute_loss() in
utils/general.py
and it is full of NaNs from the first iteration. Because of that, the GIoU, obj, and total losses are all NaN and do not change as training goes on. The command is:
python train.py --device 0 --single-cls --weights weights/yolov4-csp.weights --data data/dataset.yaml --epochs 50 --cfg yolov4.cfg --batch-size 12
output is