WongKinYiu / CrossStagePartialNetworks

Cross Stage Partial Networks
https://github.com/WongKinYiu/CrossStagePartialNetworks
894 stars · 172 forks

Need help in setting hyper-parameters #11

Open Rajasekhar06 opened 4 years ago

Rajasekhar06 commented 4 years ago

@WongKinYiu I have been trying to set the right hyper-parameters for yolov3-spp on the complete Open Images dataset, but after 300-400 iterations the server restarts. I previously trained on 3 of the 601 classes, using single-GPU parameters for multi-GPU training; that dataset was small, around 1100 images or so. But now, training on the whole dataset with multi-GPU parameters causes a system reboot.

BTW, how do you calculate the parameters for multi-GPU training? You already replied to me in previous issues on @AlexeyAB's repo about how to set burn_in, learning rate, and decay in the cfg file. Since the settings as narrated by Alexey were causing the issue, I changed almost all the parameters back to the single-GPU config except burn_in, but the problem persists.

(Screenshots from 2020-02-07 15-49-29 and 15-49-07 showing the hardware)

For the above hardware, here is the link to the config I'm using. Please help me out. Thanks
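For reference, the multi-GPU rule AlexeyAB describes in his darknet README is to divide `learning_rate` by the number of GPUs and multiply `burn_in` by it. A quick sketch of that rule (the helper function is my own, not part of darknet):

```python
# Sketch of AlexeyAB's multi-GPU scaling rule for darknet cfg files:
# learning_rate is divided by the GPU count, burn_in is multiplied by it.
# scale_for_gpus is a hypothetical helper, not a darknet function.
def scale_for_gpus(learning_rate, burn_in, num_gpus):
    return learning_rate / num_gpus, burn_in * num_gpus

# Single-GPU defaults from a typical yolov3-spp cfg, scaled for 4 GPUs:
lr, burn = scale_for_gpus(0.001, 1000, 4)
print(lr, burn)  # 0.00025 4000
```

These are the values that would go into the `[net]` section of the cfg (`learning_rate=0.00025`, `burn_in=4000` for 4 GPUs).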

AlexeyAB commented 4 years ago

But now, training on the whole dataset with multi-GPU parameters causes a system reboot.

This is a hardware issue: insufficient power, or a hardware bug in the GPU.

RajashekarY commented 4 years ago

Should I upgrade my PSU to meet the power needs? Or would reducing the image resolution to a smaller size also work, at the cost of some accuracy?

AlexeyAB commented 4 years ago

Try to train using 2-3 GPUs instead of 4.

LukeAI commented 4 years ago

@RajashekarY what is your PSU?

RajashekarY commented 4 years ago

Actually I don't know, @LukeAI. I use this system remotely, so I need to ask the owner 😛 But

try to train by using 2-3 GPUs instead of 4.

Might get the job done

WongKinYiu commented 4 years ago

The peak power draw of a Titan RTX is about 390 W, so with four of them you need at least a 1500 W power supply, and 2000 W is better. However, in my experiments it is usually caused by the mainboard's protection circuitry tripping, because a single PSU is not stable enough for multiple GPUs. In our case, we use dual PSUs for a single mainboard.
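The arithmetic behind that recommendation can be sketched as a rough power budget. The per-GPU peak (390 W) is from the comment above; the system overhead and safety margin are my own assumed numbers, since PSUs are least stable near their rated limit:

```python
# Rough PSU sizing for a multi-GPU training box.
# GPU_PEAK_W is from the discussion above (Titan RTX); the overhead
# and margin values are assumptions for illustration only.
GPU_PEAK_W = 390          # peak draw per Titan RTX
SYSTEM_OVERHEAD_W = 200   # assumed CPU / mainboard / drives headroom
MARGIN = 1.2              # assumed 20% margin to stay off the PSU's limit

def required_psu_watts(num_gpus):
    return (num_gpus * GPU_PEAK_W + SYSTEM_OVERHEAD_W) * MARGIN

for n in (2, 3, 4):
    print(n, "GPUs ->", required_psu_watts(n), "W")
```

Under these assumptions, four GPUs land above 2000 W, which matches the "better 2000 W, or dual PSUs" advice, and also explains why dropping to 2-3 GPUs can keep a smaller single PSU stable.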