BVLC / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

Multi-GPU crash in Caffe Library #6891

Open vinaykumarngitub opened 4 years ago

vinaykumarngitub commented 4 years ago


Issue summary

The function below is used to run training jobs on multiple GPUs:

  void Run(const vector<int>& gpus);

However, this function does not support passing the following parameters:

  1. Iteration Per Epoch
  2. Resume file
  3. Minimum epoch
  4. Patience

To work around this, we created an additional function, RunMultipleGpu(), which accepts all of these parameters:

  void RunMultipleGpu(const vector<int>& gpus, std::string Path, int g_IterPerEpoch, const char* resume_file, int iMinEpoch, int iPatience);

The problem is that interrupting a training run causes a crash. When training runs to completion, there are no issues.
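
For discussion, here is a minimal single-threaded sketch of one way an interruptible training loop can be structured: the loop polls a stop flag only at iteration boundaries and returns a status, so all cleanup can happen in one place after the loop exits. This is not the RunMultipleGpu() implementation and makes no claim about the actual cause of the crash. The names TrainLoop, TrainStatus and g_stop_requested are hypothetical, and the meaning given to iter_per_epoch, min_epoch and patience (early stopping on a validation loss) is an assumption based only on the parameter names.

```cpp
#include <atomic>
#include <functional>
#include <limits>

// Hypothetical stop flag. In a real program a signal handler or another
// thread would set it; nothing here is part of Caffe.
std::atomic<bool> g_stop_requested{false};

enum class TrainStatus { kCompleted, kInterrupted, kEarlyStopped };

// Hypothetical loop. step(epoch, iter) runs one training iteration and
// validate(epoch) returns a validation loss (lower is better).
// Assumed semantics: stop early once at least min_epoch epochs have run
// and the loss has not improved for `patience` consecutive epochs.
TrainStatus TrainLoop(int max_epoch, int iter_per_epoch, int min_epoch,
                      int patience,
                      const std::function<void(int, int)>& step,
                      const std::function<double(int)>& validate) {
  double best_loss = std::numeric_limits<double>::infinity();
  int epochs_since_best = 0;

  for (int epoch = 0; epoch < max_epoch; ++epoch) {
    for (int iter = 0; iter < iter_per_epoch; ++iter) {
      // Check for interruption only between iterations, so a step is
      // never abandoned halfway through.
      if (g_stop_requested.load()) {
        return TrainStatus::kInterrupted;
      }
      step(epoch, iter);
    }

    const double loss = validate(epoch);
    if (loss < best_loss) {
      best_loss = loss;
      epochs_since_best = 0;
    } else {
      ++epochs_since_best;
    }

    if (epoch + 1 >= min_epoch && epochs_since_best >= patience) {
      return TrainStatus::kEarlyStopped;
    }
  }
  return TrainStatus::kCompleted;
}
```

With a structure like this, the caller can release GPU resources and join any worker threads after TrainLoop returns, whichever status it reports, instead of tearing things down from inside the interrupt handler.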

Steps to reproduce

Tried solutions

System configuration

Issue checklist