experiencor / keras-yolo3

Training and Detecting Objects with YOLO3
MIT License

How to run Keras-yolo3 in multi-node multi GPU environment? #133

Open rupeshcash opened 6 years ago

rupeshcash commented 6 years ago

The code runs on 2 GPUs on a single node (on an HPC system). But when I try to run it on multiple nodes (with 2 GPUs on each node), it still detects only one node with 2 GPUs. Any idea how I can make it run on each node's GPUs?

To run on multiple GPUs, I have only done this stupid thing in the config file: since 2 nodes * 2 GPUs per node = 4 GPUs, in config.json I have:

"gpus": "0,1,2,3",

It only detects 0 and 1 (the first node) correctly and works well, but it does not detect 2 and 3, i.e. the second node.
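A likely explanation, offered as a sketch rather than a confirmed diagnosis: CUDA device indices are local to the machine a process runs on, so a single Python process on a 2-GPU node can only ever see devices 0 and 1. The snippet below assumes the `gpus` string is used as a plain list of local device ids (for example via `CUDA_VISIBLE_DEVICES`), which the thread does not confirm. `split_gpu_ids` and `gpus_per_node` are hypothetical names used only for illustration.

```python
def split_gpu_ids(gpus, gpus_per_node):
    """Split a config-style GPU string into ids that exist on one node
    and ids that do not.

    Assumption: ids are local device indices, counted from 0 on each node.
    """
    requested = [int(tok) for tok in gpus.split(",") if tok.strip()]
    usable = [i for i in requested if 0 <= i < gpus_per_node]
    missing = [i for i in requested if i not in usable]
    return usable, missing


# The config from the question, checked against a node with 2 GPUs.
usable, missing = split_gpu_ids("0,1,2,3", gpus_per_node=2)
print("usable on this node:", usable)
print("not present on this node:", missing)
```

If that assumption holds, reaching the second node's GPUs would need one process per node plus a distributed training setup, not a longer id list in config.json.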

khelina commented 5 years ago

Hi rupeshcash, I wonder if you have sorted out this problem. I have access to a similar HPC system (with 2 GPUs on each node), so to make my CNN run on 4 GPUs, I have to specify number of nodes=2 in my resource request, and it does detect all 4 GPUs. But my problem is that Keras outputs two different outcomes (losses) from the two nodes separately. Any idea how to merge them?
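One way to read the "two different losses" symptom, stated as an assumption: each node may be running its own independent copy of the training script, so each copy reports a loss for its own batches. If the per-node mean loss and the number of samples behind it are both known, a combined value can be sketched as a sample-weighted average. `merge_losses` is a hypothetical helper, not part of Keras or this repository, and the numbers are made up for illustration.

```python
def merge_losses(per_node):
    """Combine per-node mean losses into one sample-weighted mean.

    per_node: list of (mean_loss, num_samples) tuples, one per node.
    """
    total = sum(n for _, n in per_node)
    if total == 0:
        raise ValueError("no samples reported by any node")
    return sum(loss * n for loss, n in per_node) / total


# Illustrative values only: two nodes, same number of samples each.
merged = merge_losses([(0.80, 64), (0.60, 64)])
print("merged loss:", round(merged, 4))
```

Note that this only merges the reported numbers. It does not make the two copies train a single shared model; that would need gradients to be averaged across nodes during training, which distributed frameworks such as Horovod or TensorFlow's multi-worker strategies are designed to do.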