gitter-lab / pharmaco-image

MIT License
Training CNNs on all images #11

Closed agitter closed 1 year ago

agitter commented 5 years ago

We do not know how to train a CNN on all of the images. One option would be to move all the data to Amazon and train on a single multi-GPU instance. However, that would likely be cost-prohibitive.

If we are forced to use Cooley or CHTC GPUs, we will need to think about how HTCondor can coordinate the training. HTCondor has a master-worker framework that may be relevant. CHTC will offer a training session on it soon.

This caught my attention because distributed TensorFlow also discusses master-worker organization. There may be many ways to do this in TensorFlow though.

agitter commented 5 years ago

As far as I can tell, the distributed TensorFlow video assumes you have hostnames and ports available for direct communication between machines. That is not possible in the CHTC pool. Perhaps it would be in AWS or Cooley.
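
To make the hostname and port requirement concrete, here is a minimal sketch of the `TF_CONFIG` cluster description that TensorFlow's multi-worker training reads from the environment. The helper `make_tf_config`, the example hostnames, and the port are hypothetical and only for illustration; the point is that every process needs the full list of reachable `host:port` addresses before training starts, which is the assumption that may not hold in the CHTC pool.

```python
import json
import os


def make_tf_config(worker_addresses, task_index, task_type="worker"):
    """Build the TF_CONFIG JSON string for one process in a multi-machine job.

    Every process receives the same list of "host:port" addresses and
    differs only in its own task index.
    """
    if not 0 <= task_index < len(worker_addresses):
        raise ValueError("task_index must refer to one of the workers")
    for address in worker_addresses:
        host, sep, port = address.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected 'host:port', got {address!r}")
    return json.dumps({
        "cluster": {task_type: list(worker_addresses)},
        "task": {"type": task_type, "index": task_index},
    })


# Hypothetical addresses: these must be reachable from every other machine.
workers = ["gpu-node-0.example.org:2222", "gpu-node-1.example.org:2222"]

# The first machine would export this before starting its training script;
# the second machine would use task_index=1 with the same worker list.
os.environ["TF_CONFIG"] = make_tf_config(workers, task_index=0)
```

If jobs land on machines whose addresses are not known ahead of time or that cannot accept incoming connections, there is nothing valid to put in the worker list, so this style of setup would need a different coordination mechanism.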