LLNL / lbann

Livermore Big Artificial Neural Network Toolkit
http://software.llnl.gov/lbann/

LBANN: Spatial Parallelism #1552

Open dl-guey opened 4 years ago

dl-guey commented 4 years ago

I would like to evaluate a model-parallel application using the spatial parallelism support described in the LBANN publication "Improving Strong-Scaling of CNN Training by Exploiting Finer-Grained Parallelism".

However, I don't see any instructions or code snippets here for running LBANN with spatial parallelism.

Is the spatial parallelism code currently open source and available in this repo? Are there any instructions or documentation for enabling and running spatial parallelism with LBANN?

naoyam commented 4 years ago

The documentation still needs work, but nearly all of the code for spatial parallelism is publicly available. The main component, the parallel convolutions, lives in a separate library, DiHydrogen, which LBANN uses when it is enabled. DiHydrogen is available at https://github.com/LLNL/DiHydrogen (see the legacy directory).

We don't have specific documentation for spatial parallelism yet. However, once you have successfully built and run LBANN, the additional steps for using spatial parallelism are minor.

Do you have specific models with which you want to try spatial parallelism? If so, the first step would be to run the model on LBANN without spatial parallelism.