
Question about multi GPU inference #31

Closed · linuxfold closed this 1 week ago

linuxfold commented 1 week ago

Is there any information on multi GPU inference? Does it just work automatically?

I see something about 8 40GB A100s being used for a single prediction in the docs.

Does NVLink matter?

Augustin-Zidek commented 1 week ago

Hello,

We have validated and support only configurations with a single GPU (A100 or H100). More details are in https://github.com/google-deepmind/alphafold3/blob/main/docs/performance.md.

Unless you need to run AlphaFold 3 on inputs with more than 5,120 tokens, running on a single GPU should be beneficial, since sharding across multiple GPUs adds cross-device communication overhead.

I see something about 8 40GB A100s being used for a single prediction in the docs.

No, this is in comparison against the AlphaFold 3 paper setup, where we ran on 16 NVIDIA A100s, with 40 GB of memory per device. In contrast, this repository supports running AlphaFold 3 on a single NVIDIA A100 with 80 GB of memory in a configuration optimised to maximise throughput.

Does NVLink matter?

Not really, since AlphaFold will run only on a single GPU.

linuxfold commented 1 week ago

Thank you for the quick response!

Am I understanding correctly that it is possible to use multiple GPUs but that the current repo is not set up for it?

I understand it is not officially supported, but I am potentially interested in trying to NVLink two 3090s. I just want to know what is possible, even if not optimal, given that A100s are very expensive.

Augustin-Zidek commented 1 week ago

Am I understanding correctly that it is possible to use multiple GPUs but that the current repo is not set up for it?

Yes. In principle it is possible, but not without substantial code changes. See e.g. https://jax.readthedocs.io/en/latest/multi_process.html for more details.
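For anyone curious what that entails, here is a minimal single-process sketch (illustrative only, not AlphaFold 3 code) of sharding an array across the local GPUs of one machine with jax.sharding; the multi-process API linked above generalises the same idea across hosts:

```python
# Illustrative sketch only, not AlphaFold 3 code: shard an array
# across local GPUs with jax.sharding.
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = jax.devices()  # all local accelerators
mesh = Mesh(devices, axis_names=("gpu",))

# Shard the leading (token) axis across GPUs; it must be divisible
# by the number of devices (5120 works for 2, 4, or 8 GPUs).
sharding = NamedSharding(mesh, P("gpu", None))
x = jax.device_put(jnp.zeros((5120, 128)), sharding)

@jax.jit
def layer(x):
    # XLA compiles this to run on the shards and inserts the
    # cross-GPU collectives the matmul needs.
    return jax.nn.relu(x @ x.T)

print(layer(x).sharding)  # how the output is laid out across devices
```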

I understand it is not officially supported, but I am potentially interested in trying to NVLink two 3090s. I just want to know what is possible, even if not optimal, given that A100s are very expensive.

If you are folding small inputs, you won't even need two of them. If you need to fold larger inputs that don't fit in the 3090's RAM, I would recommend turning on unified memory, which allows GPU memory to spill into host RAM. It will run slower because of the extra memory transfers, but it won't require any code changes. See https://github.com/google-deepmind/alphafold3/blob/main/docs/performance.md#unified-memory.
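Concretely, the unified memory setup comes down to a few environment variables set before JAX initialises the GPU backend. A sketch in Python; the exact variable names and values should be double-checked against the performance.md section linked above:

```python
# Sketch of enabling unified memory. The variable names and values
# below follow docs/performance.md#unified-memory and should be
# verified there. They must be set before JAX initialises the GPU,
# i.e. before the first `import jax`.
import os

os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"
os.environ["TF_FORCE_UNIFIED_MEMORY"] = "true"
# Let the process address a multiple of the physical GPU memory;
# the excess spills into host RAM (slower, but no code changes).
os.environ["XLA_CLIENT_MEM_FRACTION"] = "3.2"

import jax  # safe to import only after the environment is set

print(jax.devices())
```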

linuxfold commented 1 week ago

Thanks again! This is very helpful.

Is there any chance the multi-GPU version of the code might be released at some point?

I am sitting on a bunch of 3090s (up to 6-7 in one machine) that might benefit from being parallelized to maybe approach the performance of one A100 80GB... Otherwise they might not have much use anymore.

eunos-1128 commented 1 week ago

I have the same problem with GPU memory size.

It would be helpful if a multi-GPU implementation that solves this problem were released.

Augustin-Zidek commented 1 week ago

Is there any chance the multi-GPU version of the code might be released at some point?

Very unlikely, sorry.

I am sitting on a bunch of 3090s (up to 6-7 in one machine) that might benefit from being parallelized to maybe approach the performance of one A100 80GB... Otherwise they might not have much use anymore.

You should be able to use them all by folding 6-7 proteins at once, one per GPU. :) This will in fact give higher throughput than sharding a single prediction across GPUs, as illustrated by the table in https://github.com/google-deepmind/alphafold3/blob/main/docs/performance.md#model-inference.
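A hedged sketch of that pattern: one independent process per GPU, each pinned to a single device via CUDA_VISIBLE_DEVICES. The run_alphafold.py flag names here are assumptions; check `python run_alphafold.py --help` for the real interface:

```python
# Illustrative sketch, not part of this repo: launch one independent
# AlphaFold 3 job per GPU. Flag names are assumptions to verify
# against `python run_alphafold.py --help`.
import os
import subprocess

inputs = ["input_0.json", "input_1.json", "input_2.json"]  # one job per GPU

procs = []
for gpu_id, json_path in enumerate(inputs):
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    procs.append(subprocess.Popen(
        ["python", "run_alphafold.py",
         f"--json_path={json_path}",
         f"--output_dir=out_{gpu_id}"],
        env=env,  # each process sees exactly one 3090
    ))

for p in procs:
    p.wait()
```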