google-deepmind / alphafold3

AlphaFold 3 inference pipeline.

Compilation time of the model #105

Closed smg3d closed 4 hours ago

smg3d commented 6 hours ago

In the Performance documentation, under Compilation Buckets, it is mentioned that we should preferably use a single compilation of the model.

Question 1: Does the time for compiling the model account for most of the difference in inference time between the first seed and the other seeds? In the example below, does the compilation take about 22 s?

Time in seconds:

| Step        | Featurising | Inference | Extracting |
|-------------|------------:|----------:|-----------:|
| first seed  |        4.22 |     76.54 |       0.50 |
| other seeds |        4.17 |     54.19 |       0.30 |
| difference  |        0.05 |     22.35 |       0.20 |

Question 2: Is the compilation of the model done on the GPU or the CPU?

jsspencer commented 4 hours ago
  1. Yes, the time difference is compilation (see, for example, https://jax.readthedocs.io/en/latest/profiling.html). You can use a persistent compilation cache (https://jax.readthedocs.io/en/latest/persistent_compilation_cache.html), though JAX will still need to trace the model, so some overhead remains. 22 s sounds reasonable.
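
     The persistent compilation cache mentioned above can be enabled with a couple of JAX config options. A minimal sketch (not AlphaFold 3 code; `model_step` is a toy stand-in for a jitted model function):

     ```python
     import tempfile

     import jax
     import jax.numpy as jnp

     # Enable JAX's persistent compilation cache so later processes can reuse
     # compiled executables instead of paying the full XLA compile cost again.
     # Note: tracing still happens on every run, so some overhead remains.
     cache_dir = tempfile.mkdtemp()
     jax.config.update("jax_compilation_cache_dir", cache_dir)
     # By default only compilations slower than ~1 s are written to the cache;
     # lower the threshold so this toy function is cached too.
     jax.config.update("jax_persistent_cache_min_compile_time_secs", 0.0)

     @jax.jit
     def model_step(x):
         # Stand-in for one inference step of the real model.
         return jnp.tanh(x @ x.T).sum()

     x = jnp.ones((64, 64), dtype=jnp.float32)
     result = model_step(x).block_until_ready()  # first call: trace + compile
     ```

     In a real pipeline you would point `jax_compilation_cache_dir` at a stable directory shared across runs rather than a temporary one.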

  2. XLA can do autotuning, which involves running candidate kernels on the GPU (note we disable one such pass in https://github.com/google-deepmind/alphafold3/blob/main/docker/Dockerfile#L59; I haven't checked whether other autotuning passes are triggered, though). Compilation is otherwise done on the CPU.
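
     For reference, XLA's GPU autotuning can be dialed down globally via the `XLA_FLAGS` environment variable. This is a hedged sketch, not necessarily the same flag the Dockerfile sets; `--xla_gpu_autotune_level` ranges from 0 (off) to 4 (the default):

     ```shell
     # Disable XLA GPU autotuning so no candidate kernels are benchmarked on
     # the GPU during compilation (this may cost some runtime performance).
     export XLA_FLAGS="--xla_gpu_autotune_level=0"
     ```

     With autotuning off, compilation runs entirely on the CPU, at the price of potentially less optimal kernel choices.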