aqlaboratory / openfold

Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2
Apache License 2.0

[pl_upgrades] OpenFold not configured to support compute capability 9.0 #490

Open reedharrison opened 1 month ago

reedharrison commented 1 month ago

I have recently been trying to run OpenFold on NVIDIA H100 cards; however, I ran into problems on both the main and pl_upgrades branches that prevented OpenFold from running.

The first issue is that "setup.py" on the pl_upgrades branch is not configured to support compute capability 9.0 (the compute capability of H100 cards). This seems to be an easy fix: just edit "setup.py" to include compute capability 9.0 (example).
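For illustration only, here is a minimal sketch of what "including compute capability 9.0" typically means when building a CUDA extension: emitting an nvcc `-gencode` flag for `sm_90`. The helper names `gencode_flags` and `default_capabilities` are hypothetical and are not taken from OpenFold's actual setup.py. The sketch assumes the toolkit-version thresholds documented by NVIDIA (sm_80 needs CUDA 11.0+, sm_90 needs CUDA 11.8+).

```python
# Hypothetical helpers, NOT OpenFold's setup.py: they illustrate how a
# setup script might add compute capability 9.0 (H100) to its nvcc targets.


def gencode_flags(capabilities):
    """Return nvcc '-gencode' flags for each (major, minor) compute capability."""
    flags = []
    for major, minor in sorted(set(capabilities)):
        flags.extend([
            "-gencode",
            f"arch=compute_{major}{minor},code=sm_{major}{minor}",
        ])
    return flags


def default_capabilities(cuda_major, cuda_minor):
    """Pick target architectures the installed CUDA toolkit can compile for."""
    caps = {(7, 0)}  # e.g. V100
    if (cuda_major, cuda_minor) >= (11, 0):
        caps.add((8, 0))  # e.g. A100; sm_80 requires CUDA 11.0 or newer
    if (cuda_major, cuda_minor) >= (11, 8):
        caps.add((9, 0))  # H100 (Hopper); sm_90 requires CUDA 11.8 or newer
    return caps


if __name__ == "__main__":
    # With a CUDA 12.1 toolkit, the flags include sm_70, sm_80 and sm_90.
    print(gencode_flags(default_capabilities(12, 1)))
```

In a PyTorch extension build, such a list would usually be passed via `extra_compile_args={"nvcc": [...]}` to `torch.utils.cpp_extension.CUDAExtension`. On a machine with PyTorch installed, `torch.cuda.get_device_capability()` returns the device's `(major, minor)` tuple, which is `(9, 0)` on an H100.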

The second issue is that I had trouble building a conda/mamba environment compatible with OpenFold. I ran into bugs with PyTorch versions 2.0 through 2.2, and one of them in particular did not seem to be fixed until PyTorch 2.4.1, so I had to find an environment compatible with that version. If interested, you can see an example of the "environment.yml" that I ended up using here. While this environment works well for me, I didn't exhaustively test all options in the default OpenFold run script; for example, I use a custom run script that doesn't use the new DeepSpeed evoformer attention option.
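Since the environment hinges on a minimum PyTorch version, a pre-flight check can fail early instead of hitting the bug at run time. The sketch below is hypothetical and not part of OpenFold: the 2.4.1 threshold comes from the report above, while `parse_version` and `torch_is_recent_enough` are illustrative names. It uses only the standard library so it can run before PyTorch is even imported.

```python
# Hypothetical pre-flight check (not part of OpenFold): reject PyTorch
# versions older than 2.4.1, the first version the reporter found free of the bug.
import re

MIN_TORCH = (2, 4, 1)


def parse_version(version):
    """Parse the leading 'X.Y[.Z]' of a version string such as '2.4.1+cu121'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def torch_is_recent_enough(version, minimum=MIN_TORCH):
    """True if `version` is at least `minimum`; local suffixes like '+cu121' are ignored."""
    return parse_version(version) >= minimum
```

Typical usage would be `torch_is_recent_enough(torch.__version__)` at the top of a run script. Comparing plain integer tuples keeps the check dependency-free, at the cost of ignoring pre-release ordering (for example, a `2.4.1.dev` build is treated as 2.4.1).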

ulupo commented 3 weeks ago

As for your second issue, would using the modified environment.yml from https://github.com/aqlaboratory/openfold/pull/496 help? See also https://github.com/aqlaboratory/openfold/issues/494