google-deepmind / alphafold3

AlphaFold 3 inference pipeline.

bug: pdb_2022_09_28_mmcif_files.tar replacement with mmcif_files #88

Open hegelab opened 2 days ago

hegelab commented 2 days ago

Hi,

You introduced this change into run_alphafold.py, line 185. The old line was:

pdb_2022_09_28_mmcif_files.tar # ~200k PDB mmCIF files in this tar.

and the new line is:

mmcif_files/ # Directory containing ~200k PDB mmCIF files.

So the run fails, since mmcif_files does not exist.

Augustin-Zidek commented 2 days ago

Yes, this was done to significantly speed up template search. You will have to untar your PDB database (the download script has been updated to untar it).

See https://github.com/google-deepmind/alphafold3/blob/main/fetch_databases.sh#L28 for the exact command to run on the tarfile.
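For anyone who prefers not to shell out, here is a minimal Python sketch of the same idea: unpack the tar once so that later runs read from a plain directory. It is not the command from fetch_databases.sh; the function name untar_database is hypothetical, and it assumes the archive contains a top-level mmcif_files/ directory.

```python
import pathlib
import tarfile


def untar_database(tar_path, db_dir):
    """Extract every member of tar_path into db_dir and return db_dir as a Path."""
    db_dir = pathlib.Path(db_dir)
    db_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: the archive holds a top-level mmcif_files/ directory, so the
    # files end up in db_dir/mmcif_files/ after extraction.
    with tarfile.open(tar_path) as archive:
        archive.extractall(path=db_dir)
    return db_dir
```

Calling untar_database("pdb_2022_09_28_mmcif_files.tar", db_dir) once should leave a directory that the default path can point at, provided the archive layout matches the assumption above.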

charlesbeattie commented 1 day ago

You can pass the following flag to restore the original behaviour:

--pdb_database_path='${DB_DIR}/pdb_2022_09_28_mmcif_files.tar'

This will be considerably slower for each run of AlphaFold, so I would recommend untarring that file and keeping the default.
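To illustrate why the tar path is likely slower, here is a rough sketch of reading one file from either layout. This is not the AlphaFold 3 implementation; read_mmcif and its arguments are hypothetical names used only for illustration.

```python
import pathlib
import tarfile


def read_mmcif(db_path, relative_name):
    """Return the bytes of one file from a directory or from a tar archive."""
    db_path = pathlib.Path(db_path)
    if db_path.is_dir():
        # Directory layout: a single direct file open.
        return (db_path / relative_name).read_bytes()
    # Tar layout: a tar archive has no central index, so locating a member
    # means walking the archive's headers before the file can be read.
    with tarfile.open(db_path) as archive:
        member = archive.extractfile(relative_name)
        if member is None:
            raise FileNotFoundError(relative_name)
        return member.read()
```

With roughly 200k members in the archive, the header walk in the tar branch is repeated work on every run, which the directory branch avoids.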

hegelab commented 1 day ago

Thanks :-)

hegelab commented 1 day ago

It occurred to me:

If the template search takes a significant amount of time, you may want to add an option to skip it. I do not see such an option; one can replace the template list with an empty list before starting inference, but that is post-processing after the template search has already completed.

In most of my past cases with AF2 I needed to run the prediction without structural templates. I suppose that AF3 works without structural templates as well as with templates (similar to AF2).
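The workaround described above (emptying the template list before inference) could be sketched roughly as follows. It assumes the data pipeline's output JSON has been loaded into a dict and that each protein entry keeps its templates under sequences[i]["protein"]["templates"]; those field names are assumptions to check against the input format documentation, and drop_templates is a hypothetical helper.

```python
import copy


def drop_templates(fold_input):
    """Return a copy of fold_input with every protein's template list emptied."""
    result = copy.deepcopy(fold_input)
    # Assumed layout: {"sequences": [{"protein": {..., "templates": [...]}}, ...]}
    for entry in result.get("sequences", []):
        protein = entry.get("protein")
        if protein is not None:
            protein["templates"] = []
    return result
```

Entries that are not proteins (for example ligands) are left untouched, and the original dict is not modified. This does not save any search time, since the templates were already found when the JSON was written.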

Augustin-Zidek commented 1 day ago

Yes, you are right, I will add an option to disable template search.

But I am also fixing the template search performance, so it should be less of an issue.

kkorotkovuky commented 16 hours ago

> Yes, you are right, I will add an option to disable template search.
>
> But I am also fixing the template search performance, so should be less of an issue.

The template-free option will be extremely useful. Also, having the option to limit the template search by the release date - as implemented in AF2 - would be great.

hegelab commented 9 hours ago

This may be important: in AF2 the filter-by-date was performed after the template search was completed. I would not perform the search on mmCIF entries that can already be excluded based on the date. E.g. if I restricted the templates to those later than 2050, the search was performed on all entries and in the end none was used, since we were only in 2021.
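The ordering suggested above, filtering by date first and only then doing the expensive per-entry work, might look like the sketch below. It uses an AF2-style maximum release date; release_dates, expensive_step and both function names are hypothetical stand-ins, not part of either codebase.

```python
import datetime


def select_candidates(release_dates, max_template_date):
    """Return the IDs whose release date is on or before max_template_date."""
    return [
        pdb_id
        for pdb_id, released in release_dates.items()
        if released <= max_template_date
    ]


def process_templates(release_dates, max_template_date, expensive_step):
    """Run expensive_step only on entries that pass the date filter."""
    candidates = select_candidates(release_dates, max_template_date)
    return [expensive_step(pdb_id) for pdb_id in candidates]
```

If the cutoff excludes every entry, expensive_step is never called, so no time is spent on entries that would be discarded anyway.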

Augustin-Zidek commented 4 hours ago

Template search should now be much faster (up to ~100x in the mmCIF fetching and parsing stage after Hmmsearch) thanks to https://github.com/google-deepmind/alphafold3/commit/d6b06d6ebcbc94651c0970905b9fdeb48fc45a6a.

Starting work on the template-free and date filter features.