YoshitakaMo / localcolabfold

ColabFold on your local PC
MIT License

Colabfold template selection #249

Status: open. GISTAL opened this issue 1 month ago.


Hello everyone,

I am facing the following situation: I am running ColabFold locally with the MSA search disabled and a large, self-created template folder containing 792 templates (x002-x793). During the run, I see that only a portion of the templates (about 100) were used (see the log below). I tried the same with another protein sequence and a different template folder (892 templates), but again only 50% of the templates were loaded.
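Each name in the log below refers to one chain of a template structure (for example, x481_A and x481_B are chains A and B of template x481, following the usual ID_chain naming), and in this log every ID appears with both chains. Counting distinct template IDs rather than printed entries therefore gives a cleaner coverage figure. The sketch below is a hypothetical helper, not part of ColabFold; it assumes the folder holds exactly one structure per ID, named x002 to x793 with a zero-padded three-digit number.

```python
def distinct_template_ids(found_names):
    """Collapse chain-level names such as 'x481_A' / 'x481_B' to template IDs ('x481')."""
    return {name.rsplit("_", 1)[0] for name in found_names}


def template_coverage(found_names, first=2, last=793):
    """Return (distinct IDs used, IDs expected, sorted list of unused IDs).

    Hypothetical helper: assumes one structure per ID, named x002 ... x793
    with a zero-padded three-digit number, as described in the issue.
    """
    expected = {f"x{i:03d}" for i in range(first, last + 1)}
    used = distinct_template_ids(found_names) & expected
    # Zero padding makes the string sort order match the numeric order.
    missing = sorted(expected - used)
    return len(used), len(expected), missing


# Short excerpt of the "Sequence 0 found templates" list from the log.
found = ["x481_A", "x481_B", "x492_A", "x492_B", "x461_A", "x461_B"]
n_used, n_total, missing = template_coverage(found)
print(f"{n_used}/{n_total} distinct templates reported, {len(missing)} not reported")
```

Applied to the full Sequence 0 list from the log, the 110 chain entries collapse to 55 distinct template IDs, so 737 of the 792 templates were not reported.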

My questions:
1) What determines which templates are selected? Sequence length? Sequence similarity between the query sequence and the templates, or among the templates themselves? Can this be adjusted so that all templates are used?
2) I disabled the MSA search because, as I understand it, an MSA makes the contribution of the templates negligible. Is that correct?

Run:

colabfold_batch --amber --templates --custom-template-path TempGISTAL --recycle-early-stop-tolerance 0.1 --num-recycle 20 --num-models 3 --rank plddt --use-gpu-relax TARGETGISTAL.a3m outputdir_TARGETGISTALCF/ &
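Since the same settings were reused for a second protein and a second template folder, it may help to build the invocation once as an argument list. This is a minimal Python sketch: the function name is hypothetical, and it contains exactly the flags from the command above, not a complete list of colabfold_batch options. The trailing & only backgrounds the shell job, so it is not part of the argument list.

```python
import shlex


def build_colabfold_cmd(input_a3m, out_dir, template_dir):
    """Rebuild the colabfold_batch call from this issue as an argument list.

    Hypothetical helper; only the flags used in the original command are included.
    """
    return [
        "colabfold_batch",
        "--amber",
        "--templates",
        "--custom-template-path", template_dir,
        "--recycle-early-stop-tolerance", "0.1",
        "--num-recycle", "20",
        "--num-models", "3",
        "--rank", "plddt",
        "--use-gpu-relax",
        input_a3m,
        out_dir,
    ]


# Same inputs as the original run; swap in another .a3m file / template folder to repeat it.
cmd = build_colabfold_cmd("TARGETGISTAL.a3m", "outputdir_TARGETGISTALCF/", "TempGISTAL")
print(shlex.join(cmd))
# subprocess.run(cmd, check=True) would run it in the foreground (needs colabfold_batch on PATH).
```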

Log file:

2024-08-21 20:57:07,763 Running colabfold 1.5.5 (fdf3b235b88746681c46ea12bcded76ecf8e1f76)
2024-08-21 20:57:07,925 Running on GPU
2024-08-21 20:57:09,214 Found 9 citations for tools or databases
2024-08-21 20:59:36,289 Query 1/1: TARGET (length 653)
2024-08-21 21:00:20,122 Sequence 0 found templates: ['x481_A', 'x481_B', 'x492_A', 'x492_B', 'x461_A', 'x461_B', 'x476_A', 'x476_B', 'x514_A', 'x514_B', 'x508_A', 'x508_B', 'x474_A', 'x474_B', 'x478_A', 'x478_B', 'x505_A', 'x505_B', 'x191_A', 'x191_B', 'x193_A', 'x193_B', 'x176_A', 'x176_B', 'x216_A', 'x216_B', 'x175_A', 'x175_B', 'x213_A', 'x213_B', 'x206_A', 'x206_B', 'x205_A', 'x205_B', 'x177_A', 'x177_B', 'x198_A', 'x198_B', 'x221_A', 'x221_B', 'x201_A', 'x201_B', 'x641_A', 'x641_B', 'x640_A', 'x640_B', 'x590_A', 'x590_B', 'x638_A', 'x638_B', 'x601_A', 'x601_B', 'x612_A', 'x612_B', 'x598_A', 'x598_B', 'x642_A', 'x642_B', 'x595_A', 'x595_B', 'x608_A', 'x608_B', 'x622_A', 'x622_B', 'x624_A', 'x624_B', 'x633_A', 'x633_B', 'x626_A', 'x626_B', 'x637_A', 'x637_B', 'x736_A', 'x736_B', 'x758_A', 'x758_B', 'x756_A', 'x756_B', 'x752_A', 'x752_B', 'x786_A', 'x786_B', 'x740_A', 'x740_B', 'x791_A', 'x791_B', 'x768_A', 'x768_B', 'x781_A', 'x781_B', 'x734_A', 'x734_B', 'x742_A', 'x742_B', 'x743_A', 'x743_B', 'x779_A', 'x779_B', 'x277_A', 'x277_B', 'x279_A', 'x279_B', 'x254_A', 'x254_B', 'x275_A', 'x275_B', 'x238_A', 'x238_B', 'x248_A', 'x248_B']
2024-08-21 21:01:03,640 Sequence 1 found templates: ['x481_A', 'x481_B', 'x492_A', 'x492_B', 'x461_A', 'x461_B', 'x476_A', 'x476_B', 'x514_A', 'x514_B', 'x508_A', 'x508_B', 'x474_A', 'x474_B', 'x478_A', 'x478_B', 'x505_A', 'x505_B', 'x191_A', 'x191_B', 'x193_A', 'x193_B', 'x176_A', 'x176_B', 'x216_A', 'x216_B', 'x175_A', 'x175_B', 'x213_A', 'x213_B', 'x206_A', 'x206_B', 'x205_A', 'x205_B', 'x177_A', 'x177_B', 'x198_A', 'x198_B', 'x221_A', 'x221_B', 'x201_A', 'x201_B', 'x641_A', 'x641_B', 'x640_A', 'x640_B', 'x590_A', 'x590_B', 'x638_A', 'x638_B', 'x601_A', 'x601_B', 'x612_A', 'x612_B', 'x598_A', 'x598_B', 'x642_A', 'x642_B', 'x595_A', 'x595_B', 'x608_A', 'x608_B', 'x622_A', 'x622_B', 'x624_A', 'x624_B', 'x633_A', 'x633_B', 'x626_A', 'x626_B', 'x637_A', 'x637_B', 'x736_A', 'x736_B', 'x758_A', 'x758_B', 'x756_A', 'x756_B', 'x752_A', 'x752_B', 'x786_A', 'x786_B', 'x740_A', 'x740_B', 'x791_A', 'x791_B', 'x768_A', 'x768_B', 'x781_A', 'x781_B', 'x734_A', 'x734_B', 'x742_A', 'x742_B', 'x743_A', 'x743_B', 'x779_A', 'x779_B', 'x277_A', 'x277_B', 'x279_A', 'x279_B', 'x254_A', 'x254_B', 'x275_A', 'x275_B', 'x238_A', 'x238_B', 'x248_A', 'x248_B']
2024-08-21 21:01:04,137 Setting max_seq=7, max_extra_seq=1
2024-08-21 21:02:09,765 alphafold2_multimer_v3_model_1_seed_000 recycle=0 pLDDT=69.1 pTM=0.377 ipTM=0.368
...
2024-08-21 21:06:56,453 alphafold2_multimer_v3_model_1_seed_000 recycle=20 pLDDT=69 pTM=0.4 ipTM=0.391 tol=1.55
2024-08-21 21:06:56,454 alphafold2_multimer_v3_model_1_seed_000 took 346.6s (20 recycles)
2024-08-21 21:07:12,333 alphafold2_multimer_v3_model_2_seed_000 recycle=0 pLDDT=66.8 pTM=0.381 ipTM=0.376
...
2024-08-21 21:12:15,761 alphafold2_multimer_v3_model_2_seed_000 recycle=20 pLDDT=73.1 pTM=0.381 ipTM=0.377 tol=0.186
2024-08-21 21:12:15,761 alphafold2_multimer_v3_model_2_seed_000 took 318.7s (20 recycles)
2024-08-21 21:12:31,470 alphafold2_multimer_v3_model_3_seed_000 recycle=0 pLDDT=57.4 pTM=0.391 ipTM=0.377
...
2024-08-21 21:17:23,735 alphafold2_multimer_v3_model_3_seed_000 recycle=20 pLDDT=58.7 pTM=0.366 ipTM=0.36 tol=0.626
2024-08-21 21:17:23,736 alphafold2_multimer_v3_model_3_seed_000 took 307.3s (20 recycles)
2024-08-21 21:17:24,306 reranking models by 'plddt' metric
2024-08-21 21:17:25,555 Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead.
2024-08-21 21:17:45,193 Relaxation took 20.9s
2024-08-21 21:17:45,194 rank_001_alphafold2_multimer_v3_model_2_seed_000 pLDDT=73.1 pTM=0.381 ipTM=0.377
2024-08-21 21:18:05,214 Relaxation took 20.0s
2024-08-21 21:18:05,214 rank_002_alphafold2_multimer_v3_model_1_seed_000 pLDDT=69 pTM=0.4 ipTM=0.391
2024-08-21 21:18:29,338 Relaxation took 24.0s
2024-08-21 21:18:29,339 rank_003_alphafold2_multimer_v3_model_3_seed_000 pLDDT=58.7 pTM=0.366 ipTM=0.36
2024-08-21 21:18:30,773 Done
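The log can also be checked programmatically. The sketch below uses only the Python standard library. Its regular expressions are written against the line formats visible in this log, which is an assumption and not a documented ColabFold log format. It extracts the template list printed for each sequence and the last reported tol value per model, then compares tol with the 0.1 passed to --recycle-early-stop-tolerance, assuming early stopping triggers only once tol drops below that value.

```python
import ast
import re

# Shortened excerpt of the log in this issue (template lists truncated).
LOG = """\
2024-08-21 21:00:20,122 Sequence 0 found templates: ['x481_A', 'x481_B', 'x492_A', 'x492_B']
2024-08-21 21:01:03,640 Sequence 1 found templates: ['x481_A', 'x481_B', 'x492_A', 'x492_B']
2024-08-21 21:06:56,453 alphafold2_multimer_v3_model_1_seed_000 recycle=20 pLDDT=69 pTM=0.4 ipTM=0.391 tol=1.55
2024-08-21 21:12:15,761 alphafold2_multimer_v3_model_2_seed_000 recycle=20 pLDDT=73.1 pTM=0.381 ipTM=0.377 tol=0.186
2024-08-21 21:17:23,735 alphafold2_multimer_v3_model_3_seed_000 recycle=20 pLDDT=58.7 pTM=0.366 ipTM=0.36 tol=0.626
"""

# Patterns written against the line formats seen in this log (an assumption).
TEMPLATES_RE = re.compile(r"Sequence (\d+) found templates: (\[.*?\])")
RECYCLE_RE = re.compile(r"(\S+_seed_\d+) recycle=(\d+) .*?tol=([\d.]+)")


def templates_per_sequence(log):
    """Map sequence index -> list of template chain names printed for it."""
    return {int(m.group(1)): ast.literal_eval(m.group(2)) for m in TEMPLATES_RE.finditer(log)}


def last_tolerance(log):
    """Map model name -> (recycle number, tol) from the last recycle line that reports tol."""
    result = {}
    for m in RECYCLE_RE.finditer(log):
        # Later lines overwrite earlier ones, so the final recycle wins.
        result[m.group(1)] = (int(m.group(2)), float(m.group(3)))
    return result


THRESHOLD = 0.1  # value passed via --recycle-early-stop-tolerance
seqs = templates_per_sequence(LOG)
print("identical template lists:", seqs[0] == seqs[1])
for model, (recycle, tol) in last_tolerance(LOG).items():
    side = "below" if tol < THRESHOLD else "above"
    print(f"{model}: recycle={recycle}, tol={tol} ({side} {THRESHOLD})")
```

In this run all three models finish above 0.1 (tol=1.55, 0.186 and 0.626), which is consistent with each of them reporting "(20 recycles)" rather than stopping early.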

Kind regards, GISTAL.