KosinskiLab / AlphaPulldown

https://doi.org/10.1093/bioinformatics/btac749
GNU General Public License v3.0

Multiple mmts #328

Closed DimaMolod closed 4 months ago

DimaMolod commented 5 months ago

Here I added a boolean flag, --multiple_mmts. If true, it clusters multimeric templates by protein name; if false, it creates an individual feature for each line in description.csv. @dingquanyu please check that I didn't break any mmseqs2 logic.
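To illustrate the two behaviours of the flag, here is a minimal sketch, not the code from this PR. It assumes each line of description.csv has three comma-separated fields (protein name, template file, chain ID). The function name `group_templates` and the sample values are hypothetical.

```python
import csv
import io


def group_templates(description_csv: str, multiple_mmts: bool):
    """Return a list of (protein, [(template, chain), ...]) entries.

    Assumed row layout (hypothetical): protein,template_file,chain_id
    """
    rows = [
        tuple(field.strip() for field in row)
        for row in csv.reader(io.StringIO(description_csv))
        if row  # csv.reader yields an empty list for blank lines
    ]
    if not multiple_mmts:
        # One entry per line of the file, even if a protein repeats.
        return [(protein, [(template, chain)]) for protein, template, chain in rows]
    # Cluster all templates that share a protein name, keeping first-seen order.
    grouped = {}
    for protein, template, chain in rows:
        grouped.setdefault(protein, []).append((template, chain))
    return list(grouped.items())


sample = "proteinA,1abc.cif,A\nproteinB,2xyz.cif,B\nproteinA,3def.cif,C\n"
print(group_templates(sample, multiple_mmts=True))
print(group_templates(sample, multiple_mmts=False))
```

With the flag on, the two proteinA lines collapse into one entry with two templates; with it off, the three lines give three separate entries.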

dingquanyu commented 5 months ago

@DimaMolod the current version of the files won't crash if mmseqs2 mode and mmt mode are both turned on, but it won't substitute the template features in the pickles produced from the mmseqs2 results. Currently, all template features are calculated using the mmseqs2 databases and alignment algorithms.

If you want to make mmseqs2 mode compatible with mmt mode, I'm afraid you would have to look into the data structure of the databases used by colabfold and write a function that creates a fake template db with the same data structure, and so on. I wouldn't suggest investing your time in this now. Let's modify this PR so that, when both mmseqs2 and mmt modes are on, it logs a warning that mmt won't work for now. https://github.com/KosinskiLab/AlphaPulldown/blob/ab7b926feb5d3e585e6d7d7051255957c88e4eed/alphapulldown/objects.py#L212C13-L212C33

jkosinski commented 5 months ago

If that is hard or time-consuming to code and test, then for now maybe just throw an error if mmseqs2 is attempted with multimeric templates, and open an issue here to add this compatibility later (with the "for_later" label).
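The two options discussed above (warn and fall back, or fail outright) could be sketched roughly as follows. This is only an illustration: `resolve_template_mode`, its parameters, and the message text are hypothetical names, not AlphaPulldown's actual API.

```python
import logging

logger = logging.getLogger(__name__)


def resolve_template_mode(use_mmseqs2: bool, use_mmt: bool, strict: bool = False) -> bool:
    """Return whether multimeric templates should actually be used.

    Hypothetical helper: with strict=False it logs a warning and disables
    mmt; with strict=True it raises an error instead.
    """
    if use_mmseqs2 and use_mmt:
        message = (
            "Multimeric templates are not supported together with mmseqs2 yet; "
            "template features would come from the mmseqs2 results."
        )
        if strict:
            raise ValueError(message)
        logger.warning(message)
        return False
    return use_mmt


# Warning variant: the run continues without multimeric templates.
print(resolve_template_mode(use_mmseqs2=True, use_mmt=True))
```

Raising an error is the more conservative choice, since a warning is easy to miss in a long log and the user would otherwise get pickles without the templates they asked for.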

DimaMolod commented 4 months ago

@dingquanyu @jkosinski please check the last commit and approve it if it looks good enough now. We still import the flags from run_alphafold.py in create_individual_features.py. Eventually, we need to do the same as for run_structure_prediction.py and copy all the flags over manually.