Andre-lab / evodock

A Memetic Algorithm boosts accuracy and speed of all-atom protein-protein docking

local_assembly ?? #4

Open pippo1990 opened 3 months ago

pippo1990 commented 3 months ago

I am trying to figure out how EvoDOCK works.

I may be mistaken and may not have understood the real purpose of the tool. I am trying to use:

Local assembly - Predicting assembly structure from multiple backbones and starting positions:

python ./evodock.py configs/symmetric/local_assembly.ini

In the example config for 1STM I get:

[Inputs]
subunits=inputs/subunits/1STM/
symdef_file=inputs/test_symmetry_files/1STM.symm

but it is still not clear to me what subunits are.

Larger capsids with more than 60 subunits can be formed by associating protomers into asymmetric units where each protein chain adopts a slightly different conformation (T>1). The asymmetric unit can also consist of several different types of protein chains, forming heteromeric rather than homomeric protein capsids

Are subunits the protomers described above? Or are they just a list of results from AlphaFold predictions of a T=1 icosahedral capsid?

What is 1STM.symm? Is it something I can derive in some way from the PDB file containing the BIOMT records or the assembly 1 symmetry? Or is it one of the symmetry files in https://github.com/Andre-lab/evodock/tree/main/inputs/symmetry_files ?

Thanks for your help. I tried to follow "Accurate prediction of protein assembly structure by combining AlphaFold and symmetrical docking" but wasn't able to understand much of it. I thought EvoDOCK could help me build icosahedral particles out of phage capsid asymmetric units that I have mutated and submitted to free online AlphaFold servers.

pippo1990 commented 3 months ago

Hmm, I think I got it wrong. From "Accurate prediction of protein assembly structure by combining AlphaFold and symmetrical docking":

Selection of benchmark structures: The overall selection process for the cubic structure benchmark is described in Fig. 6. First a list of homomeric tetrahedral, octahedral, and icosahedral assemblies with 12, 24, and 60 chains, respectively, and with a resolution better than 4 Å was compiled from the Protein Data Bank [34]

So I guess the tool works only for homomeric capsids.

pippo1990 commented 3 months ago

A last question:

In 1STM.symm I can read: ..... symmetry_name /home/shared/databases/SYMMETRICAL/I/unrelaxed/native/../../idealized/symdef/native/1STM.symm ....

Is there any chance I could build this file myself starting from 1STM.cif (i.e. with a command line or script)?

I installed your package, so I think I have PyRosetta 4 installed too.

I think I got it:

python scripts/cubic_to_rosetta.py --structures tests/inputs/1stm.cif --symmetry I --symdef_outpath tests/outputs/ --input_outpath tests/outputs/ --rosetta_repr 1 --rosetta_repr_outpath tests/outputs/ --overwrite

https://github.com/Andre-lab/cubicsym

MadsJeppesen commented 3 months ago

Hi pippo1990,

subunits under [Inputs] refers to a directory containing all the pdb files (with different backbones) you wish to use in EvoDOCK. T>1 and heteromeric structures are untested, but should be doable by combining the individual chains in a subunit file into a single chain. This tricks EvoDOCK into thinking it is one continuous chain with T=1, and it will not move the constituent chains individually.

It seems like you found the answer on how to generate a symmetry file (*.symm) such as 1STM.symm. As you mentioned, you have to use an external library (cubicsym) to generate the symmetry file.
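The chain-combining step described above could look roughly like the sketch below. It is not part of EvoDOCK or cubicsym: `merge_chains` is a hypothetical helper that works on fixed-column PDB text, relabels every atom record to one chain ID and renumbers residues so that numbers from the original chains do not collide. Dropping the TER records is an assumption and untested against EvoDOCK.

```python
def merge_chains(pdb_text, chain_id="A"):
    """Relabel every ATOM/HETATM record to one chain ID and renumber residues.

    Residues are renumbered consecutively so that residue numbers from the
    original chains do not collide once they share a chain ID.
    """
    merged = []
    new_resnum = 0
    last_residue = None
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # Original chain ID (column 22) plus residue number and insertion
            # code (columns 23-27) identify a residue in the input.
            residue = (line[21], line[22:27])
            if residue != last_residue:
                new_resnum += 1
                last_residue = residue
            # Write the new chain ID, the new residue number and a blank
            # insertion code; everything else is copied unchanged.
            line = f"{line[:21]}{chain_id}{new_resnum:4d} {line[27:]}"
            merged.append(line)
        elif line.startswith("TER"):
            # Assumption: drop chain terminators so the result reads as one chain.
            continue
        else:
            merged.append(line)
    return "\n".join(merged) + "\n"
```

Typical use would be to read a subunit PDB file as text, pass it through `merge_chains`, and write the result into the subunits directory. mmCIF input would need a proper parser instead of column slicing.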

pippo1990 commented 2 months ago

Is there any chance to get a score based on the radius of the final icosahedral particle? I tried some simulations, and some of them return icosahedral particles where each group of subunits is far apart from the others. I guess that even if they score better, these are unrealistic results.

pippo1990 commented 2 months ago

Yes, I went that way. I can of course create a monomer faking the AU of the 2ms2 phage (all 3 chains as A), but now I am at a loss trying to get a symm file for it.

I mean, it is not hard to rename all the chains to A, but how do I generate a meaningful _pdbx_struct_oper_list (https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Categories/pdbx_struct_oper_list.html) loop_ for the .cif file to be fed to cubicsym?

pippo1990 commented 2 months ago

I just copied and pasted the original _pdbx_struct_oper_list loop_; the script (cubicsym) completed with no errors and I got a symm file. I'll also test generating the full assembly to see whether I get the entire capsid right, then I guess I'll try EvoDOCK with that .symm file.

MadsJeppesen commented 2 months ago

It should be possible to score with a radius term inside EvoDOCK, and that makes sense if you are confident about the approximate radius. However, I have seen scattered subunits in my own simulations as well; they are a result of EvoDOCK not being able to find low-energy interfaces in certain trajectories. I'm confident that if you run more simulations you will see better-packed subunits where the radius also makes sense. How many simulations have you run? You should likely run 10-50 runs to get reasonable results, depending on your system. Let me know how it goes with the created symmetry file; I can assist you further if you run into any issues.

pippo1990 commented 2 months ago

I just load the results and use Biopython to calculate the radius: I start from Entity.center_of_mass(geometric=True) [https://biopython.org/docs/dev/api/Bio.PDB.Entity.html#Bio.PDB.Entity.Entity.center_of_mass] and take the most distant residue/atom by Euclidean distance. It is a rather long and slow calculation, but it works.
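The radius check described above can be sketched with the standard library alone. This is not the Biopython code used in the thread: `parse_pdb_coords` and `particle_radius` are hypothetical helpers, the parser assumes fixed-column PDB text (the .cif outputs would need a real mmCIF parser such as Biopython's), and the centre is the unweighted geometric centre.

```python
import math


def parse_pdb_coords(pdb_text):
    """Return (x, y, z) tuples for every ATOM/HETATM record in PDB-format text."""
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # x, y, z occupy columns 31-38, 39-46 and 47-54 of the record.
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    return coords


def particle_radius(coords):
    """Return (centre, radius): geometric centre and distance to the farthest atom."""
    if not coords:
        raise ValueError("no coordinates given")
    n = len(coords)
    centre = tuple(sum(c[axis] for c in coords) / n for axis in range(3))
    radius = max(math.dist(centre, c) for c in coords)
    return centre, radius
```

Both steps are a single pass over the atoms, so the cost grows linearly with the size of the capsid. Comparing the radius of each output model against the radius of a reference capsid is one way to flag runs where the subunits are scattered.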

I have very limited computational resources, so it takes ages to get just 5 .cif results; I am doing this as a hobby. I got nice results for 1STM (none of which matches the original structure), but I am failing to get nice results using 2ms2 as a template:

I changed all 3 AU subunits (A, B, C) to chain A in a modified 2ms2.cif (some of the loops were deleted) and calculated the .symm after adding the loop_ pdbx_struct_oper_list to my PyMOL all-A 2ms2, but I am failing to get plausible results. I don't understand much about symmetry, but it looks as if the results try to build an icosahedral-like particle out of the AU without dimerizing the subunits. In 2ms2 the AU contains the A, B, C monomers, which then form an icosahedron of AB and CC dimers (60 each of A, B, C, for 180 monomers in total). This should be encoded in the symmetry matrix I appended to the modified all-A cif. I am not sure about my .ini file; maybe I am just missing some of the parameters needed for this kind of capsid.

[Sorry to take your time; I hope it is clear what I am trying to accomplish: take 2ms2.cif --> delete some bits of each monomer (all 3 monomers have the same deletion) --> try to recreate the final capsid. The basic idea is to show that each of the 3 monomers can be modified in some way to incorporate 3 different tags; the next move will be to modify each monomer in a different way. I know it is far-fetched because of the way such capsids assemble (they first dimerize, creating nucleation points, and so on), but I was fantasizing about mixing the 3 kinds of monomers together and somehow rebuilding the entire capsid decorated with the 3 tags.]

[Image: Captureexp]

The inner one is the original 2ms2.cif capsid (https://www.rcsb.org/structure/2ms2), with the A/B dimers in cyan and green and the C/C dimers in pink. The two outer ones, in blue and deep purple, are 2 of the output results for the 2 loop-deleted all-A chains fed to EvoDOCK. I am failing to get the software to reassemble the interfaces of the A,B and C,C monomers.

By changing the bounds parameter under [Bounds] I can get down to:

[Image: Captureexp3]

but I still don't get the dimerization.

I would look for another approach, such as energy/steric minimization of the modified 2ms2 obtained by copying the original symmetry .cif loops together with the modified chains, using something like Chiron [https://dokhlab.med.psu.edu/chiron/documentation.php]. Unfortunately, I think it still cannot handle .cif files and big capsids (even if MS2 is not that big).

pippo1990 commented 2 months ago

More on this: I wanted to test

[Docking]
type=GlobalFromMultimer

in the EvoDOCK .ini, feeding the software the 3 separate monomers of 2ms2.cif as subunits

[Inputs]
subunits=./2ms2_A_B_C_monomers/

and using the .symm file obtained as described above. Same results:

[Image: Captureexp2]

Then I realized that perhaps I need a different .symm file, so I turned to cubicsym (https://github.com/Andre-lab/cubicsym) to generate it (as done for the one above, see ****) with:

python ./cubicsym/scripts/cubic_to_rosetta.py --structures ./2ms2.cif --symmetry I --output_generated_structure --output_generated_structure_outpath 2ms2_cubicsymm_test/outputs --overwrite **

** this gets the structure; to get the .symm, use --symdef_outpath tests/outputs/

I used the original 2ms2.cif as input, but got a lot of "No symmetry dectected" messages and an error that stops the script:

raise NotImplemented(f"Code does not yet support ')(' combinations of operations such as {symmetry_operations}")
TypeError: 'NotImplementedType' object is not callable
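A note on the traceback above: in Python, NotImplemented is a constant rather than an exception class, so calling it fails with a TypeError before the intended message is ever shown. The message that was meant to appear is the one inside the f-string, i.e. that ')(' combinations of operations are not supported. The snippet below is a stand-alone reproduction with hypothetical function names, not code from cubicsym; NotImplementedError is presumably what was intended there.

```python
def raise_with_constant():
    # NotImplemented is a constant (used by comparison methods), not an
    # exception class, so calling it fails before anything is raised.
    raise NotImplemented("')(' combinations are not supported")


def raise_with_exception():
    # NotImplementedError is the exception class meant for this purpose.
    raise NotImplementedError("')(' combinations are not supported")


def describe(func):
    """Run func and return the type name and message of the exception it raises."""
    try:
        func()
    except Exception as exc:
        return type(exc).__name__, str(exc)
    return None


print(describe(raise_with_constant))
print(describe(raise_with_exception))
```

The first call reports a TypeError like the one in the traceback; the second reports a NotImplementedError carrying the intended message.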

I'll try adding --hf1 --hf2 ..... to see if I can progress further.

SOLVED using:

--model_together True

python ./cubicsym/scripts/cubic_to_rosetta.py --structures ./22768/2ms2.cif --symmetry I --model_together True --symdef_outpath 2ms2_cubicsymm_test/outputs --overwrite

and :

python ./cubicsym/scripts/cubic_to_rosetta.py --structures 22768/2ms2.cif --symmetry I --model_together True --output_generated_structure --output_generated_structure_outpath 2ms2_cubicsymm_test/outputs --overwrite

**** cubicsym worked and generated a .symm file when fed a modified 2ms2.cif in which all the chains were changed to A and the #loop_ #pdbx_struct_oper_list from the original 2ms2.cif was added.

EDITED:

Maybe GlobalFromMultimer means that the program uses all the subunits one at a time and scores between all of them.

Once again I missed this passage:

Selection of benchmark structures: The overall selection process for the cubic structure benchmark is described in Fig. 6. First a list of homomeric tetrahedral, octahedral, and icosahedral assemblies with 12, 24, and 60 chains, respectively, and with a resolution better than 4 Å was compiled from the Protein Data Bank [34]

First a list of homomeric tetrahedral, octahedral, and icosahedral assemblies with

MadsJeppesen commented 2 months ago

Thank you for describing your approach. It makes sense to recreate the exact symmetry from 2MS2 with the symmetry script in cubicsym, given the goal you are trying to accomplish. Also, to use a T number higher than 1, you would have to combine the chains as you have done. For local refinement I would use:

[Docking]
type=Local

instead.

You can quickly check whether it makes sense with the following options:

[DE]
popsize=4
maxiter=1

It should not move a lot in that case. Let me know if you have success. If the issue of having subunits out in space persists, can you upload all your input files? Then I can check what is going on in detail.
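One way to apply the quick-check settings above without editing the original config by hand is sketched below. It assumes the .ini files can be read and written by Python's configparser, which this thread does not confirm; `make_quick_config` is a hypothetical helper, and comments in the source ini are not preserved when it is rewritten.

```python
import configparser
import io


def make_quick_config(ini_text, popsize=4, maxiter=1):
    """Return a copy of an ini with [Docking] type=Local and a tiny [DE] budget."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep option names exactly as written
    cfg.read_string(ini_text)
    for section in ("Docking", "DE"):
        if not cfg.has_section(section):
            cfg.add_section(section)
    cfg["Docking"]["type"] = "Local"
    cfg["DE"]["popsize"] = str(popsize)
    cfg["DE"]["maxiter"] = str(maxiter)
    buffer = io.StringIO()
    # Write key=value without spaces, matching the style of the configs above.
    cfg.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()
```

The returned text can be saved under a new file name and passed to evodock.py in place of the full config, so the long-running settings stay untouched.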

pippo1990 commented 2 months ago

As per my previous post (https://github.com/Andre-lab/evodock/issues/4#issuecomment-2325024980), my best result is:

[Image: Captureexp3]

but I am still not getting the dimerization of the original 2ms2.cif:

[Image: 2ms2]

I got it with (local_recapitulation_mine_2ms2.ini.txt):

[Docking]
type=Local

[Inputs]
single=./2ms2_modded_chain_ABC_as_A_segi_too_SUPERPOSED_prepack.pdb
symdef_file=./2ms2_all_A_modded_sym_added.symm

[Outputs]
output_path=tests/outputs/local_recapitulation/
output_pdb=True

[Bounds]
bounds=5,18,5,40,40,40
allow_flip=True

[DE]
scheme=RANDOM
popsize=4
mutate=0.1
recombination=0.7
maxiter=3
local_search=symshapedock
slide=true
selection=interface

[RosettaOptions]
initialize_rigid_body_dofs=true

and the following .symm and .pdb input files :

2ms2_all_A_modded_sym_added.symm.txt 2ms2_modded_chain_ABC_as_A_segi_too_SUPERPOSED_prepack..pdb.txt
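Since the config above refers to its input files by relative path, a small pre-flight check can catch a mistyped file name before a long run. The sketch below is not part of EvoDOCK: `missing_inputs` is a hypothetical helper, it assumes the ini is configparser-compatible, and it only looks at the three [Inputs] keys that appear in this thread (subunits, single, symdef_file).

```python
import configparser
import os


def missing_inputs(ini_text, base_dir="."):
    """Return (key, path) pairs for [Inputs] entries that do not exist on disk."""
    cfg = configparser.ConfigParser()
    cfg.read_string(ini_text)
    missing = []
    if cfg.has_section("Inputs"):
        for key in ("subunits", "single", "symdef_file"):
            if key in cfg["Inputs"]:
                # Resolve the path relative to the directory the run starts from.
                path = os.path.join(base_dir, cfg["Inputs"][key])
                if not os.path.exists(path):
                    missing.append((key, path))
    return missing
```

An empty list means every referenced input was found relative to base_dir; anything else lists the key and the path that could not be resolved.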