brownbp1 / ncaa_tutorial

Demos for generating non-canonical amino acid parameters and rotamers in Rosetta
MIT License
3 stars 0 forks source link

Cannot use molfile_to_params_polymer_main.py #1

Closed kimdn closed 2 years ago

kimdn commented 2 years ago

Thank you Ben for posting this tutorial.

I read your ppt and it helps my understanding of bcl based ncAA rotamer preparation.

As of now, I am trying to reproduce workshop_2022's tutorial 8 instead of this 'brownbp1/ncaa_tutorial' since files at that workshop are easier for me to follow step by step.

I watched youtube video as well.

I generated dipeptide and conformers according to this workshop's tutorial pdf file.

However, I see Screen Shot 2022-10-04 at 11 59 59 AM when I tried to run

Screen Shot 2022-10-04 at 12 07 15 PM

When I prepared rosetta ncAA rotamer by a classic method, I just ran

py_file="/people/kimd999/bin/rosetta_bin_linux_2021.16.61629_bundle/main/demos/public/using_ncaas_protein_peptide_interface_design/HowToMakeResidueTypeParamFiles/scripts/molfile_to_params_polymer.py"

mol_file="/people/kimd999/script/python/ncaa/prepare_rotlib/1_Add_new_ResidueTypeParam/3_molfile_to_params/1_Prepare_to_run/working_example/A50/output/A50_from_ornithine_from_rosetta_I_STO_appended.mol"

python2.7 $py_file --clobber --polymer --no-pdb --name A50 $mol_file

acccording to classic rosetta document. This old method worked well and I generated a full ncAA rotamer by this classic method eventually.

However, it appears that you used ../scripts/molfile_to_params_polymer/molfile_to_params_polymer_main.py

Can I use molfile_to_params_polymer_main.py as well?

Obviously, only molfile_to_params_polymer_main.py can deal partial_charge, -i while old molfile_to_params_polymer.py can't deal with them.

I can't find molfile_to_params_polymer_main.py in this brownbp1/ncaa_tutorial, workshop_2022's tutorial 8 and rosetta_bin_linux_2021.16.61629_bundle/main

brownbp1 commented 2 years ago

Thanks for the reminder. I did indeed not receive a notification. Sorry about that.

As of now, I am trying to reproduce workshop_2022's tutorial 8 instead of this 'brownbp1/ncaa_tutorial' since files at that workshop are easier for me to follow step by step.

The tutorial is now out of date because of updates from Oanh Vu and Rocco Moretti that were merged into the Rosetta master branch. A graduate student in the Meiler Lab is going through and updating the tutorials, but he is still early in the process. I strongly recommend you follow the steps in 'brownbp1/ncaa_tutorial'. I made it recently, and it is a far more generic and step-by-step introduction to NCAA param generation.

brownbp1 commented 2 years ago

Spoiler - A massive chunk of the API from that workshop (basically tutorials 5-7) will be changing over the next few months because of code refactoring to integrate multiple separate small molecule drug design projects in Rosetta.

kimdn commented 2 years ago

Spoiler - A massive chunk of the API from that workshop (basically tutorials 5-7) will be changing over the next few months because of code refactoring to integrate multiple separate small molecule drug design projects in Rosetta.

Yes, I like this kind of spoiler alert~

For now, I will focus on ncAA part (e.g. tutorial 8) only. I will definitely check out the contents of the rest tutorial later.

brownbp1 commented 2 years ago

For now, I will focus on ncAA part (e.g. tutorial 8) only. I will definitely check out the contents of the rest tutorial later.

Sounds good. Just keep in mind that if the commands are not working, it may be worth looking for an analogous procedure in the brownbp1/ncaa_tutorial set.

Best of luck.

kimdn commented 2 years ago

Thanks for the reminder. I did indeed not receive a notification. Sorry about that.

As of now, I am trying to reproduce workshop_2022's tutorial 8 instead of this 'brownbp1/ncaa_tutorial' since files at that workshop are easier for me to follow step by step.

The tutorial is now out of date because of updates from Oanh Vu and Rocco Moretti that were merged into the Rosetta master branch. A graduate student in the Meiler Lab is going through and updating the tutorials, but he is still early in the process. I strongly recommend you follow the steps in 'brownbp1/ncaa_tutorial'. I made it recently, and it is a far more generic and step-by-step introduction to NCAA param generation.

I spent few months to generate ncAA rotamers with the classic method. But BCL is a game changer!

It appears that you used "/home/brownbp1/bensbin/ncaa_tutorial_rdg/from_github/bcl/rotamer_library/ncaa_base/glycine_neutral.sdf.gz" but I don't have this sdf file.

So here is my understanding,

First of all, prepare dipeptide only.

generate initial 3D conformer

$BCL molecule:ConformerGenerator \
    -explicit_aromaticity \
    -add_h \
    -generate_3D \
    -max_iterations 2000 \
    -top_models 1 \
    -ensemble_filenames $sdf \
    -conformers_single_file ${tag}_3d.sdf

generate dipeptide

$BCL molecule:GenerateRosettaNCAAInstructions \
    -generate_3D \
    -chirality "L_AA" \
    -add_h \
    -neutralize \
    -explicit_aromaticity \
    -input_filenames ${tag}_3d.sdf \
    -output_prefix ${tag}_rosetta \
    -extra_properties ALPHA_AA

Second of all, minimize xxx_rosetta_0.pdb by DFT 6-311G**

(use gaussian, NWChem etc,,,,)

Third of all, generate rosetta instructions again (because NWChem lost RosettaNCAAInstructions)

generate initial 3D conformer

$BCL molecule:ConformerGenerator \
    -explicit_aromaticity \
    -add_h \
    -generate_3D \
    -max_iterations 2000 \
    -top_models 1 \
    -ensemble_filenames $sdf \
    -conformers_single_file ${tag}_3d.sdf

generate rosetta instructions

$BCL molecule:GenerateRosettaNCAAInstructions \
    -generate_3D \
    -chirality "L_AA" \
    -add_h \
    -neutralize \
    -explicit_aromaticity \
    -input_filenames ${tag}_3d.sdf \
    -output_prefix ${tag}_rosetta \
    -extra_properties ALPHA_AA

Fourth, generate conformers

$BCL molecule:ConformerGenerator \
    -explicit_aromaticity \
    -conformation_comparer SymmetryRMSD 0.25 \
    -max_iterations 2000 \
    -top_models 50 \
    -cluster \
    -ensemble_filenames <sdf with rosetta instructions such as M  ROOT>
    -conformers_single_file ${codename}_Rotamer.base.sdf

$BCL molecule:ConformerGenerator \
    -explicit_aromaticity \
    -conformation_comparer SymmetryRMSD 0.125 \
    -max_iterations 500 \
    -top_models 20 \
    -cluster \
    -skip_rotamer_dihedral_sampling \
    -skip_bond_angle_sampling \
    -skip_ring_sampling \
    -ensemble_filenames ${codename}_Rotamer.base.sdf \
    -conformers_single_file ${codename}_Rotamer.sdf

Final, post processing

align to backbone (cosmetic)

$BCL molecule:AlignToScaffold $dipeptide ${codename}_Rotamer.sdf ${codename}_Rotamer.ats.sdf -explicit_aromaticity ; mv ${codename}_Rotamer.ats.sdf ${codename}_Rotamer.sdf

Add the cap atoms instruction to the rotamer file

sed -e "/M END/r ${tag}_rosetta.RosettaInstructions.txt" ${codename}_Rotamer.sdf | awk '!f && /M END/ {f=1;next}1' > ${codename}_Rotamer_rosetta.sdf

Make the pdb files for the molecules and the rotamer lib

rm -f ${codename}_params.log
$PYTHON $ROSETTA --clobber --polymer --all-in-one-pdb --name ${codename} -i ${codename}_Rotamer_rosetta.sdf >> ${codename}_params.log

Add PDB rotamers with full path

echo "PDB_ROTAMERS"readlink -e ${codename}_rotamer.pdb>> ${codename}.params

In this way, I could sample many rotamers

Screen Shot 2022-10-13 at 3 51 01 PM

Screen Shot 2022-10-14 at 11 00 27 AM

and I saw this new ncAA in my FastDesign derived designs.

Screen Shot 2022-10-13 at 4 18 38 PM

brownbp1 commented 2 years ago

bcl/rotamer_library/ncaa_base/glycine_neutral.sdf.gz

Yes you do. It's just not that exact path. It is in your BCL installation, assuming you cloned from BCLCommons GitHub page.

I spent few months to generate ncAA rotamers with the classic method. But BCL is a game changer!

I am glad you like it! If it is useful to folks, I'll try to update it sooner rather than later to include more exotic NCAAs in the automated pipeline. I'm also looking at adding a quick semi-empirical package as an external library, along with an interface, to use for the geometry optimizations. So many projects, so little time.

Best of luck.

kimdn commented 2 years ago

Thanks Ben, I could find glycine_neutral.sdf.gz and use it.

Anyway, I'm afraid that your https://github.com/brownbp1/ncaa_tutorial/blob/master/scripts/quick_parmgen.sh changes DFT optimized input (geometry wise).

Based on sdf files in https://github.com/brownbp1/ncaa_tutorial/tree/master/inputs, I think that input sdf file for quick_parmgen.sh is AA core region itself only (e.g. without dipeptide).

molecule:GenerateRosettaNCAAInstructions will conveniently add dipeptide anyway.

Therefore, I think that my above understanding is not right (e.g. running BCL molecule:GenerateRosettaNCAAInstructions twice is not necessary).

I can't run quick_parmgen with Ace.sdf and NMe.sdf with

=crt=bcl::chemistry=> Molecule contains unsatisfied valences; ConfScore will be very unreliable and conformers will suffer greatly in accuracy
terminate called after throwing an instance of 'std::bad_alloc'

When I ran quick_parmgen with Fmoc-5F-TRP.sdf, it generated lots of rotamers well.

However, the distance between F and C is changed.

For example, if you do distance #3/?:1@C9 #3/?:1@F1 (please change model # of course) at chimeraX with Fmoc-5F-TRP.sdf, you can see that C-F distance is 0.825 A

Screen Shot 2022-10-14 at 4 06 52 PM

However, if you do distance #1/X:1@FT1 #1/X:1@CZ2 at chimeraX with the 1st molecule of Fmoc-5F-TRP_rotamer.pdb (e.g. Fmoc-5F-TRP_rotamer_1.pdb), you can see that C-F distance is changed to 1.346 A

Screen Shot 2022-10-14 at 4 07 48 PM

This change of geometry by quick_parmgen.sh happened to my other ncAA as well.

I think that you designed quick_parmgen.sh to use DFT optimized input geometry sdf file. Then, let's fix quick_parmgen.sh so that DFT optimized input geometry is maintained? or I didn't understand something correctly?

I think that you can quickly (within 2 minutes) verify my concern. However, here I attach used files just in case. Fmoc-5F-TRP.zip

brownbp1 commented 2 years ago

Anyway, I'm afraid that your https://github.com/brownbp1/ncaa_tutorial/blob/master/scripts/quick_parmgen.sh changes DFT optimized input (geometry wise).

Yeah, it definitely does. The quick_parmgen script uses BCL::Conf to generate the 3D structures and build rotamers. The QM optimization is only going to be useful if you either (A) use parent rotamers, or (B) use MakeRotLib. Otherwise, you would need to generate a conformational ensemble and DFT-optimize the whole thing. Again, the BCL thing is a quick protocol.

The quick paramgen script is just to high-throughput quick params for mono-substituted alpha-carbon non-canonical amino acids using a cheminformatics small molecule conformer ensemble generator to build rotamers (https://pubs.acs.org/doi/10.1021/acs.jcim.0c01140 and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4607025/).

brownbp1 commented 2 years ago

Also, from the images you sent, it looks like you are comparing distances from a 2D structure without hydrogen atoms and a 3D structure, which seems odd. Is there a reason for this that I am not understanding?

brownbp1 commented 2 years ago

I can't run quick_parmgen with Ace.sdf and NMe.sdf with

Yeah, these do not fall within the narrow scope of application of the molecule:GenerateRosettaNCAAInstructions application, which currently just works for mono-substituted alpha-carbon non-canonical amino acids. For these or other such NCAAs, you should follow the manual protocol in brownbp1/ncaa_tutorial.

kimdn commented 2 years ago

Anyway, I'm afraid that your https://github.com/brownbp1/ncaa_tutorial/blob/master/scripts/quick_parmgen.sh changes DFT optimized input (geometry wise).

Yeah, it definitely does. The quick_parmgen script uses BCL::Conf to generate the 3D structures and build rotamers. The QM optimization is only going to be useful if you either (A) use parent rotamers, or (B) use MakeRotLib. Otherwise, you would need to generate a conformational ensemble and DFT-optimize the whole thing. Again, the BCL thing is a quick protocol.

The quick paramgen script is just to high-throughput quick params for mono-substituted alpha-carbon non-canonical amino acids using a cheminformatics small molecule conformer ensemble generator to build rotamers (https://pubs.acs.org/doi/10.1021/acs.jcim.0c01140 and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4607025/).

Yes, I think that you wrote somewhere that using a parent rotamer is useful (although quick) only for certain limited cases. I ran MakeRotLib for other ncAA rotamer generation once, but I had to spend too much time with many manual processes.

Therefore. I prefer to optimize quick_parmgen_before_NWChem derived rotamer pdb file (e.g. A51_rotamer.pdb) with QM method. I know this sounds too expensive. However, I will use/optimize only 100 random chosen rotamers from this pdb file (from 762 rotamers). Small molecule conformer generation is Screen Shot 2022-10-18 at 1 54 06 PM anyway.

Additionally, with NWChem, DFT 6-311G** based geometry optimization took 1~13 hr (9 hrs for phenylalanine, 13 hrs for tryptophan) per each dipeptide (e.g. 1 ncAA core + N-term + C-term). Therefore, I think that this is manageable.

kimdn commented 2 years ago

Also, from the images you sent, it looks like you are comparing distances from a 2D structure without hydrogen atoms and a 3D structure, which seems odd. Is there a reason for this that I am not understanding?

Yes, that may seem odd. However, since what I care is a distance between Carbon and fluorine, I think that this is fine. When I use your Fmoc-5F-TRP.sdf, I think that add_h option in ConformerGenerator/GenerateRosettaNCAAInstructions adds hydrogens. Of course, for final rotamer preparation (e.g. DFT optimization after quick_parmgen), I will use hydrogen added one (e.g. A51_rotamer.pdb Screen Shot 2022-10-18 at 2 01 10 PM ).

kimdn commented 2 years ago

I can't run quick_parmgen with Ace.sdf and NMe.sdf with

Yeah, these do not fall within the narrow scope of application of the molecule:GenerateRosettaNCAAInstructions application, which currently just works for mono-substituted alpha-carbon non-canonical amino acids. For these or other such NCAAs, you should follow the manual protocol in brownbp1/ncaa_tutorial.

hm,, I can't figure out how to run manual protocol in brownbp1/ncaa_tutorial... even after I look at its files...