Closed kimdn closed 2 years ago
Thanks for the reminder. I did indeed not receive a notification. Sorry about that.
As of now, I am trying to reproduce workshop_2022's tutorial 8 instead of this 'brownbp1/ncaa_tutorial' since files at that workshop are easier for me to follow step by step.
The tutorial is now out of date because of updates from Oanh Vu and Rocco Moretti that were merged into the Rosetta master branch. A graduate student in the Meiler Lab is going through and updating the tutorials, but he is still early in the process. I strongly recommend you follow the steps in 'brownbp1/ncaa_tutorial'. I made it recently, and it is a far more generic and step-by-step introduction to NCAA param generation.
Spoiler - A massive chunk of the API from that workshop (basically tutorials 5-7) will be changing over the next few months because of code refactoring to integrate multiple separate small molecule drug design projects in Rosetta.
Spoiler - A massive chunk of the API from that workshop (basically tutorials 5-7) will be changing over the next few months because of code refactoring to integrate multiple separate small molecule drug design projects in Rosetta.
Yes, I like this kind of spoiler alert~
For now, I will focus on ncAA part (e.g. tutorial 8) only. I will definitely check out the contents of the rest tutorial later.
For now, I will focus on ncAA part (e.g. tutorial 8) only. I will definitely check out the contents of the rest tutorial later.
Sounds good. Just keep in mind that if the commands are not working, it may be worth looking for an analogous procedure in the brownbp1/ncaa_tutorial set.
Best of luck.
Thanks for the reminder. I did indeed not receive a notification. Sorry about that.
As of now, I am trying to reproduce workshop_2022's tutorial 8 instead of this 'brownbp1/ncaa_tutorial' since files at that workshop are easier for me to follow step by step.
The tutorial is now out of date because of updates from Oanh Vu and Rocco Moretti that were merged into the Rosetta master branch. A graduate student in the Meiler Lab is going through and updating the tutorials, but he is still early in the process. I strongly recommend you follow the steps in 'brownbp1/ncaa_tutorial'. I made it recently, and it is a far more generic and step-by-step introduction to NCAA param generation.
I spent few months to generate ncAA rotamers with the classic method. But BCL is a game changer!
It appears that you used "/home/brownbp1/bensbin/ncaa_tutorial_rdg/from_github/bcl/rotamer_library/ncaa_base/glycine_neutral.sdf.gz" but I don't have this sdf file.
So here is my understanding,
$BCL molecule:ConformerGenerator \
-explicit_aromaticity \
-add_h \
-generate_3D \
-max_iterations 2000 \
-top_models 1 \
-ensemble_filenames $sdf \
-conformers_single_file ${tag}_3d.sdf
$BCL molecule:GenerateRosettaNCAAInstructions \
-generate_3D \
-chirality "L_AA" \
-add_h \
-neutralize \
-explicit_aromaticity \
-input_filenames ${tag}_3d.sdf \
-output_prefix ${tag}_rosetta \
-extra_properties ALPHA_AA
(use gaussian, NWChem etc,,,,)
$BCL molecule:ConformerGenerator \
-explicit_aromaticity \
-add_h \
-generate_3D \
-max_iterations 2000 \
-top_models 1 \
-ensemble_filenames $sdf \
-conformers_single_file ${tag}_3d.sdf
$BCL molecule:GenerateRosettaNCAAInstructions \
-generate_3D \
-chirality "L_AA" \
-add_h \
-neutralize \
-explicit_aromaticity \
-input_filenames ${tag}_3d.sdf \
-output_prefix ${tag}_rosetta \
-extra_properties ALPHA_AA
$BCL molecule:ConformerGenerator \
-explicit_aromaticity \
-conformation_comparer SymmetryRMSD 0.25 \
-max_iterations 2000 \
-top_models 50 \
-cluster \
-ensemble_filenames <sdf with rosetta instructions such as M ROOT>
-conformers_single_file ${codename}_Rotamer.base.sdf
$BCL molecule:ConformerGenerator \
-explicit_aromaticity \
-conformation_comparer SymmetryRMSD 0.125 \
-max_iterations 500 \
-top_models 20 \
-cluster \
-skip_rotamer_dihedral_sampling \
-skip_bond_angle_sampling \
-skip_ring_sampling \
-ensemble_filenames ${codename}_Rotamer.base.sdf \
-conformers_single_file ${codename}_Rotamer.sdf
$BCL molecule:AlignToScaffold $dipeptide ${codename}_Rotamer.sdf ${codename}_Rotamer.ats.sdf -explicit_aromaticity ; mv ${codename}_Rotamer.ats.sdf ${codename}_Rotamer.sdf
sed -e "/M END/r ${tag}_rosetta.RosettaInstructions.txt" ${codename}_Rotamer.sdf | awk '!f && /M END/ {f=1;next}1' > ${codename}_Rotamer_rosetta.sdf
rm -f ${codename}_params.log
$PYTHON $ROSETTA --clobber --polymer --all-in-one-pdb --name ${codename} -i ${codename}_Rotamer_rosetta.sdf >> ${codename}_params.log
echo "PDB_ROTAMERS"
readlink -e ${codename}_rotamer.pdb>> ${codename}.params
In this way, I could sample many rotamers
and I saw this new ncAA in my FastDesign derived designs.
bcl/rotamer_library/ncaa_base/glycine_neutral.sdf.gz
Yes you do. It's just not that exact path. It is in your BCL installation, assuming you cloned from BCLCommons GitHub page.
I spent few months to generate ncAA rotamers with the classic method. But BCL is a game changer!
I am glad you like it! If it is useful to folks, I'll try to update it sooner rather than later to include more exotic NCAAs in the automated pipeline. I'm also looking at adding a quick semi-empirical package as an external library, along with an interface, to use for the geometry optimizations. So many projects, so little time.
Best of luck.
Thanks Ben, I could find glycine_neutral.sdf.gz and use it.
Anyway, I'm afraid that your https://github.com/brownbp1/ncaa_tutorial/blob/master/scripts/quick_parmgen.sh changes DFT optimized input (geometry wise).
Based on sdf files in https://github.com/brownbp1/ncaa_tutorial/tree/master/inputs, I think that input sdf file for quick_parmgen.sh is AA core region itself only (e.g. without dipeptide).
molecule:GenerateRosettaNCAAInstructions will conveniently add dipeptide anyway.
Therefore, I think that my above understanding is not right (e.g. running BCL molecule:GenerateRosettaNCAAInstructions twice is not necessary).
I can't run quick_parmgen with Ace.sdf and NMe.sdf with
=crt=bcl::chemistry=> Molecule contains unsatisfied valences; ConfScore will be very unreliable and conformers will suffer greatly in accuracy
terminate called after throwing an instance of 'std::bad_alloc'
When I ran quick_parmgen with Fmoc-5F-TRP.sdf, it generated lots of rotamers well.
However, the distance between F and C is changed.
For example, if you do distance #3/?:1@C9 #3/?:1@F1 (please change model # of course) at chimeraX with Fmoc-5F-TRP.sdf, you can see that C-F distance is 0.825 A
However, if you do distance #1/X:1@FT1 #1/X:1@CZ2 at chimeraX with the 1st molecule of Fmoc-5F-TRP_rotamer.pdb (e.g. Fmoc-5F-TRP_rotamer_1.pdb), you can see that C-F distance is changed to 1.346 A
This change of geometry by quick_parmgen.sh happened to my other ncAA as well.
I think that you designed quick_parmgen.sh to use DFT optimized input geometry sdf file. Then, let's fix quick_parmgen.sh so that DFT optimized input geometry is maintained? or I didn't understand something correctly?
I think that you can quickly (within 2 minutes) verify my concern. However, here I attach used files just in case. Fmoc-5F-TRP.zip
Anyway, I'm afraid that your https://github.com/brownbp1/ncaa_tutorial/blob/master/scripts/quick_parmgen.sh changes DFT optimized input (geometry wise).
Yeah, it definitely does. The quick_parmgen script uses BCL::Conf to generate the 3D structures and build rotamers. The QM optimization is only going to be useful if you either (A) use parent rotamers, or (B) use MakeRotLib. Otherwise, you would need to generate a conformational ensemble and DFT-optimize the whole thing. Again, the BCL thing is a quick protocol.
The quick paramgen script is just to high-throughput quick params for mono-substituted alpha-carbon non-canonical amino acids using a cheminformatics small molecule conformer ensemble generator to build rotamers (https://pubs.acs.org/doi/10.1021/acs.jcim.0c01140 and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4607025/).
Also, from the images you sent, it looks like you are comparing distances from a 2D structure without hydrogen atoms and a 3D structure, which seems odd. Is there a reason for this that I am not understanding?
I can't run quick_parmgen with Ace.sdf and NMe.sdf with
Yeah, these do not fall within the narrow scope of application of the molecule:GenerateRosettaNCAAInstructions application, which currently just works for mono-substituted alpha-carbon non-canonical amino acids. For these or other such NCAAs, you should follow the manual protocol in brownbp1/ncaa_tutorial.
Anyway, I'm afraid that your https://github.com/brownbp1/ncaa_tutorial/blob/master/scripts/quick_parmgen.sh changes DFT optimized input (geometry wise).
Yeah, it definitely does. The quick_parmgen script uses BCL::Conf to generate the 3D structures and build rotamers. The QM optimization is only going to be useful if you either (A) use parent rotamers, or (B) use MakeRotLib. Otherwise, you would need to generate a conformational ensemble and DFT-optimize the whole thing. Again, the BCL thing is a quick protocol.
The quick paramgen script is just to high-throughput quick params for mono-substituted alpha-carbon non-canonical amino acids using a cheminformatics small molecule conformer ensemble generator to build rotamers (https://pubs.acs.org/doi/10.1021/acs.jcim.0c01140 and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4607025/).
Yes, I think that you wrote somewhere that using a parent rotamer is useful (although quick) only for certain limited cases. I ran MakeRotLib for other ncAA rotamer generation once, but I had to spend too much time with many manual processes.
Therefore. I prefer to optimize quick_parmgen_before_NWChem derived rotamer pdb file (e.g. A51_rotamer.pdb) with QM method. I know this sounds too expensive. However, I will use/optimize only 100 random chosen rotamers from this pdb file (from 762 rotamers). Small molecule conformer generation is anyway.
Additionally, with NWChem, DFT 6-311G** based geometry optimization took 1~13 hr (9 hrs for phenylalanine, 13 hrs for tryptophan) per each dipeptide (e.g. 1 ncAA core + N-term + C-term). Therefore, I think that this is manageable.
Also, from the images you sent, it looks like you are comparing distances from a 2D structure without hydrogen atoms and a 3D structure, which seems odd. Is there a reason for this that I am not understanding?
Yes, that may seem odd. However, since what I care is a distance between Carbon and fluorine, I think that this is fine. When I use your Fmoc-5F-TRP.sdf, I think that add_h option in ConformerGenerator/GenerateRosettaNCAAInstructions adds hydrogens. Of course, for final rotamer preparation (e.g. DFT optimization after quick_parmgen), I will use hydrogen added one (e.g. A51_rotamer.pdb ).
I can't run quick_parmgen with Ace.sdf and NMe.sdf with
Yeah, these do not fall within the narrow scope of application of the molecule:GenerateRosettaNCAAInstructions application, which currently just works for mono-substituted alpha-carbon non-canonical amino acids. For these or other such NCAAs, you should follow the manual protocol in brownbp1/ncaa_tutorial.
hm,, I can't figure out how to run manual protocol in brownbp1/ncaa_tutorial... even after I look at its files...
Thank you Ben for posting this tutorial.
I read your ppt and it helps my understanding of bcl based ncAA rotamer preparation.
As of now, I am trying to reproduce workshop_2022's tutorial 8 instead of this 'brownbp1/ncaa_tutorial' since files at that workshop are easier for me to follow step by step.
I watched youtube video as well.
I generated dipeptide and conformers according to this workshop's tutorial pdf file.
However, I see when I tried to run
When I prepared rosetta ncAA rotamer by a classic method, I just ran
acccording to classic rosetta document. This old method worked well and I generated a full ncAA rotamer by this classic method eventually.
However, it appears that you used
../scripts/molfile_to_params_polymer/molfile_to_params_polymer_main.py
Can I use molfile_to_params_polymer_main.py as well?
Obviously, only molfile_to_params_polymer_main.py can deal partial_charge, -i while old molfile_to_params_polymer.py can't deal with them.
I can't find molfile_to_params_polymer_main.py in this brownbp1/ncaa_tutorial, workshop_2022's tutorial 8 and rosetta_bin_linux_2021.16.61629_bundle/main