Open Atheris4 opened 8 months ago
Thank you for your question.
There are few POI-PROTAC-E3 complex structures in the PDB, so we prepare the POI side and the E3 ligase side separately. However, the PDB also contains few POI-warhead or E3 ligase-ligand complex structures for the entries in PROTAC-DB. Therefore, we align the E3 ligand (or warhead) from PROTAC-DB to an existing E3 ligase (or POI) structure in the PDB. Here are the steps, which you can take as a reference.
For each entry in PROTAC-DB, we first searched the PDB for structures using the POI name or the E3 ligase name, respectively, with the UniProt ID from PROTAC-DB as a supplement.
Second, we applied the following selection criteria to the search results:
After the selection, we obtained one PDB entry for each PROTAC-DB entry, containing the protein structure with a general pocket.
Third, we aligned the ligand (warhead) to the inhibitor in the selected structure and then performed energy minimization.
Therefore, BRD7_6v1h_TargetProtein.pdb is an aligned result, not the 6v1h.pdb from the PDB.
Hope it helps.
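For readers who want to automate the first step (searching the PDB by UniProt ID) instead of browsing by hand, here is a minimal sketch. It is not the authors' method, since they state that the search was done manually. It assumes the RCSB Search API v2 endpoint and the attribute name shown below, which should be checked against the current RCSB documentation; the function names build_uniprot_query and search_pdb_ids are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint of the RCSB Search API (v2); verify against the RCSB docs.
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"

# Assumed attribute holding the UniProt accession of each polymer entity.
ACCESSION_ATTRIBUTE = (
    "rcsb_polymer_entity_container_identifiers."
    "reference_sequence_identifiers.database_accession"
)


def build_uniprot_query(uniprot_id, rows=100):
    """Build the JSON query asking for PDB entries mapped to a UniProt accession."""
    return {
        "query": {
            "type": "terminal",
            "service": "text",
            "parameters": {
                "attribute": ACCESSION_ATTRIBUTE,
                "operator": "exact_match",
                "value": uniprot_id,
            },
        },
        "return_type": "entry",
        "request_options": {"paginate": {"start": 0, "rows": rows}},
    }


def search_pdb_ids(uniprot_id, rows=100):
    """Send the query and return the matching PDB IDs (requires network access)."""
    payload = json.dumps(build_uniprot_query(uniprot_id, rows)).encode("utf-8")
    request = urllib.request.Request(
        SEARCH_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        body = response.read()
    if not body:  # an empty body is treated as "no matches"
        return []
    return [hit["identifier"] for hit in json.loads(body).get("result_set", [])]
```

The returned IDs are only candidates. The selection step described above (choosing the entry with a suitable bound inhibitor) still has to be applied to them.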
Thank you so much for the quick and helpful reply! Since we are trying to replicate your experiment as closely as possible, is there any way you can provide us with the extracted data?
If not, could you provide more information about your methods? For each POI/E3 ligase, do you go through every PDB entry to find the ideal inhibitor? How do you determine the similarity?
Also, how did you do the alignment and energy minimization?
I’d love to Zoom or call to clarify these issues.
I am sorry that I cannot provide the data now. We are preparing a larger dataset of PROTACs, and the data here will be included in it. It will be released soon after.
Also, I am sorry that I do not know the details of the data collection, since it was mainly done by my lab mates. The selection was mostly done based on experience. The similarity (Tanimoto similarity) can be calculated with RDKit. The alignment and energy minimization were done using Maestro from Schrödinger.
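To make the similarity measure concrete, here is a minimal pure-Python sketch of the Tanimoto coefficient. It assumes each molecule has already been turned into a fingerprint represented as a collection of on-bit indices; the thread does not say which fingerprint type the authors used. In RDKit, the same quantity is computed on its own bit vectors by DataStructs.TanimotoSimilarity. The function names tanimoto and most_similar are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient |A & B| / |A | B| of two sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:  # two empty fingerprints: define the similarity as 0
        return 0.0
    return len(a & b) / union


def most_similar(query_bits, candidates):
    """Return (name, score) of the candidate fingerprint closest to the query.

    candidates maps a name (for example a PDB ligand ID) to its on-bit indices.
    """
    scored = ((name, tanimoto(query_bits, bits)) for name, bits in candidates.items())
    return max(scored, key=lambda pair: pair[1])
```

With such a score, the co-crystallized inhibitors of the candidate PDB entries could be ranked against a warhead, although the final choice described in this thread was made by hand.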
Thank you for your reply. Do you have an estimate of when the data will be released?
Actually, we have put some data at "https://bailab.siais.shanghaitech.edu.cn/services/deepprotac-db", which you can take a look at. We are checking the new data and preparing the manuscript. After that, we will release the whole data set.
Thank you for the information. Could you release the binding pocket extraction code so we can run it ourselves?
You can find it in prepare_data.ipynb.
Thank you for the quick response.
The code in prepare_data.ipynb doesn't generalize to all PROTAC cases in the dataset, since the chain IDs change from entry to entry.
There also isn't any code showing how the PDB files were originally extracted. Was that done manually, or automatically using the RCSB PDB API?
Is there any way I can contact you directly?
We manually changed the chain IDs. Alternatively, you can use select org. in PyMOL to extract the ligand if the mol2 file or pdb file contains only one protein and one ligand.
When you say manually, do you mean that you had a team go through every entry in PROTAC-DB? Did you use any automated filtering? I did some preliminary searching and was wondering whether you used the API that is available for the PDB.
Yes, we just went through every entry in PROTAC-DB and did not use any automated filtering.
Thank you for the quick reply. We are a team of 2 and aren't able to manually go through everything. How do you recommend we approach the issue of binding pocket extraction?
For binding pocket extraction, you can use select org. in PyMOL, as previously mentioned. That is, in prepare_data.ipynb it should be:
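import glob
from pymol import cmd
# Runs inside the loop over entry names i in prepare_data.ipynb
cmd.load(glob.glob('protacs/'+i+"/*igase.pdb")[0])
cmd.remove('h.')  # drop hydrogens
cmd.select("ligase_ligand","org.")  # organic molecules, i.e. the bound ligand
cmd.save("ligase_ligand/"+i+".mol2","ligase_ligand")
cmd.select("ligase_pocket_5","byres ligase_ligand around 5")  # whole residues within 5 Å of the ligand
cmd.save("ligase_pocket_5/"+i+".mol2","ligase_pocket_5")
cmd.delete("all")  # reset before the next entry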
You need to make sure that the pdb file contains only one protein and one ligand. In that case, renaming the chains is not necessary.
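For anyone who wants to run the same kind of selection without PyMOL, the idea behind "byres ligase_ligand around 5" can be sketched in plain Python. This is only an approximation under stated assumptions: the ligand is taken to be every HETATM record that is not water, which is cruder than PyMOL's org. selection (ions and modified residues stored as HETATM would also count as ligand), and the file is assumed to follow the fixed-column PDB format. The function names parse_atoms and pocket_residues are hypothetical.

```python
import math

WATER_NAMES = {"HOH", "WAT"}  # residue names treated as solvent, not ligand


def parse_atoms(pdb_text):
    """Read ATOM/HETATM records from fixed-column PDB text."""
    atoms = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        atoms.append({
            "record": record,
            "resname": line[17:20].strip(),
            "chain": line[21],
            "resseq": int(line[22:26]),
            "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        })
    return atoms


def pocket_residues(pdb_text, cutoff=5.0):
    """Return sorted (chain, resseq, resname) of protein residues that have
    at least one atom within `cutoff` angstroms of any ligand atom."""
    atoms = parse_atoms(pdb_text)
    ligand = [a for a in atoms
              if a["record"] == "HETATM" and a["resname"] not in WATER_NAMES]
    protein = [a for a in atoms if a["record"] == "ATOM"]
    pocket = set()
    for atom in protein:
        if any(math.dist(atom["xyz"], lig["xyz"]) <= cutoff for lig in ligand):
            pocket.add((atom["chain"], atom["resseq"], atom["resname"]))
    return sorted(pocket)
```

As with the PyMOL version, this only gives a sensible pocket when the file contains one protein and one ligand. Hydrogens are not removed here, so the result can differ slightly from the PyMOL snippet on files that include them.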
I understand. However, we would have to manually download all of the PDB files, and first search using the steps that you outlined earlier.
Is there anyone else you recommend we can contact? Do you have more direct contact information we can use?
For this step, we have not found any other method. We just handle the entries one by one manually.
You may email me at "lifenglei104@163.com"
Hello authors, we still cannot find the data at the URL you provided above. It seems to be just a search engine. Could you kindly provide the data you used in the Nature Communications paper? Thank you.
Thank you guys so much for conducting this novel study. I had a few questions about the data. I was able to get the warhead, linker, and E3 ligand data by downloading them from PROTAC-DB. However, PROTAC-DB doesn't have crystal structures for the POI and E3 ligase. In your paper, you mentioned that you sourced these from the PDB and extracted the binding pockets. How did you know where the binding pockets are, and which PDB entry to use?
I downloaded the 6V1H BRD7 entry from the PDB and compared it with the one in your example, and upon inspection they don't look the same. Your example contains the relevant PROTAC while the PDB entry doesn't. How did you guys set this up?
Your help on this issue would be greatly appreciated. Thanks again for setting the groundwork in this emerging area!