Fenglei104 / DeepPROTACs

GNU General Public License v3.0
POI and E3 ligase binding pocket locations #12

Open Atheris4 opened 8 months ago

Atheris4 commented 8 months ago

Thank you so much for conducting this novel study. I have a few questions about the data. I was able to get the warhead, linker, and E3 ligand data by downloading them from PROTAC-DB. However, it does not include crystal structures for the POI and E3 ligase. In your paper, you mention that you sourced these from the PDB and extracted the binding pockets. How did you know where the binding pockets are, and which PDB entry to use?

I downloaded the BRD7 6V1H structure both from your example and from the PDB, and on inspection they do not look the same. Your example contains the relevant PROTAC while the PDB entry does not. How did you set this up?

Your help on this issue would be greatly appreciated. Thanks again for setting the groundwork in this emerging area!

Fenglei104 commented 8 months ago

Thank you for your question.

There are few POI-PROTAC-E3 complex structures in the PDB, so we prepared the POI and E3 ligase structures separately. Even so, few of the POI-warhead or E3 ligase-ligand complexes in PROTAC-DB have structures in the PDB. We therefore aligned the ligand (warhead) from PROTAC-DB to an existing E3 ligase (POI) structure in the PDB. Here are the steps, which you can use as a reference.

For each entry in PROTAC-DB, we first searched the PDB for structures using the POI name or the E3 ligase name, respectively. The UniProt ID in PROTAC-DB was used as a supplement.

Second, we applied the following selection criteria to the search results:

  1. The structure should contain an inhibitor, so that the pocket position is known.
  2. The inhibitor should be similar to the ligand (warhead) of this PROTAC-DB entry.
  3. The resolution of the structure should be as high as possible.

After the selection, we had one PDB entry for the PROTAC-DB entry, containing the protein structure with a general pocket location.

Third, we aligned the ligand (warhead) to the inhibitor in the selected structure and then performed energy minimization.

Therefore, BRD7_6v1h_TargetProtein.pdb is the aligned result, not 6v1h.pdb as downloaded from the PDB.
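For readers who want to apply the three selection criteria programmatically, the following is a minimal sketch rather than the authors' procedure. The `Candidate` fields and the `pick_structure` helper are hypothetical names, and ranking similarity ahead of resolution is an assumption; the thread does not say how the criteria were weighted.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Candidate:
    """One PDB search hit, described by the three criteria (hypothetical schema)."""
    pdb_id: str
    has_inhibitor: bool   # criterion 1: a bound inhibitor marks the pocket
    similarity: float     # criterion 2: similarity to the warhead/ligand, 0..1
    resolution: float     # criterion 3: in angstroms, lower is better


def pick_structure(candidates: Iterable[Candidate],
                   min_similarity: float = 0.0) -> Optional[Candidate]:
    """Return the best candidate, or None if nothing passes the filters."""
    usable = [c for c in candidates
              if c.has_inhibitor and c.similarity >= min_similarity]
    if not usable:
        return None
    # Assumption: prefer the most similar inhibitor, then the best resolution.
    return min(usable, key=lambda c: (-c.similarity, c.resolution))
```

Entries without a bound inhibitor are dropped outright because they give no pocket location, however good their resolution is.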

Hope it helps.

Atheris4 commented 8 months ago

Thank you so much for the quick and helpful reply! Since we are trying to replicate your experiment as closely as possible, is there any way you can provide us with the extracted data?

If not, could you provide more information about your methods? For each POI/E3 ligase, did you go through every PDB entry to find the ideal inhibitor? How did you determine the similarity?

Also, how did you do the alignment and energy minimization?

I’d love to Zoom or call to clarify these issues.

Fenglei104 commented 8 months ago

I am sorry that I cannot provide the data now. However, we are preparing a larger dataset of PROTACs, which will include the data used here. It will be released soon after that.

Also, I am sorry that I do not know the details of the data collection, since it was mainly done by my lab mates. The selection was mostly done by experience. The similarity (Tanimoto similarity) can be calculated with RDKit. The alignment and energy minimization were done using Maestro from Schrödinger.
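As an illustration of the Tanimoto similarity mentioned above, here is a minimal sketch. The set-based `tanimoto` function is just the definition (shared features divided by total distinct features). The `smiles_tanimoto` helper shows one way to compute it with RDKit; the choice of Morgan fingerprints with radius 2 and 2048 bits is an assumption, since the thread does not say which fingerprint was used.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two collections of 'on' fingerprint bits."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def smiles_tanimoto(smiles_a, smiles_b, radius=2, n_bits=2048):
    """Tanimoto similarity of two molecules given as SMILES (requires RDKit).

    Assumption: Morgan fingerprints; other fingerprint types give other values.
    """
    from rdkit import Chem, DataStructs
    from rdkit.Chem import rdFingerprintGenerator

    generator = rdFingerprintGenerator.GetMorganGenerator(radius=radius,
                                                          fpSize=n_bits)
    fingerprints = []
    for smiles in (smiles_a, smiles_b):
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            raise ValueError(f"could not parse SMILES: {smiles!r}")
        fingerprints.append(generator.GetFingerprint(mol))
    return DataStructs.TanimotoSimilarity(fingerprints[0], fingerprints[1])
```

Calling `smiles_tanimoto` with the warhead SMILES from PROTAC-DB and the SMILES of the inhibitor in a candidate PDB entry would give a number between 0 and 1 for ranking candidates.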

Atheris4 commented 6 months ago

Thank you for your reply. Do you have an estimate of when the data will be released?

Fenglei104 commented 6 months ago

Actually, we have put some data at "https://bailab.siais.shanghaitech.edu.cn/services/deepprotac-db", and you can take a look. We are checking the new data and preparing the manuscript. After that, we will release the whole data set.

Atheris4 commented 6 months ago

Thank you for the information. Could you release the binding pocket extraction code so we can run it ourselves?

Fenglei104 commented 6 months ago

You can find it in prepare_data.ipynb.

Atheris4 commented 6 months ago

Thank you for the quick response.

The code in prepare_data.ipynb does not generalize to all PROTAC cases in the dataset, since the chain IDs change between entries.

There is also no code showing how the PDB files were originally extracted. Was that done manually, or automatically using the RCSB PDB API?

Is there any way I can contact you directly?

Fenglei104 commented 6 months ago

We manually changed the chain IDs. Alternatively, you can use select org. in PyMOL to extract the ligand, provided the mol2 or pdb file contains only one protein and one ligand.

Atheris4 commented 6 months ago

When you say manually, do you mean that you had a team go through every entry in PROTAC-DB? Did you use any automated filtering? I did some preliminary searching and was wondering whether you used the API that is available for the PDB.

Fenglei104 commented 6 months ago

Yes, we just went through every entry in PROTAC-DB and did not use any automated filtering.
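The authors did not use the PDB API, but for anyone who wants to try, the sketch below shows how the first search step might be automated with the RCSB Search API. It is an untested assumption, not the authors' method: the endpoint and attribute names follow the RCSB Search API v2 as I understand it and should be checked against the current documentation, and `build_query` and `search_entries` are hypothetical helper names. Note that a non-polymer entity can also be an ion or buffer molecule, so each hit still needs checking for a real inhibitor.

```python
import json
import urllib.request

# Assumed RCSB Search API v2 endpoint; verify against the current docs.
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"
_REF = "rcsb_polymer_entity_container_identifiers.reference_sequence_identifiers"


def _terminal(attribute, operator, value):
    """One attribute filter in the search query."""
    return {"type": "terminal", "service": "text",
            "parameters": {"attribute": attribute,
                           "operator": operator,
                           "value": value}}


def build_query(uniprot_id, rows=25):
    """Entries for a UniProt accession that contain at least one non-polymer
    entity (possibly an inhibitor), sorted from best to worst resolution."""
    return {
        "query": {
            "type": "group",
            "logical_operator": "and",
            "nodes": [
                _terminal(_REF + ".database_accession", "exact_match", uniprot_id),
                _terminal(_REF + ".database_name", "exact_match", "UniProt"),
                _terminal("rcsb_entry_info.nonpolymer_entity_count", "greater", 0),
            ],
        },
        "return_type": "entry",
        "request_options": {
            "paginate": {"start": 0, "rows": rows},
            "sort": [{"sort_by": "rcsb_entry_info.resolution_combined",
                      "direction": "asc"}],
        },
    }


def search_entries(uniprot_id, rows=25):
    """POST the query and return the matching PDB IDs (needs network access)."""
    body = json.dumps(build_query(uniprot_id, rows)).encode("utf-8")
    request = urllib.request.Request(
        SEARCH_URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        raw = response.read()
    if not raw:  # an empty body means no matches
        return []
    return [hit["identifier"] for hit in json.loads(raw).get("result_set", [])]
```

Calling `search_entries` with the UniProt ID listed for a PROTAC-DB entry would return candidate PDB IDs, which could then be filtered by inhibitor similarity as described earlier in the thread.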

Atheris4 commented 6 months ago

Thank you for the quick reply. We are a team of 2 and aren't able to manually go through everything. How do you recommend we approach the issue of binding pocket extraction?

Fenglei104 commented 6 months ago

For binding pocket extraction, you can use select org. in PyMOL, as mentioned previously. That is, in prepare_data.ipynb it should be

    import glob
    from pymol import cmd

    # i is the entry name defined earlier in the notebook (assumed to be the loop variable)
    cmd.load(glob.glob('protacs/'+i+"/*igase.pdb")[0])
    cmd.remove('h.')  # drop hydrogens
    cmd.select("ligase_ligand","org.")  # organic molecule, i.e. the bound ligand
    cmd.save("ligase_ligand/"+i+".mol2","ligase_ligand")
    # whole residues with any atom within 5 Å of the ligand
    cmd.select("ligase_pocket_5","byres ligase_ligand around 5")
    cmd.save("ligase_pocket_5/"+i+".mol2","ligase_pocket_5")
    cmd.delete("all")

You need to make sure that the pdb file contains only one protein and one ligand. In that case, renaming the chains is not necessary.
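To make clear what the pocket selection computes, here is a dependency-free sketch of the same idea: collect every protein residue that has at least one atom within 5 Å of a ligand atom. It is an approximation of the PyMOL selection, not a replacement for it. Treating every non-water HETATM record as the ligand is an assumption standing in for PyMOL's org. selector, and `parse_atoms` and `pocket_residues` are hypothetical helper names.

```python
import math

WATER = {"HOH", "WAT"}


def parse_atoms(pdb_text):
    """Read ATOM/HETATM records using the fixed PDB column layout."""
    atoms = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        atoms.append({
            "hetero": record == "HETATM",
            "resname": line[17:20].strip(),
            # (chain ID, residue number plus insertion code)
            "residue": (line[21], line[22:27].strip()),
            "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        })
    return atoms


def pocket_residues(pdb_text, cutoff=5.0):
    """Residues with any atom within `cutoff` angstroms of a ligand atom."""
    atoms = parse_atoms(pdb_text)
    # Assumption: the ligand is every HETATM that is not water.
    ligand = [a["xyz"] for a in atoms
              if a["hetero"] and a["resname"] not in WATER]
    pocket = set()
    for atom in atoms:
        if atom["hetero"]:
            continue  # only protein atoms can belong to the pocket
        if any(math.dist(atom["xyz"], xyz) <= cutoff for xyz in ligand):
            pocket.add(atom["residue"])
    return pocket
```

As with the PyMOL version, the result is only meaningful when the file contains one protein and one ligand; extra HETATM groups such as ions or cofactors would widen the pocket.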

Atheris4 commented 6 months ago

I understand. However, we would have to manually download all of the PDB files, and first search using the steps that you outlined earlier.

Is there anyone else you would recommend we contact? Do you have more direct contact information we can use?

Fenglei104 commented 6 months ago

For this step, we have not found any other method; we just handled the entries one by one manually.
You may email me at "lifenglei104@163.com".

stzhangjie commented 6 months ago

> Actually, we have put some data in "https://bailab.siais.shanghaitech.edu.cn/services/deepprotac-db" and you can take a look. And we are checking the new data and preparing for the manuscript. After that, we will release the whole data set.

Hello authors, we still cannot find the data at the URL you provided above; it seems to be just a search engine. Could you kindly provide the data you used in the Nature Communications paper? Thank you.