Fenglei104 / DeepPROTACs

GNU General Public License v3.0

How to prepare my own input file? #7

Open targetprotein101 opened 1 year ago

targetprotein101 commented 1 year ago

Hi, I'd like to know how to prepare the five different types of input files when I'm dealing with a batch of different PROTACs at the same time. I'm going to test 10 PROTACs and have already prepared the mol2 files, but I don't know how to merge them into one file per input type (five different inputs in total).

Thanks in advance!

targetprotein101 commented 1 year ago

Hi, I really want to know how I should prepare all the input files. Following the files in the protacs directory, I can prepare .pdb and .smi files, and I can even prepare .mol2 files. But I want to know where and how I should put my data.

Also, I just ran python main.py right after downloading, but I got this message:

    $ python main.py
    Traceback (most recent call last):
      File "main.py", line 10, in <module>
        from protacloader import PROTACSet, collater
      File "//bin/DeepPROTACs/protacloader.py", line 3, in <module>
        from torch_geometric.data import Batch
    ModuleNotFoundError: No module named 'torch_geometric'

Many thanks in advance!

Fenglei104 commented 1 year ago


Hi, you may have forgotten to install PyTorch Geometric (PyG). Please see https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html for installation instructions.
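Before re-running main.py, a quick way to confirm the installation is to check that the interpreter you use for main.py can find both PyTorch and PyG. This is a minimal stdlib sketch, not part of the repository; it only checks that the packages are importable, not that the PyG build matches your PyTorch or CUDA version.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # protacloader.py imports torch_geometric, and PyG itself requires torch.
    for name in ("torch", "torch_geometric"):
        status = "found" if is_installed(name) else "MISSING"
        print(f"{name}: {status}")
```

Run it with the same python executable you use for main.py; if torch_geometric is reported as missing, the install went into a different environment.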

Fenglei104 commented 1 year ago


Hi, if you have the .pdb files, please organize them the same way as the files in the protacs directory. Then run prepare_data.ipynb first to convert them into the right format. If you just want to test new data, main.py is not necessary; please use case_study.ipynb instead. Don't forget to set the root argument in that notebook, otherwise it will use the default data. For example:

ligase_ligand = GraphData("ligase_ligand", root="test_samples")
Fenglei104 commented 1 year ago

If you want to customize the processing, please see https://pytorch-geometric.readthedocs.io/en/latest/tutorial/create_dataset.html and the file prepare_data.py.

targetprotein101 commented 1 year ago

Thanks. I've installed PyG, but I'm not familiar with Jupyter, so I have a bunch of questions.

  1. I've looked into case_study.ipynb, but there is a lot of empty space after the ",". I don't know where I should put root="", and what should "test_samples" be the name of? The name of the PROTAC, or something else? Everything is too vague to me.
  2. In which directory should I prepare my data, and how can I specify the location of the input data?

I'm planning to use this program in a large project, so I really want to get it working. Would you please help me with this?

Thanks!

Fenglei104 commented 1 year ago


  1. I added the root in case_study.ipynb. By default, the value of root is data, which means a new directory named data is generated and the processed files are put into it. PyG checks the directory and the files already generated there to decide whether it should re-process the raw data. So you should give root a different name, because a data directory already exists.
  2. You can use the Python debugger pdb, or the debugger in VS Code or another IDE, to run case_study.ipynb with breakpoints in the process function of the GraphData class. Then step through it to see how the data are processed. This link will help a lot: https://pytorch-geometric.readthedocs.io/en/latest/tutorial/create_dataset.html
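To make the root behaviour in point 1 concrete, here is a simplified, stdlib-only sketch of the kind of check PyG performs before calling process(). It is not PyG's actual implementation (which has additional checks), and the processed-file name ligase_ligand.pt is hypothetical; the real names come from the GraphData class. The point is that reusing root="data" finds the old processed files and skips processing, while a fresh root such as test_samples forces your new raw data to be processed.

```python
import os
import tempfile
from typing import List


def needs_processing(root: str, processed_file_names: List[str]) -> bool:
    """Simplified sketch (not PyG's code): re-process unless every expected
    file already exists under <root>/processed."""
    processed_dir = os.path.join(root, "processed")
    paths = [os.path.join(processed_dir, name) for name in processed_file_names]
    return not all(os.path.exists(p) for p in paths)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        old_root = os.path.join(tmp, "data")          # default root, already used
        new_root = os.path.join(tmp, "test_samples")  # fresh root for new inputs
        os.makedirs(os.path.join(old_root, "processed"))
        # Hypothetical processed-file name standing in for GraphData's real output.
        open(os.path.join(old_root, "processed", "ligase_ligand.pt"), "w").close()

        print(needs_processing(old_root, ["ligase_ligand.pt"]))  # False: cached files reused
        print(needs_processing(new_root, ["ligase_ligand.pt"]))  # True: new data processed
```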
targetprotein101 commented 1 year ago
  1. I have 10 different PROTACs with different linkers. Do I have to make 10 different input directories in the "protacs" directory?

  2. In which directory should I prepare my data, and how can I specify the location of the input data? I don't understand any of what you said:

You can use the Python Debugger pdb or just Debug on VS code or other IDEs, to run the case_study.ipynb and add breakpoints in the process function of the GraphData class. You can run it step by step to see how it process. https://pytorch-geometric.readthedocs.io/en/latest/tutorial/create_dataset.html, this link will help a lot.

  3. I already have mol2 files for everything, as you described for the web server (except the linker SMILES file). Is there any way I can skip the processing?

  4. How can I run case_study.ipynb?

  5. I already deleted the 'data' directory in your package, as you mentioned in other issues. In that case, I can still use the name 'data' when running, right?

Thanks.

Fenglei104 commented 1 year ago

I just uploaded a single-prediction version. Rename your prepared files to ligase_ligand.mol2, ligase_pocket.mol2, target_ligand.mol2, target_pocket.mol2 and linker.smi, put them into a directory such as single_test, and run single_prediction.py with the directory name as its argument (one complex at a time), e.g. python single_prediction.py single_test. This way you do not need the Jupyter notebooks or to change any root values.
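With 10 PROTACs, one way to apply this to a batch is a small wrapper that checks each directory for the five files and then calls single_prediction.py once per directory. This is a hedged sketch, not part of the repository: the script name run_batch.py and the directory names are hypothetical, it assumes you run it from the DeepPROTACs directory, and it passes each directory name exactly as in python single_prediction.py single_test. Keep the case directories next to single_prediction.py in case it does not accept nested paths.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# The five inputs expected in each directory (file names from the reply above).
REQUIRED_FILES = (
    "ligase_ligand.mol2",
    "ligase_pocket.mol2",
    "target_ligand.mol2",
    "target_pocket.mol2",
    "linker.smi",
)


def missing_inputs(case_dir: Path) -> List[str]:
    """Return the required input files that are absent from case_dir."""
    return [name for name in REQUIRED_FILES if not (case_dir / name).is_file()]


def run_all(case_dirs: List[Path]) -> None:
    """Run single_prediction.py once per directory, one complex at a time."""
    for case_dir in case_dirs:
        missing = missing_inputs(case_dir)
        if missing:
            print(f"skipping {case_dir}: missing {', '.join(missing)}")
            continue
        # Same as typing `python single_prediction.py <case_dir>` by hand;
        # check=True stops the batch if one prediction fails.
        subprocess.run([sys.executable, "single_prediction.py", str(case_dir)], check=True)


if __name__ == "__main__":
    # Hypothetical usage: python run_batch.py test_01 test_02 test_03
    run_all([Path(arg) for arg in sys.argv[1:]])
```

Checking for missing files first means a typo in one directory is reported and skipped instead of surfacing as an error partway through the model run.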

targetprotein101 commented 1 year ago

Thanks.

I have already prepared .smi and .pdb files just like the ones in /DeepPROTACs/protacs/1_BRD7_VHL. Can't I just use the .smi files for the linkers and PROTACs in a single test?

Fenglei104 commented 1 year ago

I'm sorry, but I'm afraid you can't use the .smi files for linkers and PROTACs directly. The E3 ligand, warhead and linker have to be separated, because our model takes them as independent inputs. I have also tried to split the PROTACs with RDKit using the linker and PROTAC SMILES, but it sometimes fails.

targetprotein101 commented 1 year ago

No, that's not what I meant. I mean that for the input in the directory DeepPROTACs/protacs/1_BRD7_VHL, you provided the sample data as linker_1.smi, protac_1.smi, and two PDB files. Why can't I use the same input types for my runs?

Fenglei104 commented 1 year ago

I'm sorry I didn't mention the difference. In DeepPROTACs/protacs/1_BRD7_VHL, both PDB files contain the ligands, and we name the chains A, B, C and D in order so that the ligands and the pockets can be split. In a single test, you do not need to worry about chain names, but you have to split the ligands and the pockets manually.
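If you need to do that split yourself, the sketch below shows one way in plain Python, assuming a PDB file in which the ligand sits on its own chain (as in the protacs/ examples). The chain IDs, the 5 Å cutoff and the "any atom within the cutoff" pocket definition are placeholders chosen for illustration, not the criteria DeepPROTACs used to build its pockets. It relies only on the fixed-column PDB layout: chain ID in column 22, residue number in columns 23-26, coordinates in columns 31-54.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]
ResidueKey = Tuple[str, str, str]  # (chain ID, residue number, insertion code)


def split_by_chain(pdb_lines: List[str]) -> Dict[str, List[str]]:
    """Group ATOM/HETATM records by chain ID (column 22 of the PDB format)."""
    chains: Dict[str, List[str]] = defaultdict(list)
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            chains[line[21]].append(line)
    return dict(chains)


def coords(line: str) -> Coord:
    """Parse x, y, z from PDB columns 31-38, 39-46 and 47-54."""
    return float(line[30:38]), float(line[38:46]), float(line[46:54])


def residue_key(line: str) -> ResidueKey:
    return line[21], line[22:26].strip(), line[26]


def pocket_residues(protein: List[str], ligand: List[str], cutoff: float) -> Set[ResidueKey]:
    """Residues with at least one atom within `cutoff` angstroms of any ligand atom."""
    ligand_xyz = [coords(line) for line in ligand]
    pocket: Set[ResidueKey] = set()
    for line in protein:
        xyz = coords(line)
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_xyz):
            pocket.add(residue_key(line))
    return pocket


def pocket_lines(protein: List[str], pocket: Set[ResidueKey]) -> List[str]:
    """Keep every atom of the selected residues so pocket residues stay whole."""
    return [line for line in protein if residue_key(line) in pocket]


if __name__ == "__main__":
    import sys
    from pathlib import Path

    # Hypothetical usage: python split_pocket.py complex.pdb A B
    # (protein chain A, ligand chain B; 5.0 is a placeholder cutoff).
    if len(sys.argv) == 4:
        pdb_path, protein_chain, ligand_chain = sys.argv[1:]
        chains = split_by_chain(Path(pdb_path).read_text().splitlines())
        pocket = pocket_residues(chains[protein_chain], chains[ligand_chain], cutoff=5.0)
        print("\n".join(pocket_lines(chains[protein_chain], pocket)))
```

The output is PDB records; since single_prediction.py expects .mol2 files for the ligands and pockets, a conversion step (for example with Open Babel) would still be needed.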