biomed-AI / MUSE


Data missing #1

Open zickun opened 3 weeks ago

zickun commented 3 weeks ago

Dear author, thank you very much for your great contribution to the community. I am very interested in your research! Unfortunately, when I tried to reproduce your work, I found that the PDB folder is missing from the dataset folder. I understand from the paper that I need to download the structures, but I do not know which ones to download. Could you please update the data section of the README to help me replicate your work? In addition, I sent you an email via Gmail; I wonder if you have received it.

Jh-SYSU commented 3 weeks ago

Thanks for your interest.

The native PDB structures can be obtained from https://github.com/zqgao22/HIGH-PPI (edge_list_12, x_list). Alternatively, you can predict PDB structures from the protein sequences with the pre-trained ESMFold model (https://github.com/facebookresearch/esm).
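For the ESMFold route, a minimal sketch might look like the following. It assumes the `esm` package is installed with its ESMFold dependencies and uses `esm.pretrained.esmfold_v1()` and `infer_pdb` as shown in the ESM README; the helper names `load_esmfold` and `write_predicted_pdbs`, the output directory, and the sequence dictionary are hypothetical. It only writes `.pdb` files and does not cover any conversion to the format MUSE loads.

```python
from pathlib import Path


def load_esmfold():
    """Load the pre-trained ESMFold model (weights are downloaded on first use)."""
    import esm  # imported lazily: heavy dependency, assumed to be installed
    import torch

    model = esm.pretrained.esmfold_v1().eval()
    if torch.cuda.is_available():
        model = model.cuda()
    return model


def write_predicted_pdbs(sequences, out_dir, model):
    """Write one predicted PDB file per protein and return the written paths.

    sequences: dict mapping a protein name to its amino-acid sequence
    model: any object with an `infer_pdb(sequence) -> str` method, e.g. ESMFold
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, seq in sequences.items():
        pdb_text = model.infer_pdb(seq)  # PDB-formatted text for this sequence
        path = out_dir / f"{name}.pdb"
        path.write_text(pdb_text)
        written.append(path)
    return written
```

Typical use would be `write_predicted_pdbs(name_to_sequence, "predicted_pdb", load_esmfold())`, where `name_to_sequence` is built from the dataset's protein sequences.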

zickun commented 2 weeks ago

<<<<<<<<<< Protein GNN training >>>>>>>>>>
Processing...
Processing protein-protein interaction graph...
Processing protein graphs...
  0%|          | 0/1553 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/ryz/MUSE/trainer_ppi.py", line 651, in <module>
    trainer.multi_scale_em_train()
  File "/home/ryz/MUSE/trainer_ppi.py", line 77, in multi_scale_em_train
    self.gnnmodel, = self._maximization(link_model=self.link_model,
  File "/home/ryz/MUSE/trainer_ppi.py", line 102, in _maximization
    self.gnn_trainer = ProteinGNNTrainer(args=self.args,
  File "/home/ryz/MUSE/trainer_ppi.py", line 211, in __init__
    self.train_loader, self.test_loader = self.create_dataloaders()
  File "/home/ryz/MUSE/trainer_ppi.py", line 225, in create_dataloaders
    train_dataset = ProteinDataset(self.args, self.config, split='train')
  File "/home/ryz/MUSE/dataset.py", line 209, in __init__
    super(ProteinDataset, self).__init__(root=os.path.join(self.inter_dataset_root, self.datasetname.replace('-', '')))
  File "/home/ryz/anaconda3/envs/MUSE/lib/python3.9/site-packages/torch_geometric/data/in_memory_dataset.py", line 57, in __init__
    super().__init__(root, transform, pre_transform, pre_filter, log)
  File "/home/ryz/anaconda3/envs/MUSE/lib/python3.9/site-packages/torch_geometric/data/dataset.py", line 97, in __init__
    self._process()
  File "/home/ryz/anaconda3/envs/MUSE/lib/python3.9/site-packages/torch_geometric/data/dataset.py", line 230, in _process
    self.process()
  File "/home/ryz/MUSE/dataset.py", line 324, in process
    protein_graph_list = self.process_protein_graph(list(protein_idx2protein.values()), [protein_idx2sequence[i] for i in protein_idx2protein.keys()])
  File "/home/ryz/MUSE/dataset.py", line 334, in process_protein_graph
    X = torch.load(self.raw_dir + "/pdb/" + name + ".tensor")
  File "/home/ryz/anaconda3/envs/MUSE/lib/python3.9/site-packages/torch/serialization.py", line 699, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/ryz/anaconda3/envs/MUSE/lib/python3.9/site-packages/torch/serialization.py", line 230, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/ryz/anaconda3/envs/MUSE/lib/python3.9/site-packages/torch/serialization.py", line 211, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/home/ryz/MUSE/datasets/high_ppi/raw/pdb/9606.ENSP00000000233.tensor'

I already have these two files (edge_list_12, x_list) in '/home/ryz/MUSE/datasets/high_ppi/raw/'. Can you tell me what should go in the 'pdb' folder?
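According to the traceback, `process_protein_graph` in `dataset.py` loads one file per protein from `<raw_dir>/pdb/<name>.tensor`. A minimal sketch for checking which of those files are absent before starting training is given below; the function name `find_missing_tensors` and the way the list of protein names is obtained are assumptions, not part of MUSE.

```python
from pathlib import Path


def find_missing_tensors(raw_dir, protein_names):
    """Return the protein names that have no <raw_dir>/pdb/<name>.tensor file."""
    pdb_dir = Path(raw_dir) / "pdb"
    return [
        name for name in protein_names
        if not (pdb_dir / f"{name}.tensor").is_file()
    ]
```

For example, `find_missing_tensors("/home/ryz/MUSE/datasets/high_ppi/raw", ["9606.ENSP00000000233"])` would return that name for as long as the file reported in the error is absent.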