bzho3923 / ProtLGN

Where are the weights? How to prepare the dataset? #2

Open AaranWang opened 2 months ago

AaranWang commented 2 months ago

Could you add some explanation of how to build and run ProtLGN?

tyang816 commented 1 month ago

Hi, Wang,

Sorry for the late reply. We have updated the model weights, example dataset, and zero-shot script for ProtLGN:

  1. model weight: ckpt/ProtLGN.pt
  2. example dataset: data/example
  3. zero-shot script: script/mutant_predict.sh

We have also recently developed two more advanced protein engineering tools, ProtSSN and ProSST, for zero-shot prediction. We recommend trying the new models!

Best wishes, Yang Tan

AaranWang commented 1 month ago

Thank you. I'm trying to reproduce the results of ProtLGN. I ran into a problem in Step 2 (build graph dataset) when I ran this command from script/build_cath_dataset.sh:

python data.py --build_cath --protein_dataset data/cath40_k10 --c_alpha_max_neighbors 10 --use_sasa --use_bfactor --use_dihedral --use_coordinate

I have downloaded cath-dataset4.2.0 and put it in the data/cath_k10/raw directory. Are the materials for building the graph dataset complete now? If not, could you please provide the data for the data/cath_k10/raw directory? Thank you.

AaranWang commented 1 month ago

The error message:

$ python data.py --build_cath --protein_dataset data/cath40_k10 --c_alpha_max_neighbors 10 --use_sasa --use_bfactor --use_dihedral --use_coordinate
Processing...
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/home/wangq/Programs/ProtLGN/data.py", line 102, in <module>
    protein_dataset(args, "train")
  File "/home/wangq/Programs/ProtLGN/data.py", line 17, in protein_dataset
    dataset = Protein(
  File "/home/wangq/Programs/ProtLGN/src/Dataset/protein_dataset.py", line 198, in __init__
    super().__init__(root, transform, pre_transform, pre_filter)
  File "/home/wangq/Programs/Miniconda3/envs/ProtLGN/lib/python3.10/site-packages/torch_geometric/data/in_memory_dataset.py", line 81, in __init__
    super().__init__(root, transform, pre_transform, pre_filter, log,
  File "/home/wangq/Programs/Miniconda3/envs/ProtLGN/lib/python3.10/site-packages/torch_geometric/data/dataset.py", line 115, in __init__
    self._process()
  File "/home/wangq/Programs/Miniconda3/envs/ProtLGN/lib/python3.10/site-packages/torch_geometric/data/dataset.py", line 260, in _process
    self.process()
  File "/home/wangq/Programs/ProtLGN/src/Dataset/protein_dataset.py", line 264, in process
    self.normalize_file = get_stat(self.saved_graph_dir)
  File "/home/wangq/Programs/ProtLGN/src/Dataset/dataset_utils.py", line 54, in get_stat
    graph = torch.load(os.path.join(graph_root, filenames[0]))
IndexError: list index out of range
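The `0it` progress line followed by `IndexError` at `filenames[0]` suggests that no graph files were written, so the file list was empty. A minimal sketch of a guard that reports this case more clearly is shown below. The helper name `list_graph_files` is hypothetical and is not part of the ProtLGN code base; it only illustrates checking the directory before indexing into it.

```python
import os


def list_graph_files(graph_root):
    """Return the sorted file names in graph_root.

    Hypothetical helper (not from ProtLGN): raises a descriptive error
    instead of letting an empty list fail later with IndexError.
    """
    if not os.path.isdir(graph_root):
        raise FileNotFoundError(f"graph directory does not exist: {graph_root}")

    # Keep regular files only; subdirectories are ignored.
    filenames = sorted(
        name
        for name in os.listdir(graph_root)
        if os.path.isfile(os.path.join(graph_root, name))
    )

    if not filenames:
        raise FileNotFoundError(
            f"no graph files found in {graph_root}; "
            "check that the raw structure files were downloaded and extracted"
        )
    return filenames
```

Calling such a check before `filenames[0]` would turn the bare `IndexError` into a message that points at the missing raw data.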

tyang816 commented 1 month ago

I have updated the data processing script, and it works well.

mkdir -p data/cath_k10/raw
cd data/cath_k10/raw
wget https://huggingface.co/datasets/tyang816/cath/resolve/main/dompdb.tar
# or wget https://lianglab.sjtu.edu.cn/files/ProtSSN-2024/dompdb.tar
tar -xvf dompdb.tar
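A download can silently save an HTML page instead of the archive, for example when a URL points at a web view of the file and not the raw file, in which case `tar -xvf` fails. As a hedged sketch, the standard-library `tarfile.is_tarfile` can confirm the file is a real tar archive before extraction. The function name `looks_like_tar` is an illustrative assumption, not part of the repository.

```python
import os
import tarfile


def looks_like_tar(path):
    """Return True only if path exists and can be read as a tar archive."""
    # is_tarfile raises on a missing file, so check existence first.
    return os.path.isfile(path) and tarfile.is_tarfile(path)


if __name__ == "__main__":
    archive = "dompdb.tar"  # file name taken from the commands above
    if looks_like_tar(archive):
        print(f"{archive} is a valid tar archive")
    else:
        print(f"{archive} is missing or not a tar archive; re-download it")
```

Run from `data/cath_k10/raw`, this reports whether `dompdb.tar` is safe to extract.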

AaranWang commented 1 month ago

Thank you, I have successfully built the graph dataset. Now I have run into a new problem: when I ran script/run_pretrain.sh, it showed the error "FileNotFoundError: [Errno 2] No such file or directory: 'data/proteingym_valid/Proteingym_validk10'". Could you share an example ProteinGym dataset? Thank you.