DeepGraphLearning / GearNet

GearNet and Geometric Pretraining Methods for Protein Structure Representation Learning, ICLR'2023 (https://arxiv.org/abs/2203.06125)
MIT License

How to use GearNet to extract protein features? #23

Closed: onlyonewater closed this issue 1 year ago

onlyonewater commented 1 year ago

Hi authors, great work! I want to use GearNet as a feature extractor to obtain protein features. How should I do that?

thanks!!!

Oxer11 commented 1 year ago

Hi! If you want to extract protein features with GearNet, I suggest reading this tutorial carefully. Basically, you only need to write your own customized dataset and then use GearNet as your encoder.

onlyonewater commented 1 year ago

OK, got it. I will give it a try. Thanks!

pearl-rabbit commented 1 year ago

Hello, I haven't used pretrained models before, and after reading the documentation, I still feel a bit confused. I want to use my own PDB dataset, but I don't know how to load it and apply the pretrained model to obtain its representation

Oxer11 commented 1 year ago

Hi, to use the pre-trained models, please check the last section of this tutorial for pre-training and fine-tuning. You will find how to load the pre-trained model and fine-tune on your own dataset.

pearl-rabbit commented 1 year ago

Hello, I'm sorry to bother you again. I want to use GearNet loaded with pre-trained weights to extract protein features, but the program produces no output and never terminates. What could be the problem?

# protein
protein = data.Protein.from_pdb(pdb_file, atom_feature="position", bond_feature="length", residue_feature="symbol")
_protein = data.Protein.pack([protein])
protein = graph_construction_model(_protein)

# model
gearnet_edge = models.GearNet(input_dim=21, hidden_dims=[512, 512, 512, 512, 512, 512],
                              num_relation=7, edge_input_dim=59, num_angle_bin=8,
                              batch_norm=True, concat_hidden=True, short_cut=True, readout="sum")
pthfile = 'models/angle_gearnet_edge.pth'
net = torch.load(pthfile)
gearnet_edge.load_state_dict(net)

#output
with torch.no_grad():
    output = gearnet_edge(protein, protein.node_feature.float(), all_loss=None, metric=None)

Oxer11 commented 1 year ago

Hi, the code looks good to me.

Could you provide more information about the bug? What do you mean by no output from the program? It seems that you haven't included a print statement in your code. If the code runs without terminating, could you please show which part it gets stuck at?

pearl-rabbit commented 1 year ago

The program never gets past the last line of code:

output = gearnet_edge(protein, protein.node_feature.float(), all_loss=None, metric=None)

Oxer11 commented 1 year ago

I think it's simply because the model hasn't finished yet, since you're running the code on CPU without putting the model on GPU.

pearl-rabbit commented 1 year ago

I tried replacing 'utils.sparse_coo_tensor' with 'torch.sparse_coo_tensor' at line 802 of "layers/conv.py", and the program was able to continue executing (although it reported an error later).

Oxer11 commented 1 year ago

This may be caused by a compilation problem with torch_ext. See this issue: https://github.com/DeepGraphLearning/torchdrug/issues/8#issuecomment-916706055

Oxer11 commented 1 year ago

BTW, to generate the embeddings, don't forget to switch the model to .eval() mode by calling gearnet_edge.eval().

pearl-rabbit commented 1 year ago

Is it written like this?

gearnet_edge.eval()
output = gearnet_edge(graph=protein, input=protein.node_feature.float())

Oxer11 commented 1 year ago

Yes!
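
The two suggestions above (run on a GPU rather than the CPU, and call .eval() before generating embeddings) can be sketched with plain PyTorch. This is a minimal illustration under assumptions, not the GearNet pipeline itself: the encoder below is a small stand-in module, and the 21-dimensional dummy input only mirrors input_dim=21 from the code earlier in the thread. In a real torchdrug run the packed protein graph would presumably also have to be on the same device as the model.

```python
import torch
from torch import nn

# Stand-in encoder (hypothetical, NOT GearNet): a linear layer plus batch norm,
# chosen because batch norm behaves differently in train and eval mode.
encoder = nn.Sequential(nn.Linear(21, 8), nn.BatchNorm1d(8))

# Use a GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
encoder = encoder.to(device)

# eval() makes batch norm use its running statistics and disables dropout.
encoder.eval()

# Dummy input: 4 rows of 21 features, created on the same device as the model.
features = torch.randn(4, 21, device=device)

# no_grad() skips gradient bookkeeping, which inference does not need.
with torch.no_grad():
    embedding = encoder(features)

print(embedding.shape)
```

Calling .eval() once before the forward pass is enough; whether it sits inside or outside the torch.no_grad() block makes no difference.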

jinzhuwei commented 1 year ago

Hello, I haven't used pretrained models before, and after reading the documentation, I still feel a bit confused. I want to use my own PDB dataset, but I don't know how to load it and apply the pretrained model to obtain its representation

Hello, I would like to ask how you built graphs from your own PDB dataset. Did you manage to implement it? I also want to build protein graphs on my own dataset, and I would be grateful if you could share your implementation of that part. Thank you very much.

Tizzzzy commented 1 month ago

Is it written like this?

gearnet_edge.eval()
output = gearnet_edge(graph=protein, input=protein.node_feature.float())

Hi pearl-rabbit: I am also trying to figure out how to use GearNet loaded with pre-trained weights to extract protein features. I am writing the same code as you, and below is my code:

import os
import sys
import argparse
import torch
from torchdrug import core
sys.path.append(os.path.dirname(os.path.dirname(__file__)))
from gearnet.model import GearNetIEConv
from torchdrug.data import Protein
from torchdrug.core import Registry as R
from torchdrug import data, utils
from torchdrug import layers
from torchdrug.layers import geometry
from torchdrug import models

pdb_file = utils.download("https://files.rcsb.org/download/2LWZ.pdb", "./")

graph_construction_model = layers.GraphConstruction(node_layers=[geometry.AlphaCarbonNode()], 
                                                    edge_layers=[geometry.SpatialEdge(radius=10.0, min_distance=5),
                                                                 geometry.KNNEdge(k=10, min_distance=5),
                                                                 geometry.SequentialEdge(max_distance=2)],
                                                    edge_feature="gearnet")

# protein
protein = Protein.from_pdb(pdb_file, atom_feature="position", bond_feature="length", residue_feature="symbol")
_protein = Protein.pack([protein])
protein = graph_construction_model(_protein)

# model
gearnet_edge = models.GearNet(input_dim=21, hidden_dims=[512, 512, 512, 512, 512, 512],
                              num_relation=7, edge_input_dim=59, num_angle_bin=8,
                              batch_norm=True, concat_hidden=True, short_cut=True, readout="sum")
pthfile = '/content/angle_gearnet_edge.pth'
net = torch.load(pthfile, map_location=torch.device('cpu'))
gearnet_edge.load_state_dict(net)

#output
with torch.no_grad():
    gearnet_edge.eval()
    print(protein)
    output = gearnet_edge(protein, protein.node_feature.float(), all_loss=None, metric=None)
    print(output)

However, I am getting the following error:

Traceback (most recent call last):
  File "/content/GearNet-main/script/test.py", line 40, in <module>
    output = gearnet_edge(protein, protein.node_feature.float(), all_loss=None, metric=None)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torchdrug/models/gearnet.py", line 95, in forward
    hidden = self.layers[i](graph, layer_input)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torchdrug/layers/conv.py", line 91, in forward
    update = self.message_and_aggregate(graph, input)
  File "/usr/local/lib/python3.10/dist-packages/torchdrug/layers/conv.py", line 813, in message_and_aggregate
    return update.view(graph.num_node, self.num_relation * self.input_dim)
RuntimeError: shape '[57, 147]' is invalid for input of size 1197

Can you please take a look at my code? Thank you so much
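
The numbers in the error message can be checked by hand, which at least locates the mismatch. The sketch below uses only values that appear in the traceback and in the model arguments above (57 nodes, num_relation=7, input_dim=21, 1197 elements). It does not identify the cause.

```python
# Values taken from the traceback and the models.GearNet arguments above.
num_node = 57        # first dimension of the requested shape [57, 147]
num_relation = 7     # num_relation=7
input_dim = 21       # input_dim=21
actual_size = 1197   # "input of size 1197" in the error message

# update.view(num_node, num_relation * input_dim) needs this many elements.
expected_size = num_node * (num_relation * input_dim)

print(num_relation * input_dim)              # 147
print(expected_size)                         # 8379
print(actual_size == num_node * input_dim)   # True
print(expected_size // actual_size)          # 7
```

The tensor being reshaped therefore holds 57 × 21 values, one input_dim-sized block per node, where the layer expects one block per node and relation (7 times as many). The failing line (conv.py, line 813) sits close to line 802 mentioned earlier in this thread, so the torch_ext issue linked above may be worth checking, although nothing in this thread confirms that it is the cause here.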