DeepGraphLearning / GearNet

GearNet and Geometric Pretraining Methods for Protein Structure Representation Learning, ICLR'2023
MIT License

Inference on PDB file by conversion into or #55

gtamer2 closed 7 months ago

gtamer2 commented 8 months ago


I am getting errors that are blocking me from running GearNet inference on an input PDB file.

First, I loaded a PDB file into a structure. Second, I followed the GearNet graph construction laid out in TorchProtein tutorial 3: Structure-based Protein Property Prediction. I encapsulated the graph construction logic in a function.

However, when running these two steps:

protein_graph =
gearnet_protein_graph = graph_construction_model(protein_graph)

I get the following error:

  File "<path>/torchdrug/layers/geometry/", line 171, in forward
    is_node_in = graph.atom2residue >= (graph.num_cum_residues - graph.num_residues)[graph.atom2graph] - i
AttributeError: 'Protein' object has no attribute 'num_cum_residues'

I studied the source code and found that num_cum_residues is a property of PackedProtein but not of Protein.
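The expression in the traceback, `graph.num_cum_residues - graph.num_residues`, reads like batch bookkeeping: for several proteins packed into one batch, it gives the index of each protein's first residue. The following is a minimal pure-Python sketch of that arithmetic, not TorchDrug's implementation; the function name and the example values are hypothetical, and only the attribute names are taken from the traceback.

```python
from itertools import accumulate


def residue_offsets(num_residues):
    """Return the index of the first residue of each graph in a packed batch."""
    # Running total of residues, one entry per graph in the batch.
    num_cum_residues = list(accumulate(num_residues))
    # Same arithmetic as `num_cum_residues - num_residues` in the traceback.
    return [cum - n for cum, n in zip(num_cum_residues, num_residues)]


# Hypothetical batch: three proteins with 4, 2, and 3 residues.
num_residues = [4, 2, 3]
offsets = residue_offsets(num_residues)

# Hypothetical atom-to-graph mapping, used to look up each atom's offset.
atom2graph = [0, 0, 1, 2, 2]
per_atom_offset = [offsets[g] for g in atom2graph]
```

Under this reading, a cumulative count only makes sense for a batch, which would be consistent with the attribute existing on a packed object even when the batch holds a single protein.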

So, third, I attempted to convert Protein into PackedProtein, with resulting code:

        protein_graph = Protein.from_pdb(path_to_pdb_file)
        num_edges = protein_graph.edge_list.shape[0]
        num_residues = protein_graph.residue_type.shape[0]

        packed_protein_graph = PackedProtein(edge_list=protein_graph.edge_list,

        gearnet_protein_graph = self.graph_construction_model(
        print("gearnet protein graph: {}".format(gearnet_protein_graph))
        return gearnet_protein_graph

However, now I get the error "ValueError: Expect node attribute `atom_type` to have shape (16344, *), but found torch.Size([16448])" (16448 is, I assume, the number of nodes derived from the edge_list).

Is this the right approach to run inference with GearNet? I downloaded the PDB files directly from, so I'd like to think the issue is not in the input data. Thank you in advance for any guidance here.

Example PDB files that can't be processed:

gtamer2 commented 8 months ago

I have studied how the pretrain/downstream scripts initialize a dataset, as here:, but from studying the TorchDrug source code, this method appears to be specific to TorchDrug-registered datasets.

gtamer2 commented 8 months ago

Is the solution to load the PDB files as HDF5 files, like gearnet/ is doing here: and to pass in the GearNet graph transformation as a parameter here: ?

When I try this, I get the error "OSError: Unable to open file (file signature not found)", and I'm not sure how to convert a PDB file to HDF5 format.
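A "file signature not found" error usually means the file handed to the HDF5 reader is not an HDF5 file at all. HDF5 files begin with a fixed 8-byte signature, whereas a PDB file is plain text. The sketch below checks for that signature; the function names are hypothetical, and it only inspects offset 0, although the format also permits the signature at later offsets (512, 1024, ...).

```python
# 8-byte format signature found at the start of an HDF5 file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def has_hdf5_signature(head: bytes) -> bool:
    """Return True if the given leading bytes start with the HDF5 signature."""
    return head.startswith(HDF5_SIGNATURE)


def looks_like_hdf5(path: str) -> bool:
    """Read the first bytes of a file and check them for the HDF5 signature."""
    with open(path, "rb") as handle:
        return has_hdf5_signature(handle.read(len(HDF5_SIGNATURE)))
```

If `looks_like_hdf5` returns False for a `.pdb` file, the HDF5 loading path is probably the wrong route for that input.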

mpedraza98 commented 8 months ago

I have tried with

_protein = data.Protein.pack([protein])   
protein_ = graph_construction_model(_protein)

as described in the tutorials, and had no issues at all.
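Putting the pieces of this thread together, the full path from a PDB file to a GearNet graph might look like the sketch below. It assumes the graph construction settings from TorchProtein tutorial 3 (alpha-carbon nodes with spatial, KNN, and sequential edges), which may need adjusting for a given checkpoint. The function name `build_gearnet_graph` is hypothetical and not part of TorchDrug or GearNet.

```python
def build_gearnet_graph(pdb_path):
    """Load one PDB file and apply GearNet-style graph construction."""
    # Imported here so the sketch can be defined without TorchDrug installed.
    from torchdrug import data, layers
    from torchdrug.layers import geometry

    # Assumed settings, following TorchProtein tutorial 3.
    graph_construction_model = layers.GraphConstruction(
        node_layers=[geometry.AlphaCarbonNode()],
        edge_layers=[
            geometry.SpatialEdge(radius=10.0, min_distance=5),
            geometry.KNNEdge(k=10, min_distance=5),
            geometry.SequentialEdge(max_distance=2),
        ],
        edge_feature="gearnet",
    )

    protein = data.Protein.from_pdb(pdb_path)
    # Pack the single protein into a batch of size one, as suggested above,
    # so that batch-level attributes such as num_cum_residues are available.
    packed_protein = data.Protein.pack([protein])
    return graph_construction_model(packed_protein)
```

Calling it with the path to a local PDB file should return a packed protein graph that can be passed on to a GearNet model.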

gtamer2 commented 7 months ago

This fixed it for me. Not sure why I missed that option. Thanks!