FreyrS / dMaSIF

Input seems to require a y value #36

Closed wjs20 closed 1 year ago

wjs20 commented 2 years ago

Hi

I'm trying to run your model on a single protein pair (target + binder). It seems you must supply a y/label column vector with each protein in the pair.

If you set the --single_pdb parameter to a PDB file on the command line, the first function called is load_protein_pair(), which tries to access a 'y' attribute on the Data object returned by load_protein_npy():

def load_protein_pair(pdb_id, data_dir,single_pdb=False):
    """Loads a protein surface mesh and its features"""
    pspl = pdb_id.split("_")
    p1_id = pspl[0] + "_" + pspl[1]
    p2_id = pspl[0] + "_" + pspl[2]

    p1 = load_protein_npy(p1_id, data_dir, center=False,single_pdb=single_pdb)
    p2 = load_protein_npy(p2_id, data_dir, center=False,single_pdb=single_pdb)
    # pdist = ((p1['xyz'][:,None,:]-p2['xyz'][None,:,:])**2).sum(-1).sqrt()
    # pdist = pdist<2.0
    # y_p1 = (pdist.sum(1)>0).to(torch.float).reshape(-1,1)
    # y_p2 = (pdist.sum(0)>0).to(torch.float).reshape(-1,1)
    y_p1 = p1["y"] # <- tries to access a y value that will not be there at inference time
    y_p2 = p2["y"]
...

It looks like load_protein_npy() tries to set the 'y' attribute on the Data object to None when the --single_pdb parameter is set to a file but, irritatingly, the Data constructor does not set attributes whose value is None.

So I just get a KeyError:

test_dataset = [load_protein_pair(args.single_pdb, NPY_DIR, single_pdb=True)]
test_pdb_ids = [args.single_pdb]
KeyError                                  Traceback (most recent call last)
Cell In [5], line 1
----> 1 test_dataset = [load_protein_pair(args.single_pdb, NPY_DIR, single_pdb=True)]
      2 test_pdb_ids = [args.single_pdb]

File ~/git/dMaSIF/data.py:247, in load_protein_pair(pdb_id, data_dir, single_pdb)
    242 p2 = load_protein_npy(p2_id, data_dir, center=False,single_pdb=single_pdb)
    243 # pdist = ((p1['xyz'][:,None,:]-p2['xyz'][None,:,:])**2).sum(-1).sqrt()
    244 # pdist = pdist<2.0
    245 # y_p1 = (pdist.sum(1)>0).to(torch.float).reshape(-1,1)
    246 # y_p2 = (pdist.sum(0)>0).to(torch.float).reshape(-1,1)
--> 247 y_p1 = p1["y"]
    248 y_p2 = p2["y"]
    250 protein_pair_data = PairData(
    251     xyz_p1=p1["xyz"],
    252     xyz_p2=p2["xyz"],
   (...)
    266     atom_types_p2=p2["atom_types"],
    267 )

File ~/mambaforge/envs/PyG-env6/lib/python3.10/site-packages/torch_geometric/data/data.py:444, in Data.__getitem__(self, key)
    443 def __getitem__(self, key: str) -> Any:
--> 444     return self._store[key]
...
File ~/mambaforge/envs/PyG-env6/lib/python3.10/site-packages/torch_geometric/data/storage.py:81, in BaseStorage.__getitem__(self, key)
     80 def __getitem__(self, key: str) -> Any:
---> 81     return self._mapping[key]

KeyError: 'y'
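
A possible workaround, sketched under assumptions rather than tested against the dMaSIF code: read the label through a small helper that falls back to None when the key is absent. Both `DataStandIn` and `get_optional` are hypothetical names introduced for illustration. `DataStandIn` is a simplified stand-in that reproduces only the behavior described above (arguments passed as None are never stored), so the snippet runs without torch_geometric.

```python
from typing import Any


class DataStandIn:
    """Hypothetical stand-in for the Data object (not the real class).

    It mimics only the behavior described in this issue: keyword
    arguments whose value is None are silently dropped, so looking
    them up later raises KeyError.
    """

    def __init__(self, **kwargs: Any) -> None:
        self._mapping = {k: v for k, v in kwargs.items() if v is not None}

    def __getitem__(self, key: str) -> Any:
        return self._mapping[key]


def get_optional(data: Any, key: str, default: Any = None) -> Any:
    """Return data[key], or `default` if the key was never stored."""
    try:
        return data[key]
    except KeyError:
        return default


# At inference time there are no labels, so 'y' is passed as None and dropped.
p1 = DataStandIn(xyz=[[0.0, 0.0, 0.0]], y=None)

y_p1 = get_optional(p1, "y")  # None instead of a KeyError
xyz_p1 = get_optional(p1, "xyz")  # stored values are returned unchanged
```

In load_protein_pair() this would mean replacing p1["y"] and p2["y"] with the helper call. Any later code that reads the labels would presumably need to tolerate None as well, which I have not checked.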

sivakanishka91 commented 1 year ago

Hi,

I have encountered the same issue. @wjs20, were you able to resolve it?

@jeanfeydy or @FreyrS, it would be great if you had a solution to this kind of issue.

Thanks!

wjs20 commented 1 year ago

Hi sivakanishka

I can't remember whether I solved this issue, but I encountered too many other bugs for this to be a practical use of my time, and the authors have not been forthcoming with answers so far.

I would suggest pursuing something else...