flatironinstitute / DeepFRI

Deep functional residue identification
BSD 3-Clause "New" or "Revised" License

Analysis of PDB files #15

Open bcantarel opened 2 years ago

bcantarel commented 2 years ago

I was testing DeepFRI on PDB files, so I downloaded the PDB file for 3LZB from the PDB API. I get the following error:

    Traceback (most recent call last):
      File "predict.py", line 41, in <module>
        predictor.predict(args.pdb_fn)
      File "/usr/local/DeepFRI/deepfrier/Predictor.py", line 107, in predict
        A, S, seqres = self._load_cmap(test_prot, cmap_thresh=cmap_thresh)
      File "/usr/local/DeepFRI/deepfrier/Predictor.py", line 74, in _load_cmap
        D, seq = load_predicted_PDB(filename)
      File "/usr/local/DeepFRI/deepfrier/utils.py", line 25, in load_predicted_PDB
        two = residues[y]["CA"].get_coord()
      File "/usr/local/lib/python3.8/dist-packages/Bio/PDB/Entity.py", line 45, in __getitem__
        return self.child_dict[id]
    KeyError: 'CA'

I wondered whether this was related to the file containing many chains, so I split it into chains using pdb-tools (pdb_splitchain) and got a similar error:

    /usr/local/lib/python3.8/dist-packages/Bio/SeqIO/PdbIO.py:303: BiopythonParserWarning: 'HEADER' line not found; can't determine PDB ID.
      warnings.warn(
    Traceback (most recent call last):
      File "predict.py", line 41, in <module>
        predictor.predict(args.pdb_fn)
      File "/usr/local/DeepFRI/deepfrier/Predictor.py", line 107, in predict
        A, S, seqres = self._load_cmap(test_prot, cmap_thresh=cmap_thresh)
      File "/usr/local/DeepFRI/deepfrier/Predictor.py", line 74, in _load_cmap
        D, seq = load_predicted_PDB(filename)
      File "/usr/local/DeepFRI/deepfrier/utils.py", line 25, in load_predicted_PDB
        two = residues[y]["CA"].get_coord()
      File "/usr/local/lib/python3.8/dist-packages/Bio/PDB/Entity.py", line 45, in __getitem__
        return self.child_dict[id]
    KeyError: 'CA'

I'm not sure whether a different input file format is preferred.

bcantarel commented 2 years ago

I "cleaned" the PDB file by removing every line that does not start with ATOM, TER or END, matching the files in the example pdb folder, but I now get a different error:

    ### Computing predictions on a single protein...
    /usr/local/lib/python3.8/dist-packages/Bio/SeqIO/PdbIO.py:290: BiopythonParserWarning: 'HEADER' line not found; can't determine PDB ID.
      warnings.warn(
    Traceback (most recent call last):
      File "predict.py", line 41, in <module>
        predictor.predict(args.pdb_fn)
      File "/usr/local/DeepFRI/deepfrier/Predictor.py", line 109, in predict
        y = self.model([A, S], training=False).numpy()[:, :, 0].reshape(-1)
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 985, in __call__
        outputs = call_fn(inputs, *args, **kwargs)
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/functional.py", line 385, in call
        return self._run_internal_graph(
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/functional.py", line 508, in _run_internal_graph
        outputs = node.layer(*args, **kwargs)
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 985, in __call__
        outputs = call_fn(inputs, *args, **kwargs)
      File "/usr/local/DeepFRI/deepfrier/layers.py", line 302, in call
        output = tf.keras.backend.batch_dot(self._normalize(inputs[1]), inputs[0])
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
        return target(*args, **kwargs)
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/backend.py", line 1938, in batch_dot
        raise ValueError('Cannot do batch_dot on inputs with shapes ' +
    ValueError: Cannot do batch_dot on inputs with shapes (1, 265, 265) and (1, 281, 1024) with axes=[2, 1]. x.shape[2] != y.shape[1] (265 != 281).
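The shapes in the last error (265 versus 281) suggest that the contact map and the sequence features were built from different sets of residues. That reading is an assumption, not something the traceback confirms. A quick way to inspect a cleaned file without any extra library is to group its ATOM records by residue and list the residues that have no CA atom. The sketch below relies only on the fixed-column PDB layout; `summarize_residues` and `atom_line` are hypothetical helper names, not part of DeepFRI or Biopython.

```python
from collections import OrderedDict


def atom_line(serial, name, resname, chain, resseq, x, y, z):
    """Hypothetical helper: build one fixed-column PDB ATOM record for the demo."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, name, resname, chain, resseq, x, y, z)


def summarize_residues(pdb_lines):
    """Return (number of residues, list of residue keys that have no CA atom).

    A residue key is (chain ID, residue number, insertion code), read from
    columns 22, 23-26 and 27 of each ATOM record.
    """
    residues = OrderedDict()  # residue key -> set of atom names seen
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skip HETATM, TER, END and everything else
        key = (line[21], int(line[22:26]), line[26])
        residues.setdefault(key, set()).add(line[12:16].strip())
    missing_ca = [key for key, atoms in residues.items() if "CA" not in atoms]
    return len(residues), missing_ca


# Tiny made-up example: residue 2 of chain A has a backbone N but no CA.
sample = [
    atom_line(1, "N", "MET", "A", 1, 0.0, 0.0, 0.0),
    atom_line(2, "CA", "MET", "A", 1, 1.5, 0.0, 0.0),
    atom_line(3, "N", "GLY", "A", 2, 3.0, 0.0, 0.0),
    atom_line(4, "CA", "ALA", "A", 3, 6.0, 0.0, 0.0),
]
total, missing = summarize_residues(sample)
print(total, missing)
```

For a real file, pass the lines read from it, for example `summarize_residues(open(path).read().splitlines())` with `path` pointing at the cleaned PDB file. If the list of residues without CA is empty, the mismatch has some other cause, such as gaps in the residue numbering.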

PawelSzczerbiak commented 1 year ago

Sorry for the late answer. This is a common problem in DeepFRI. As a workaround, you can remove all residues that have no CA atom. However, the best way to run DeepFRI is to use precomputed contact maps, generated e.g. from .cif files.
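A minimal sketch of that workaround, using only the standard library and the fixed-column PDB layout: it drops the ATOM records of every residue without a CA atom, then builds a binary CA-CA contact map from what remains. The function names are hypothetical and are not DeepFRI's own code, and the 10 Å cutoff is an assumption chosen for illustration. Alternate locations (altLoc) are not handled, so a residue with two CA records would be counted twice.

```python
import math


def atom_line(serial, name, resname, chain, resseq, x, y, z):
    """Hypothetical helper: build one fixed-column PDB ATOM record for the demo."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, name, resname, chain, resseq, x, y, z)


def drop_residues_without_ca(pdb_lines):
    """Return the lines with ATOM records removed for residues lacking a CA atom."""
    # First pass: collect (chain, residue number + insertion code) of residues with CA.
    has_ca = set()
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            has_ca.add((line[21], line[22:27]))
    # Second pass: keep non-ATOM lines and ATOM lines of residues that have a CA.
    kept = []
    for line in pdb_lines:
        if line.startswith("ATOM") and (line[21], line[22:27]) not in has_ca:
            continue
        kept.append(line)
    return kept


def ca_contact_map(pdb_lines, thresh=10.0):
    """Binary contact map: 1 where two CA atoms are closer than `thresh` angstroms."""
    coords = [
        (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        for line in pdb_lines
        if line.startswith("ATOM") and line[12:16].strip() == "CA"
    ]
    n = len(coords)
    return [
        [1 if math.dist(coords[i], coords[j]) < thresh else 0 for j in range(n)]
        for i in range(n)
    ]


# Made-up example: residue 2 has no CA and is removed before the map is built.
sample = [
    atom_line(1, "N", "MET", "A", 1, 0.0, 1.0, 0.0),
    atom_line(2, "CA", "MET", "A", 1, 0.0, 0.0, 0.0),
    atom_line(3, "N", "GLY", "A", 2, 2.0, 0.0, 0.0),
    atom_line(4, "CA", "ALA", "A", 3, 3.8, 0.0, 0.0),
    atom_line(5, "CA", "SER", "A", 4, 20.0, 0.0, 0.0),
    "TER",
    "END",
]
cleaned = drop_residues_without_ca(sample)
cmap = ca_contact_map(cleaned)
print(len(cleaned), cmap)
```

The point of filtering before building the map is that the map then has one row per remaining residue, so its size can be compared directly with the length of the sequence used for the same protein.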