flatironinstitute / DeepFRI

Deep functional residue identification
BSD 3-Clause "New" or "Revised" License

BiopythonParserWarning: 'HEADER' line not found; can't determine PDB ID. #54

Open johnnytam100 opened 2 months ago

johnnytam100 commented 2 months ago

Hi DeepFRI! Do you have any idea how to troubleshoot the following "BiopythonParserWarning: 'HEADER' line not found" error?

(DeepFRI) johnnytam100@DESKTOP-BDBH5VJ:/mnt/e/test/test_DeepFRI/DeepFRI$ python predict.py --pdb_dir ./allergen_esmfold_domain_pdb -ont mf --saliency
2024-07-08 12:16:36.392951: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2024-07-08 12:16:37.742941: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2024-07-08 12:16:37.904566: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:968] could not open file to read NUMA node: /sys/bus/pci/devices/0000:04:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-07-08 12:16:37.906981: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:04:00.0 name: NVIDIA GeForce RTX 3080 Ti computeCapability: 8.6
coreClock: 1.695GHz coreCount: 80 deviceMemorySize: 12.00GiB deviceMemoryBandwidth: 849.46GiB/s
2024-07-08 12:16:37.907029: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2024-07-08 12:16:37.940296: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2024-07-08 12:16:37.975221: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2024-07-08 12:16:37.978857: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2024-07-08 12:16:38.033696: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2024-07-08 12:16:38.046140: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2024-07-08 12:16:38.047257: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.7/lib64:
2024-07-08 12:16:38.047293: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2024-07-08 12:16:38.047688: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-07-08 12:16:38.056416: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 3700035000 Hz
2024-07-08 12:16:38.058111: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x481ed10 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2024-07-08 12:16:38.058134: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2024-07-08 12:16:38.059240: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2024-07-08 12:16:38.059264: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]
### Computing predictions from directory with PDB files...
/home/johnnytam100/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/Bio/SeqIO/PdbIO.py:292: BiopythonParserWarning: 'HEADER' line not found; can't determine PDB ID.
  BiopythonParserWarning,
Traceback (most recent call last):
  File "predict.py", line 47, in <module>
    predictor.predict_from_PDB_dir(args.pdb_dir)
  File "/mnt/e/test/test_DeepFRI/DeepFRI/deepfrier/Predictor.py", line 144, in predict_from_PDB_dir
    y = self.model([A, S], training=False).numpy()[:, :, 0].reshape(-1)
  File "/home/johnnytam100/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 985, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/home/johnnytam100/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/engine/functional.py", line 386, in call
    inputs, training=training, mask=mask)
  File "/home/johnnytam100/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/engine/functional.py", line 508, in _run_internal_graph
    outputs = node.layer(*args, **kwargs)
  File "/home/johnnytam100/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 985, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/home/johnnytam100/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/engine/functional.py", line 386, in call
    inputs, training=training, mask=mask)
  File "/home/johnnytam100/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/engine/functional.py", line 508, in _run_internal_graph
    outputs = node.layer(*args, **kwargs)
  File "/home/johnnytam100/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent.py", line 659, in __call__
    return super(RNN, self).__call__(inputs, **kwargs)
  File "/home/johnnytam100/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 985, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/home/johnnytam100/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/layers/cudnn_recurrent.py", line 110, in call
    output, states = self._process_batch(inputs, initial_state)
  File "/home/johnnytam100/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/keras/layers/cudnn_recurrent.py", line 507, in _process_batch
    outputs, h, c, _, _ = gen_cudnn_rnn_ops.cudnn_rnnv2(**args)
  File "/home/johnnytam100/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/ops/gen_cudnn_rnn_ops.py", line 1740, in cudnn_rnnv2
    ctx=_ctx)
  File "/home/johnnytam100/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/ops/gen_cudnn_rnn_ops.py", line 1817, in cudnn_rnnv2_eager_fallback
    attrs=_attrs, ctx=ctx, name=name)
  File "/home/johnnytam100/anaconda3/envs/DeepFRI/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.NotFoundError: Could not find device for node: {{node CudnnRNNV2}} = CudnnRNNV2[T=DT_FLOAT, direction="unidirectional", dropout=0, input_mode="linear_input", is_training=true, rnn_mode="lstm", seed=0, seed2=0]
All kernels registered for op CudnnRNNV2:
  device='GPU'; T in [DT_DOUBLE]
  device='GPU'; T in [DT_FLOAT]
  device='GPU'; T in [DT_HALF]
 [Op:CudnnRNNV2]
Rohit-Satyam commented 1 month ago

Hi, this is more of a TensorFlow error than a Biopython one. Try the installation as I recommended here and let me know if it resolves the issue.
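
The log above points the same way: TensorFlow reports `Could not load dynamic library 'libcudnn.so.7'` followed by `Skipping registering GPU devices...`, while the failing op `CudnnRNNV2` lists only GPU kernels. A minimal sketch for checking whether the dynamic loader can open the libraries named in that log, using only the Python standard library. The helper name `can_load` is hypothetical and is not part of DeepFRI or TensorFlow:

```python
import ctypes


def can_load(lib_name: str) -> bool:
    """Return True if the dynamic loader can open lib_name, else False."""
    try:
        ctypes.CDLL(lib_name)
        return True
    except OSError:
        # Raised when the library is missing or cannot be opened.
        return False


if __name__ == "__main__":
    # Library names copied from the TensorFlow log in this issue.
    for lib in ("libcudart.so.10.1", "libcudnn.so.7"):
        status = "found" if can_load(lib) else "NOT found"
        print(f"{lib}: {status}")
```

If `libcudnn.so.7` is reported as not found, either cuDNN 7 is not installed or the directory that contains it is not on `LD_LIBRARY_PATH` (the log shows only `/usr/local/cuda-11.7/lib64`). This is a diagnostic aid under those assumptions, not a confirmed fix.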

Rohit-Satyam commented 1 month ago

To work around Biopython's inability to parse the PDB header, I am using the following trick:

# Run from the directory that contains predict.py.
for p in /data/foldseek/af/*.pdb.gz; do
  # The second '-'-separated field of the file name becomes the output name.
  name=$(basename "$p" | cut -f 2 -d '-')
  python predict.py -pdb "$p" -ont bp -v
  python predict.py -pdb "$p" -ont mf -v
  python predict.py -pdb "$p" -ont cc -v
  python predict.py -pdb "$p" -ont ec -v
  # Replace query_prot with the protein name in each predictions file.
  sed -i "s/query_prot/${name}/g" DeepFRI_BP_predictions.csv
  sed -i "s/query_prot/${name}/g" DeepFRI_CC_predictions.csv
  sed -i "s/query_prot/${name}/g" DeepFRI_MF_predictions.csv
  sed -i "s/query_prot/${name}/g" DeepFRI_EC_predictions.csv
  # Rename the outputs so the next file does not overwrite them.
  mv DeepFRI_BP_predictions.csv "${name}.BP.csv"
  mv DeepFRI_CC_predictions.csv "${name}.CC.csv"
  mv DeepFRI_MF_predictions.csv "${name}.MF.csv"
  mv DeepFRI_EC_predictions.csv "${name}.EC.csv"
done
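
The two text operations in this trick (taking the second `-`-separated field of the file name, then replacing `query_prot` in the predictions CSV) can be sketched in Python as follows. This is an illustrative equivalent, not part of DeepFRI: the function names are hypothetical, and the example file name assumes AlphaFold-style naming, which the thread does not confirm.

```python
from pathlib import Path


def protein_name(pdb_path: str) -> str:
    """Mimic `basename | cut -f 2 -d '-'`: second '-'-separated field.

    Like `cut`, fall back to the whole file name when it contains no '-'.
    """
    fields = Path(pdb_path).name.split("-")
    return fields[1] if len(fields) > 1 else fields[0]


def relabel_text(csv_text: str, name: str, placeholder: str = "query_prot") -> str:
    """Mimic `sed "s/query_prot/NAME/g"`: replace every placeholder occurrence."""
    return csv_text.replace(placeholder, name)


if __name__ == "__main__":
    # Hypothetical AlphaFold-style file name, used only for illustration.
    example = "/data/foldseek/af/AF-P12345-F1-model_v4.pdb.gz"
    print(protein_name(example))
```

Keeping the two steps as pure functions makes it easy to check the derived name before any predictions file is rewritten or renamed.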
johnnytam100 commented 2 weeks ago

Hi @Rohit-Satyam! Thank you for helping out! May I know where I should run this trick?

> ls -1 /data/foldseek/af/*.pdb.gz | while read p; 
> do
> name=$(echo $p | xargs -n 1 basename | cut -f 2 -d '-'); 
> python predict.py -pdb $p -ont bp -v;
> python predict.py -pdb $p -ont mf -v;
> python predict.py -pdb $p -ont cc -v;
> python predict.py -pdb $p -ont ec -v;
> sed -i "s/query_prot/${name}/g" DeepFRI_BP_predictions.csv;
> sed -i "s/query_prot/${name}/g" DeepFRI_CC_predictions.csv;
> sed -i "s/query_prot/${name}/g" DeepFRI_MF_predictions.csv;
> sed -i "s/query_prot/${name}/g" DeepFRI_EC_predictions.csv;
> mv DeepFRI_BP_predictions.csv ${name}.BP.csv;
> mv DeepFRI_CC_predictions.csv ${name}.CC.csv;
> mv DeepFRI_MF_predictions.csv ${name}.MF.csv;
> mv DeepFRI_EC_predictions.csv ${name}.EC.csv;
> done
johnnytam100 commented 1 week ago

> Hi this is more of a tensorflow error rather than Biopython. Try the installation as I recommended here and let me know if it resolves the issue

By the way, this solution didn't work.