Status: Closed. Ming-Qin-tech closed this issue 1 month ago.
This looks like the same problem as https://github.com/fhalab/MLDE/issues/4. It is not a problem with MLDE but with tensorflow/cuda. Take a look through my suggestions on that issue and the linked tensorflow issue and see if anything helps.

The file `GB1_T2Q_transformer_TempOutputs.pkl` is a temporary file saved by MLDE: it holds the output of the subprocess used to run TAPE. If that subprocess fails, the file is never written, so the main process fails as well when it tries to read it. Here, the TAPE subprocess is failing due to the tensorflow/cuda issue.
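To make the failure mode concrete, here is a minimal, self-contained sketch of the parent/subprocess pattern described above. All names (`run_child_and_load`, the inline child scripts) are hypothetical illustrations, not MLDE's actual code; the point is only that when the child crashes before writing its temporary pickle, the parent surfaces a `FileNotFoundError` instead of the child's real error:

```python
import os
import pickle
import subprocess
import sys
import tempfile


def run_child_and_load(temp_filename, child_code):
    """Run a child process expected to write `temp_filename`, then unpickle it.

    Loosely mirrors a parent process consuming a subprocess's TempOutputs.pkl.
    The return code is deliberately not checked, so a crashed child shows up
    here as FileNotFoundError rather than as its own traceback.
    """
    subprocess.run([sys.executable, "-c", child_code])
    with open(temp_filename, "rb") as f:
        return pickle.load(f)


tmp = os.path.join(tempfile.mkdtemp(), "TempOutputs.pkl")

# Healthy child: writes the pickle, and the parent reads it back.
ok_child = f"import pickle; pickle.dump([1, 2, 3], open({tmp!r}, 'wb'))"
print(run_child_and_load(tmp, ok_child))  # [1, 2, 3]

# Crashing child (analogous to the failing TAPE run): no file is ever written,
# so the parent's open() raises FileNotFoundError.
try:
    run_child_and_load(os.path.join(tempfile.mkdtemp(), "TempOutputs.pkl"),
                       "import sys; sys.exit(1)")
except FileNotFoundError as err:
    print("parent fails with:", err)
```

This is why the `FileNotFoundError` in the report below is a symptom, not the root cause; the cuBLAS errors earlier in the log are what actually killed the child.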
When I run:

```
conda activate mlde
python generate_encoding.py transformer GB1_T2Q --fasta ./code/validation/basic_test_data/2GI9.fasta --positions V39 D40 --batches 1
```

I get the following error:

```
FileNotFoundError: [Errno 2] No such file or directory: '/data/qm/MLDE/20211129-131456/Encodings/GB1_T2Q_transformer_TempOutputs.pkl'
```
Full log:

```
/data/qm/MLDE/code/encode/encoding_generator.py:32: UserWarning: Could not load TRANSFORMER_TO_CLASS. This is expected if you are running in the mlde environment or a custom environment without PyTorch installed. Otherwise, you might have a problem. It will not be possible to build encodings from ESM and ProtTrans with this import not working.
  warnings.warn("Could not load TRANSFORMER_TO_CLASS. This is expected if "
Batch#:   0%|          | 0/1 [00:00<?, ?it/s]
Transformer with Parameters:
n_layers: 12
n_heads: 8
d_model: 512
d_filter: 2048
dropout: 0.1
WARNING:tensorflow:From /data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2021-11-29 13:17:26.387501: E tensorflow/stream_executor/cuda/cuda_blas.cc:698] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
2021-11-29 13:17:26.387706: E tensorflow/stream_executor/cuda/cuda_blas.cc:698] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
2021-11-29 13:17:26.387830: E tensorflow/stream_executor/cuda/cuda_blas.cc:698] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(56, 512), b.shape=(512, 512), m=56, n=512, k=512
	 [[{{node transformer/transformer_encoder/encoder_stack/transformer_encoder_block/transformer_self_attention/self_attention/multi_head_attention/attention_qkv_projection/weight_norm_dense_26/Tensordot/MatMul}}]]
	 [[{{node transformer/transformer_encoder/encoder_stack/transformer_encoder_block_11/transformer_feed_forward_11/add}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/qm/anaconda3/envs/mlde/bin/tape-embed", line 33, in <module>
    sys.exit(load_entry_point('tape', 'console_scripts', 'tape-embed')())
  File "/data/qm/MLDE/code/tape-neurips2019/tape/run_embed.py", line 91, in main
    embeddings = run_embed(args.datafile, args.model, args.load_from, args.task)
  File "/data/qm/MLDE/code/tape-neurips2019/tape/run_embed.py", line 47, in run_embed
    protein_length: [int_sequence.shape[1]]})
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(56, 512), b.shape=(512, 512), m=56, n=512, k=512
	 [[node transformer/transformer_encoder/encoder_stack/transformer_encoder_block/transformer_self_attention/self_attention/multi_head_attention/attention_qkv_projection/weight_norm_dense_26/Tensordot/MatMul (defined at /data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/rinokeras/core/v1x/common/layers/normalization.py:76) ]]
	 [[node transformer/transformer_encoder/encoder_stack/transformer_encoder_block_11/transformer_feed_forward_11/add (defined at /data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/rinokeras/core/v1x/models/transformer/transformer_ff.py:67) ]]

Caused by op 'transformer/transformer_encoder/encoder_stack/transformer_encoder_block/transformer_self_attention/self_attention/multi_head_attention/attention_qkv_projection/weight_norm_dense_26/Tensordot/MatMul', defined at:
  File "/data/qm/anaconda3/envs/mlde/bin/tape-embed", line 33, in <module>
    sys.exit(load_entry_point('tape', 'console_scripts', 'tape-embed')())
  File "/data/qm/MLDE/code/tape-neurips2019/tape/run_embed.py", line 91, in main
    embeddings = run_embed(args.datafile, args.model, args.load_from, args.task)
  File "/data/qm/MLDE/code/tape-neurips2019/tape/run_embed.py", line 37, in run_embed
    output = embedding_model({'primary': primary, 'protein_length': protein_length})
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/data/qm/MLDE/code/tape-neurips2019/tape/models/Transformer.py", line 83, in call
    encoder_output = self.encoder(sequence, mask=attention_mask)
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/rinokeras/core/v1x/models/transformer/transformer_encoder.py", line 243, in call
    inputs, mask=(encoder_mask, conv_mask))
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/rinokeras/core/v1x/common/layers/stack.py", line 42, in call
    output = layer(output, **layer_args)
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/rinokeras/core/v1x/models/transformer/transformer_encoder.py", line 87, in call
    inputs, mask=self_attention_mask, return_attention_weights=True)
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/rinokeras/core/v1x/models/transformer/transformer_attention.py", line 47, in call
    attn_inputs, mask=mask, return_attention_weights=True)
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/rinokeras/core/v1x/common/attention.py", line 510, in call
    return self.multi_attention((inputs, inputs, inputs), mask=mask, return_attention_weights=return_attention_weights)
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/rinokeras/core/v1x/common/attention.py", line 454, in call
    qkv_projection = self.compute_qkv((qa, ma, va))
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/rinokeras/core/v1x/common/attention.py", line 116, in call
    values = self.value_norm(self.value_layer(value_antecedent))
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/rinokeras/core/v1x/common/layers/normalization.py", line 76, in call
    outputs = standard_ops.tensordot(inputs, self.kernel, [[rank - 1], [0]])
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py", line 3583, in tensordot
    ab_matmul = matmul(a_reshape, b_reshape)
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py", line 2455, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 5333, in mat_mul
    name=name)
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(56, 512), b.shape=(512, 512), m=56, n=512, k=512
	 [[node transformer/transformer_encoder/encoder_stack/transformer_encoder_block/transformer_self_attention/self_attention/multi_head_attention/attention_qkv_projection/weight_norm_dense_26/Tensordot/MatMul (defined at /data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/rinokeras/core/v1x/common/layers/normalization.py:76) ]]
	 [[node transformer/transformer_encoder/encoder_stack/transformer_encoder_block_11/transformer_feed_forward_11/add (defined at /data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/rinokeras/core/v1x/models/transformer/transformer_ff.py:67) ]]

Traceback (most recent call last):
  File "generate_encoding.py", line 65, in <module>
    main()
  File "generate_encoding.py", line 61, in main
    batch_size = args.batch_size)
  File "/data/qm/MLDE/code/encode/encoding_generator.py", line 332, in generate_encodings
    unnormalized_embeddings = self._generate_tape(n_batches)
  File "/data/qm/MLDE/code/encode/encoding_generator.py", line 284, in _generate_tape
    with open(temp_filename, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/data/qm/MLDE/20211129-131456/Encodings/GB1_T2Q_transformer_TempOutputs.pkl'
```