fhalab / MLDE

A machine-learning package for navigating combinatorial protein fitness landscapes.

where is "GB1_T2Q_transformer_TempOutputs.pkl"? #6

Closed: Ming-Qin-tech closed this issue 1 month ago

Ming-Qin-tech commented 2 years ago

When I run

```
conda activate mlde
python generate_encoding.py transformer GB1_T2Q --fasta ./code/validation/basic_test_data/2GI9.fasta --positions V39 D40 --batches 1
```

I get an error:

```
FileNotFoundError: [Errno 2] No such file or directory: '/data/qm/MLDE/20211129-131456/Encodings/GB1_T2Q_transformer_TempOutputs.pkl'
```

log : " /data/qm/MLDE/code/encode/encoding_generator.py:32: UserWarning: Could not load TRANSFORMER_TO_CLASS. This is expected if you are running in the mlde environment or a custom environment without PyTorch installed. Otherwise, you might have a problem. It will not be possible to build encodings from ESM and ProtTrans with this import not working. warnings.warn("Could not load TRANSFORMER_TO_CLASS. This is expected if " Batch#: 0%| | 0/1 [00:00<?, ?it/s]Transformer with Parameters: n_layers: 12 n_heads: 8 d_model: 512 d_filter: 2048 dropout: 0.1 WARNING:tensorflow:From /data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. Instructions for updating: Colocations handled automatically by placer. 2021-11-29 13:17:26.387501: E tensorflow/stream_executor/cuda/cuda_blas.cc:698] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED 2021-11-29 13:17:26.387706: E tensorflow/stream_executor/cuda/cuda_blas.cc:698] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED 2021-11-29 13:17:26.387830: E tensorflow/stream_executor/cuda/cuda_blas.cc:698] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED Traceback (most recent call last): File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call return fn(*args) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn options, feed_dict, fetch_list, target_list, run_metadata) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun run_metadata) tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(56, 512), b.shape=(512, 512), m=56, n=512, k=512 [[{{node transformer/transformer_encoder/encoder_stack/transformer_encoder_block/transformer_self_attention/self_attention/multi_head_attention/attention_qkv_projection/weight_norm_dense_26/Tensordot/MatMul}}]] [[{{node transformer/transformer_encoder/encoder_stack/transformer_encoder_block_11/transformer_feed_forward_11/add}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/data/qm/anaconda3/envs/mlde/bin/tape-embed", line 33, in sys.exit(load_entry_point('tape', 'console_scripts', 'tape-embed')()) File "/data/qm/MLDE/code/tape-neurips2019/tape/run_embed.py", line 91, in main embeddings = run_embed(args.datafile, args.model, args.load_from, args.task) File "/data/qm/MLDE/code/tape-neurips2019/tape/run_embed.py", line 47, in run_embed protein_length: [int_sequence.shape[1]]}) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 929, in run run_metadata_ptr) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run feed_dict_tensor, options, run_metadata) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run run_metadata) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call raise type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(56, 512), b.shape=(512, 512), m=56, n=512, k=512 [[node transformer/transformer_encoder/encoder_stack/transformer_encoder_block/transformer_self_attention/self_attention/multi_head_attention/attention_qkv_projection/weight_norm_dense_26/Tensordot/MatMul (defined at /data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/rinokeras/core/v1x/common/layers/normalization.py:76) ]] [[node transformer/transformer_encoder/encoder_stack/transformer_encoder_block_11/transformer_feed_forward_11/add (defined at /data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/rinokeras/core/v1x/models/transformer/transformer_ff.py:67) ]]

Caused by op 'transformer/transformer_encoder/encoder_stack/transformer_encoder_block/transformer_self_attention/self_attention/multi_head_attention/attention_qkv_projection/weight_norm_dense_26/Tensordot/MatMul', defined at: File "/data/qm/anaconda3/envs/mlde/bin/tape-embed", line 33, in sys.exit(load_entry_point('tape', 'console_scripts', 'tape-embed')()) File "/data/qm/MLDE/code/tape-neurips2019/tape/run_embed.py", line 91, in main embeddings = run_embed(args.datafile, args.model, args.load_from, args.task) File "/data/qm/MLDE/code/tape-neurips2019/tape/run_embed.py", line 37, in run_embed output = embedding_model({'primary': primary, 'protein_length': protein_length}) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in call outputs = self.call(inputs, *args, kwargs) File "/data/qm/MLDE/code/tape-neurips2019/tape/models/Transformer.py", line 83, in call encoder_output = self.encoder(sequence, mask=attention_mask) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in call outputs = self.call(inputs, *args, *kwargs) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/rinokeras/core/v1x/models/transformer/transformer_encoder.py", line 243, in call inputs, mask=(encoder_mask, conv_mask)) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in call outputs = self.call(inputs, args, kwargs) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/rinokeras/core/v1x/common/layers/stack.py", line 42, in call output = layer(output, layer_args) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in call outputs = self.call(inputs, *args, *kwargs) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/rinokeras/core/v1x/models/transformer/transformer_encoder.py", line 87, in call inputs, mask=self_attention_mask, return_attention_weights=True) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in call outputs = self.call(inputs, args, kwargs) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/rinokeras/core/v1x/models/transformer/transformer_attention.py", line 47, in call attn_inputs, mask=mask, return_attention_weights=True) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in call outputs = self.call(inputs, *args, kwargs) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/rinokeras/core/v1x/common/attention.py", line 510, in call return self.multi_attention((inputs, inputs, inputs), mask=mask, return_attention_weights=return_attention_weights) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in call outputs = self.call(inputs, *args, *kwargs) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/rinokeras/core/v1x/common/attention.py", line 454, in call qkv_projection = self.compute_qkv((qa, ma, va)) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in call outputs = self.call(inputs, args, kwargs) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/rinokeras/core/v1x/common/attention.py", line 116, in call values = self.value_norm(self.value_layer(value_antecedent)) File 
"/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in call outputs = self.call(inputs, *args, *kwargs) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/rinokeras/core/v1x/common/layers/normalization.py", line 76, in call outputs = standard_ops.tensordot(inputs, self.kernel, [[rank - 1], [0]]) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py", line 3583, in tensordot ab_matmul = matmul(a_reshape, b_reshape) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py", line 2455, in matmul a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 5333, in mat_mul name=name) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper op_def=op_def) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func return func(args, **kwargs) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op op_def=op_def) File "/data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1801, in init self._traceback = tf_stack.extract_stack()

InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(56, 512), b.shape=(512, 512), m=56, n=512, k=512 [[node transformer/transformer_encoder/encoder_stack/transformer_encoder_block/transformer_self_attention/self_attention/multi_head_attention/attention_qkv_projection/weight_norm_dense_26/Tensordot/MatMul (defined at /data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/rinokeras/core/v1x/common/layers/normalization.py:76) ]] [[node transformer/transformer_encoder/encoder_stack/transformer_encoder_block_11/transformer_feed_forward_11/add (defined at /data/qm/anaconda3/envs/mlde/lib/python3.7/site-packages/rinokeras/core/v1x/models/transformer/transformer_ff.py:67) ]]

Traceback (most recent call last): File "generate_encoding.py", line 65, in main() File "generate_encoding.py", line 61, in main batch_size = args.batch_size) File "/data/qm/MLDE/code/encode/encoding_generator.py", line 332, in generate_encodings unnormalized_embeddings = self._generate_tape(n_batches) File "/data/qm/MLDE/code/encode/encoding_generator.py", line 284, in _generate_tape with open(temp_filename, "rb") as f: FileNotFoundError: [Errno 2] No such file or directory: '/data/qm/MLDE/20211129-131456/Encodings/GB1_T2Q_transformer_TempOutputs.pkl' "

brucejwittmann commented 2 years ago

This looks like the same problem as https://github.com/fhalab/MLDE/issues/4. It is not a problem with MLDE itself but with your tensorflow/cuda setup: the `Blas GEMM launch failed` error in your log comes from cuBLAS, before MLDE ever gets results back. Take a look through my suggestions on that issue and on the linked tensorflow issue and see if anything helps.
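One suggestion that comes up repeatedly for `Blas GEMM launch failed` is that TensorFlow 1.x grabs essentially all GPU memory up front by default, and cuBLAS can then fail to initialize if another process already holds the GPU. Below is a minimal sketch of that sanity check and workaround. It is illustrative only, not MLDE's code: MLDE runs TAPE in a subprocess, so a config change like this would have to be applied inside TAPE's own session setup.

```python
# Sketch: enable GPU memory growth in TensorFlow 1.x, a commonly suggested
# workaround for "Blas GEMM launch failed". Also doubles as a quick check
# that a cuBLAS matmul of the failing shape works at all on this GPU.
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand

with tf.Session(config=config) as sess:
    a = tf.random_normal((56, 512))   # same shapes as in the failing MatMul
    b = tf.random_normal((512, 512))
    result = sess.run(tf.matmul(a, b))
    print(result.shape)  # (56, 512) if cuBLAS is working
```

If this snippet fails with the same cuBLAS error, the problem is in the CUDA/driver/TensorFlow combination rather than anything MLDE or TAPE does.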

The file GB1_T2Q_transformer_TempOutputs.pkl is a temporary file written by MLDE: it holds the output of the subprocess used to run TAPE. If that subprocess fails, the file is never written, so the main process fails as well when it tries to read it. Here, the TAPE subprocess is failing because of the tensorflow/cuda issue.
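To make the failure mode concrete, the handoff looks roughly like this (a minimal sketch with illustrative names, not MLDE's exact code):

```python
# Sketch of the temp-file handoff between MLDE and the TAPE subprocess.
# Names and arguments are illustrative, not MLDE's exact code.
import pickle
import subprocess

temp_filename = "GB1_T2Q_transformer_TempOutputs.pkl"

# MLDE shells out to TAPE. If tape-embed crashes (here, on the cuBLAS
# error), it exits without ever writing the pickle.
subprocess.run(["tape-embed", "..."])  # actual tape-embed arguments elided

# The parent process then tries to load the subprocess's output. A missing
# file surfaces as the FileNotFoundError shown above, which masks the real
# failure inside the subprocess.
with open(temp_filename, "rb") as f:
    unnormalized_embeddings = pickle.load(f)
```

So once the cuBLAS error in the TAPE subprocess is resolved, the pickle gets written and the FileNotFoundError goes away on its own.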