aqlaboratory / rgn2

Colab fails #3

Closed: phiweger closed this 1 year ago

phiweger commented 2 years ago

I'd like to try RGN2 using the colab notebook, but it fails during the "Run RGN2" step:

CalledProcessError: Command 'python rgn/protling.py runs/15106000/configuration -p -e 'weighted_testing' -a -g 0' returned non-zero exit status 1.

Query:

DYDQYCADVAAEELMNALVNSTLLEARATNQFLAVSKGNCSGPTTIRGQFSNMSLSLLDLYLSRGYNVSSIVTMTSQGMYGGTYLVEKPNLSSKESELSQLSMHRVFEVGVIRNPGLGAPVFHMTNYFEQPVSNDFSNCMVALGELKFAALCHREDSITIPYQGSGKGVSFQLVKLGVWKSPTDMQSWVPLSTDDPVIDRLYLSSHRGVIADNQAKWAVPTTRTDDKLRMETCFQQACKGKIQALCENPEWTPLKDNRIPSYGVLSVDLSLTVELKIKIASGFGPLITHGSGMDLYKSNHNNMYWLTIPPMKNLALGVINTLEWIPRFKVSPNLFTVPIKEAGEDCHAPTYLPAEVDGDVKLSSNLVILPGQDLQYVLATYDTSRVEHAVVYYVYSPSRSFSYFYPFRLPIKGVPIELQVECFTWDQKLWCRHFCVLADSESGGHITHSGMVGMGVSCTATREDGTNRR

I did not change any settings, and during the first step I let the notebook reload as intended before continuing.

Thank you for looking into this!

christinaflo commented 2 years ago

Colab removed support for TF 1.x, and its CUDA version has also changed and is now incompatible. I'm working on a fix and will get back to you.

christinaflo commented 2 years ago

Okay, it should be fixed!

phiweger commented 2 years ago

Sorry, but now I get a new error:

no change     /opt/conda/condabin/conda
no change     /opt/conda/bin/conda
no change     /opt/conda/bin/conda-env
no change     /opt/conda/bin/activate
no change     /opt/conda/bin/deactivate
no change     /opt/conda/etc/profile.d/conda.sh
no change     /opt/conda/etc/fish/conf.d/conda.fish
no change     /opt/conda/shell/condabin/Conda.psm1
no change     /opt/conda/shell/condabin/conda-hook.ps1
no change     /opt/conda/lib/python3.9/site-packages/xontrib/conda.xsh
no change     /opt/conda/etc/profile.d/conda.csh
no change     /root/.bashrc
No action taken.
Sequences being removed due to length: 0
Sequences being removed: [] []
Featurizing input
WARNING:tensorflow:From /content/rgn2/aminobert/optimization.py:110: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

WARNING:tensorflow:From /content/rgn2/aminobert/prediction.py:18: The name tf.enable_eager_execution is deprecated. Please use tf.compat.v1.enable_eager_execution instead.

WARNING:tensorflow:From /content/rgn2/aminobert/modeling.py:92: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7fcd7b532200>) includes params argument, but params are not passed to Estimator.
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
WARNING:tensorflow:From /opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From /content/rgn2/aminobert/run_finetuning_and_prediction.py:331: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.

WARNING:tensorflow:From /content/rgn2/aminobert/modeling.py:174: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

WARNING:tensorflow:From /content/rgn2/aminobert/modeling.py:415: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

WARNING:tensorflow:From /content/rgn2/aminobert/modeling.py:497: The name tf.assert_less_equal is deprecated. Please use tf.compat.v1.assert_less_equal instead.

WARNING:tensorflow:From /content/rgn2/aminobert/modeling.py:678: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.Dense instead.
WARNING:tensorflow:From /opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
WARNING:tensorflow:From /content/rgn2/aminobert/modeling.py:281: The name tf.erf is deprecated. Please use tf.math.erf instead.

WARNING:tensorflow:From /content/rgn2/aminobert/run_finetuning_and_prediction.py:280: The name tf.accumulate_n is deprecated. Please use tf.math.accumulate_n instead.

WARNING:tensorflow:From /content/rgn2/aminobert/run_finetuning_and_prediction.py:315: The name tf.squared_difference is deprecated. Please use tf.math.squared_difference instead.

WARNING:tensorflow:From /content/rgn2/aminobert/run_finetuning_and_prediction.py:362: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.

WARNING:tensorflow:From /content/rgn2/aminobert/run_finetuning_and_prediction.py:377: The name tf.train.init_from_checkpoint is deprecated. Please use tf.compat.v1.train.init_from_checkpoint instead.

WARNING:tensorflow:From /opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/ops/array_ops.py:1475: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
2022-10-09 18:10:15.677753: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2022-10-09 18:10:15.683540: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200220000 Hz
2022-10-09 18:10:15.684079: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x563b6ebcee00 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-10-09 18:10:15.684111: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2022-10-09 18:10:15.685755: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2022-10-09 18:10:15.822630: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x563b6ebcf340 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2022-10-09 18:10:15.822693: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): A100-SXM4-40GB, Compute Capability 8.0
2022-10-09 18:10:15.824880: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: A100-SXM4-40GB major: 8 minor: 0 memoryClockRate(GHz): 1.41
pciBusID: 0000:00:04.0
2022-10-09 18:10:15.825224: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2022-10-09 18:10:15.826356: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2022-10-09 18:10:15.827581: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2022-10-09 18:10:15.827975: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2022-10-09 18:10:15.829337: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2022-10-09 18:10:15.830369: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2022-10-09 18:10:15.833345: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2022-10-09 18:10:15.837334: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2022-10-09 18:10:15.837404: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2022-10-09 18:10:15.840020: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-10-09 18:10:15.840046: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2022-10-09 18:10:15.840054: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2022-10-09 18:10:15.843953: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2022-10-09 18:10:15.843997: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 38114 MB memory) -> physical GPU (device: 0, name: A100-SXM4-40GB, pci bus id: 0000:00:04.0, compute capability: 8.0)
2022-10-09 18:14:38.065361: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2022-10-09 18:15:12.239428: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
2022-10-09 18:15:12.239490: E tensorflow/stream_executor/cuda/cuda_blas.cc:2301] Internal: failed BLAS call, see log for details
ERROR:tensorflow:Error recorded from prediction_loop: 2 root error(s) found.
  (0) Internal: Blas xGEMMBatched launch failed : a.shape=[12,1024,64], b.shape=[12,1024,64], m=1024, n=1024, k=64, batch_size=12
     [[node bert/encoder/layer_0/attention/self/MatMul (defined at /opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
     [[Sum/_1265]]
  (1) Internal: Blas xGEMMBatched launch failed : a.shape=[12,1024,64], b.shape=[12,1024,64], m=1024, n=1024, k=64, batch_size=12
     [[node bert/encoder/layer_0/attention/self/MatMul (defined at /opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'bert/encoder/layer_0/attention/self/MatMul':
  File "<stdin>", line 26, in <module>
  File "/content/rgn2/aminobert/prediction.py", line 126, in aminobert_predict_sequence
    aminobert_predict(seqs=seqs, headers=headers, fastas=fastas, checkpoint=checkpoint)
  File "/content/rgn2/aminobert/prediction.py", line 108, in aminobert_predict
    inf_result = run_prediction(seqs, qfunc, checkpoint)
  File "/content/rgn2/aminobert/prediction.py", line 56, in run_prediction
    clip_seq_level_outputs=clip_seq_level_outputs
  File "/content/rgn2/aminobert/run_finetuning_and_prediction.py", line 611, in run_model
    for i,prediction in enumerate(result):
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3072, in predict
    yield_single_examples=yield_single_examples):
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 622, in predict
    features, None, ModeKeys.PREDICT, self.config)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2857, in _call_model_fn
    config)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1149, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3126, in _model_fn
    features, labels, is_export_mode=is_export_mode)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1663, in call_without_tpu
    return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1994, in _call_model_fn
    estimator_spec = self._model_fn(features=features, **kwargs)
  File "/content/rgn2/aminobert/run_finetuning_and_prediction.py", line 358, in model_fn
    wt_log_prob_mat=wt_log_prob_mat)
  File "/content/rgn2/aminobert/run_finetuning_and_prediction.py", line 219, in create_model
    use_one_hot_embeddings=use_one_hot_embeddings)
  File "/content/rgn2/aminobert/modeling.py", line 219, in __init__
    do_return_all_layers=True)
  File "/content/rgn2/aminobert/modeling.py", line 851, in transformer_model
    to_seq_length=seq_length)
  File "/content/rgn2/aminobert/modeling.py", line 708, in attention_layer
    attention_scores = tf.matmul(query_layer, key_layer, transpose_b=True)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/ops/math_ops.py", line 2716, in matmul
    return batch_mat_mul_fn(a, b, adj_x=adjoint_a, adj_y=adjoint_b, name=name)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 1712, in batch_mat_mul_v2
    "BatchMatMulV2", x=x, y=y, adj_x=adj_x, adj_y=adj_y, name=name)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

WARNING:tensorflow:Reraising captured error
Traceback (most recent call last):
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas xGEMMBatched launch failed : a.shape=[12,1024,64], b.shape=[12,1024,64], m=1024, n=1024, k=64, batch_size=12
     [[{{node bert/encoder/layer_0/attention/self/MatMul}}]]
     [[Sum/_1265]]
  (1) Internal: Blas xGEMMBatched launch failed : a.shape=[12,1024,64], b.shape=[12,1024,64], m=1024, n=1024, k=64, batch_size=12
     [[{{node bert/encoder/layer_0/attention/self/MatMul}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 26, in <module>
  File "/content/rgn2/aminobert/prediction.py", line 126, in aminobert_predict_sequence
    aminobert_predict(seqs=seqs, headers=headers, fastas=fastas, checkpoint=checkpoint)
  File "/content/rgn2/aminobert/prediction.py", line 108, in aminobert_predict
    inf_result = run_prediction(seqs, qfunc, checkpoint)
  File "/content/rgn2/aminobert/prediction.py", line 56, in run_prediction
    clip_seq_level_outputs=clip_seq_level_outputs
  File "/content/rgn2/aminobert/run_finetuning_and_prediction.py", line 611, in run_model
    for i,prediction in enumerate(result):
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3078, in predict
    rendezvous.raise_errors()
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 136, in raise_errors
    six.reraise(typ, value, traceback)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/six.py", line 719, in reraise
    raise value
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3072, in predict
    yield_single_examples=yield_single_examples):
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 640, in predict
    preds_evaluated = mon_sess.run(predictions)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 754, in run
    run_metadata=run_metadata)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1259, in run
    run_metadata=run_metadata)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1360, in run
    raise six.reraise(*original_exc_info)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/six.py", line 719, in reraise
    raise value
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1345, in run
    return self._sess.run(*args, **kwargs)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1418, in run
    run_metadata=run_metadata)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1176, in run
    return self._sess.run(*args, **kwargs)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas xGEMMBatched launch failed : a.shape=[12,1024,64], b.shape=[12,1024,64], m=1024, n=1024, k=64, batch_size=12
     [[node bert/encoder/layer_0/attention/self/MatMul (defined at /opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
     [[Sum/_1265]]
  (1) Internal: Blas xGEMMBatched launch failed : a.shape=[12,1024,64], b.shape=[12,1024,64], m=1024, n=1024, k=64, batch_size=12
     [[node bert/encoder/layer_0/attention/self/MatMul (defined at /opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'bert/encoder/layer_0/attention/self/MatMul':
  File "<stdin>", line 26, in <module>
  File "/content/rgn2/aminobert/prediction.py", line 126, in aminobert_predict_sequence
    aminobert_predict(seqs=seqs, headers=headers, fastas=fastas, checkpoint=checkpoint)
  File "/content/rgn2/aminobert/prediction.py", line 108, in aminobert_predict
    inf_result = run_prediction(seqs, qfunc, checkpoint)
  File "/content/rgn2/aminobert/prediction.py", line 56, in run_prediction
    clip_seq_level_outputs=clip_seq_level_outputs
  File "/content/rgn2/aminobert/run_finetuning_and_prediction.py", line 611, in run_model
    for i,prediction in enumerate(result):
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3072, in predict
    yield_single_examples=yield_single_examples):
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 622, in predict
    features, None, ModeKeys.PREDICT, self.config)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2857, in _call_model_fn
    config)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1149, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3126, in _model_fn
    features, labels, is_export_mode=is_export_mode)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1663, in call_without_tpu
    return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1994, in _call_model_fn
    estimator_spec = self._model_fn(features=features, **kwargs)
  File "/content/rgn2/aminobert/run_finetuning_and_prediction.py", line 358, in model_fn
    wt_log_prob_mat=wt_log_prob_mat)
  File "/content/rgn2/aminobert/run_finetuning_and_prediction.py", line 219, in create_model
    use_one_hot_embeddings=use_one_hot_embeddings)
  File "/content/rgn2/aminobert/modeling.py", line 219, in __init__
    do_return_all_layers=True)
  File "/content/rgn2/aminobert/modeling.py", line 851, in transformer_model
    to_seq_length=seq_length)
  File "/content/rgn2/aminobert/modeling.py", line 708, in attention_layer
    attention_scores = tf.matmul(query_layer, key_layer, transpose_b=True)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/ops/math_ops.py", line 2716, in matmul
    return batch_mat_mul_fn(a, b, adj_x=adjoint_a, adj_y=adjoint_b, name=name)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 1712, in batch_mat_mul_v2
    "BatchMatMulV2", x=x, y=y, adj_x=adj_x, adj_y=adj_y, name=name)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/opt/conda/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
<ipython-input-6-be54dd779cfa> in <module>
----> 1 get_ipython().run_cell_magic('bash', '', 'source /opt/conda/etc/profile.d/conda.sh && conda init\nconda activate rgn2\npython\n\nimport os\nimport sys\nimport json\nsys.path.append(os.path.join(os.getcwd(), \'aminobert\'))\n\nimport shutil\nfrom aminobert.prediction import aminobert_predict_sequence\nfrom data_processing.aminobert_postprocessing import aminobert_postprocess\n\nDATA_DIR = \'aminobert_output\'\nDATASET_NAME = \'1\'\nPREPEND_M = True\nAMINOBERT_CHKPT_DIR = \'resources/aminobert_checkpoint/AminoBERT_runs_v2_uniparc_dataset_v2_5-1024_fresh_start_model.ckpt-1100000\'\n\nwith open("run.json", "r") as f:\n    run_inputs = json.load(f)\n\n# Remove old data since AminoBERT combines entire directory to create dataset.\nif os.path.exists(DATA_DIR):\n  shutil.rmtree(DATA_DIR)\nos.makedirs(DATA_DIR)\n\naminobert_predict_sequence(seq=run_inputs[\'sequence\'], header=run_inputs[\'seq_id\'],\n                           prepend_m=PREPEND_M, checkpoint=AMINOBERT_CHKPT_DIR,\n                           data_dir=DATA_DIR)\naminobert_postprocess(data_dir=DATA_DIR, dataset_name=DATASET_NAME, prepend_m=PREPEND_M)\n')

3 frames
<decorator-gen-103> in shebang(self, line, cell)

/usr/local/lib/python3.7/dist-packages/IPython/core/magics/script.py in shebang(self, line, cell)
    243             sys.stderr.flush()
    244         if args.raise_error and p.returncode!=0:
--> 245             raise CalledProcessError(p.returncode, cell, output=out, stderr=err)
    246 
    247     def _run_script(self, p, cell, to_close):

CalledProcessError: Command 'b'source /opt/conda/etc/profile.d/conda.sh && conda init\nconda activate rgn2\npython\n\nimport os\nimport sys\nimport json\nsys.path.append(os.path.join(os.getcwd(), \'aminobert\'))\n\nimport shutil\nfrom aminobert.prediction import aminobert_predict_sequence\nfrom data_processing.aminobert_postprocessing import aminobert_postprocess\n\nDATA_DIR = \'aminobert_output\'\nDATASET_NAME = \'1\'\nPREPEND_M = True\nAMINOBERT_CHKPT_DIR = \'resources/aminobert_checkpoint/AminoBERT_runs_v2_uniparc_dataset_v2_5-1024_fresh_start_model.ckpt-1100000\'\n\nwith open("run.json", "r") as f:\n    run_inputs = json.load(f)\n\n# Remove old data since AminoBERT combines entire directory to create dataset.\nif os.path.exists(DATA_DIR):\n  shutil.rmtree(DATA_DIR)\nos.makedirs(DATA_DIR)\n\naminobert_predict_sequence(seq=run_inputs[\'sequence\'], header=run_inputs[\'seq_id\'],\n                           prepend_m=PREPEND_M, checkpoint=AMINOBERT_CHKPT_DIR,\n                           data_dir=DATA_DIR)\naminobert_postprocess(data_dir=DATA_DIR, dataset_name=DATASET_NAME, prepend_m=PREPEND_M)\n'' returned non-zero exit status 1.

christinaflo commented 2 years ago

Is this for the same sequence you posted? I was able to get a prediction for it, so I think the failure is because your device is an A100, which is not supported by the CUDA/cuDNN versions this setup uses. Let me see if there is a way around this, but for now it should run if you switch to a different GPU. The other GPU types should be okay; for reference, I am running on a T4.
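
For anyone hitting the same `Blas xGEMMBatched launch failed` error, the mismatch described above can be spotted from the TensorFlow log itself. The following is a minimal sketch, not part of the RGN2 notebook: the function names and the `SAMPLE_LOG` constant are hypothetical, and the check rests on the assumption that GPUs with compute capability 8.x or higher (such as the A100) need CUDA 11 or newer, whereas the log above loads `libcudart.so.10.0`.

```python
import re

# Assumption: CUDA 11 is the first major toolkit version that targets
# compute capability 8.x GPUs (e.g. the A100); older toolkits do not.
MIN_CUDA_MAJOR_FOR_CC8 = 11

# Two lines copied from the log in this thread, used as example input.
SAMPLE_LOG = """\
name: A100-SXM4-40GB major: 8 minor: 0 memoryClockRate(GHz): 1.41
Successfully opened dynamic library libcudart.so.10.0
"""


def parse_compute_capability(log_text):
    """Return (major, minor) from a TensorFlow 'Found device' line, or None."""
    match = re.search(r"major:\s*(\d+)\s+minor:\s*(\d+)", log_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_cudart_version(log_text):
    """Return (major, minor) of the loaded libcudart, or None."""
    match = re.search(r"libcudart\.so\.(\d+)\.(\d+)", log_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def gpu_likely_unsupported(log_text):
    """True if the log shows a compute capability >= 8 GPU with CUDA < 11."""
    capability = parse_compute_capability(log_text)
    cudart = parse_cudart_version(log_text)
    if capability is None or cudart is None:
        return False  # not enough information in the log to decide
    return capability[0] >= 8 and cudart[0] < MIN_CUDA_MAJOR_FOR_CC8


if __name__ == "__main__":
    if gpu_likely_unsupported(SAMPLE_LOG):
        print("GPU is probably too new for this CUDA version; try another GPU type.")
```

If the check fires, switching the Colab runtime to a different GPU type (a T4 is the one reported to work in this thread) is the workaround suggested above.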

phiweger commented 1 year ago

Ah ok, yes, switching to a lower-tier GPU works. Awesome, thank you!