GEM-benchmark / GEM-metrics

Automatic metrics for GEM tasks
https://gem-benchmark.com
MIT License

CUDA out of memory when computing `bleurt` on GEM submission #87

Open lewtun opened 2 years ago

lewtun commented 2 years ago

Hello, I'm trying to compute the `bleurt` metric on a sample submission for the GEM benchmark (attached). However, running the following command fails with a `Blas GEMM launch failed` error:

```
gem_metrics sample-submission.json --metric-list bleurt -o metrics.heavy.json
```
Stack trace

```
[W 220316 15:40:50 texts:191] Model parameter count not present in the submission file.
[I 220316 15:40:50 texts:32] Loading predictions for SeqPlan/mlsum_de_validation
[I 220316 15:40:50 texts:32] Loading predictions for SeqPlan/mlsum_de_test
[I 220316 15:40:50 texts:32] Loading predictions for SeqPlan/mlsum_de_challenge_test_covid
[W 220316 15:40:50 data:54] /home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/site-packages/data/references/mlsum_de_validation.json not found -- downloading https://huggingface.co/datasets/GEM/references/resolve/main/mlsum_de_validation.json. This may take a few minutes.
[W 220316 15:40:50 __init__:258] Could not format references for mlsum_de_validation: HTTP Error 404: Not Found
  File "/home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/site-packages/gem_metrics/__init__.py", line 251, in load_references
    dataset_file = ensure_download(
  File "/home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/site-packages/gem_metrics/data.py", line 76, in ensure_download
    urllib.request.urlretrieve(
  File "/home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
[W 220316 15:40:50 data:54] /home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/site-packages/data/references/mlsum_de_validation.json not found -- downloading https://huggingface.co/datasets/GEM/references/resolve/main/mlsum_de_validation.json. This may take a few minutes.
[I 220316 15:40:50 __init__:275] mlsum_de_validation does not have source associated.
[I 220316 15:40:50 texts:32] Loading references for /home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/site-packages/data/references/mlsum_de_test.json
[I 220316 15:40:50 texts:32] Loading sources for /home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/site-packages/data/references/mlsum_de_test.json
[I 220316 15:40:50 __init__:275] mlsum_de_test does not have source associated.
[I 220316 15:40:50 texts:32] Loading references for /home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/site-packages/data/references/mlsum_de_challenge_test_covid.json
[I 220316 15:40:51 texts:32] Loading sources for /home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/site-packages/data/references/mlsum_de_challenge_test_covid.json
[I 220316 15:40:51 __init__:275] mlsum_de_challenge_test_covid does not have source associated.
[I 220316 15:40:51 __init__:385] Found parent ID in mlsum_de_challenge_test_covid but no corresponding parent dataset
[I 220316 15:40:51 __init__:219] Computing metrics for mlsum_de_validation...
[I 220316 15:40:51 __init__:219] Computing metrics for mlsum_de_test...
[I 220316 15:40:51 __init__:219] Computing metrics for mlsum_de_challenge_test_covid...
[I 220316 15:40:51 __init__:152] Computing BLEURT for SeqPlan/mlsum_de_test...
[I 220316 15:40:51 __init__:152] Computing BLEURT for SeqPlan/mlsum_de_challenge_test_covid...
I0316 15:40:58.413195 140619271960384 score.py:161] Reading checkpoint ../bleurt-base-128.
I0316 15:40:58.413323 140619271960384 checkpoint.py:92] Config file found, reading.
I0316 15:40:58.413443 140619271960384 checkpoint.py:96] Will load checkpoint bert_custom
I0316 15:40:58.413485 140619271960384 checkpoint.py:98] Loads full paths and checks that files exists.
I0316 15:40:58.413520 140619271960384 checkpoint.py:102] ... name:bert_custom
I0316 15:40:58.413564 140619271960384 checkpoint.py:102] ... vocab_file:vocab.txt
I0316 15:40:58.413612 140619271960384 checkpoint.py:102] ... bert_config_file:bert_config.json
I0316 15:40:58.413659 140619271960384 checkpoint.py:102] ... do_lower_case:True
I0316 15:40:58.413696 140619271960384 checkpoint.py:102] ... max_seq_length:128
I0316 15:40:58.413734 140619271960384 score.py:168] Creating BLEURT scorer.
I0316 15:40:58.413768 140619271960384 tokenizers.py:40] Creating WordPiece tokenizer.
I0316 15:40:58.478093 140619271960384 tokenizers.py:45] WordPiece tokenizer instantiated.
I0316 15:40:58.478170 140619271960384 score.py:57] Creating Eager Mode predictor.
I0316 15:40:58.478209 140619271960384 score.py:62] Loading model.
2022-03-16 15:40:58.843356: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2022-03-16 15:40:58.882447: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:40:58.882741: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: pciBusID: 0000:08:00.0 name: NVIDIA TITAN RTX computeCapability: 7.5 coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.65GiB deviceMemoryBandwidth: 625.94GiB/s
2022-03-16 15:40:58.882956: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2022-03-16 15:40:58.884625: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2022-03-16 15:40:58.886231: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2022-03-16 15:40:58.886503: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2022-03-16 15:40:58.887874: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2022-03-16 15:40:58.888352: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2022-03-16 15:40:58.890600: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2022-03-16 15:40:58.890693: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:40:58.890939: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:40:58.891113: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2022-03-16 15:40:58.891322: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2022-03-16 15:40:58.896174: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3792765000 Hz
2022-03-16 15:40:58.897041: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fe2d0000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-03-16 15:40:58.897061: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2022-03-16 15:40:59.000449: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:40:59.000935: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x6f9fd30 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2022-03-16 15:40:59.000954: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA TITAN RTX, Compute Capability 7.5
2022-03-16 15:40:59.001150: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:40:59.001417: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: pciBusID: 0000:08:00.0 name: NVIDIA TITAN RTX computeCapability: 7.5 coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.65GiB deviceMemoryBandwidth: 625.94GiB/s
2022-03-16 15:40:59.001450: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2022-03-16 15:40:59.001463: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2022-03-16 15:40:59.001478: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2022-03-16 15:40:59.001490: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2022-03-16 15:40:59.001500: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2022-03-16 15:40:59.001513: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2022-03-16 15:40:59.001524: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2022-03-16 15:40:59.001609: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:40:59.001901: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:40:59.002133: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2022-03-16 15:40:59.002165: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2022-03-16 15:40:59.002925: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-03-16 15:40:59.002938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2022-03-16 15:40:59.002944: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2022-03-16 15:40:59.003061: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:40:59.003356: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:40:59.003612: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22611 MB memory) -> physical GPU (device: 0, name: NVIDIA TITAN RTX, pci bus id: 0000:08:00.0, compute capability: 7.5)
W0316 15:40:59.450889 140619271960384 deprecation.py:506] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py:1817: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating: If using Keras pass *_constraint arguments to layers.
I0316 15:41:00.563959 140619271960384 score.py:174] BLEURT initialized.
I0316 15:41:00.564104 140619271960384 score_files.py:133] Computing BLEURT scores...
I0316 15:41:00.649482 139858446710592 score.py:161] Reading checkpoint ../bleurt-base-128.
I0316 15:41:00.649625 139858446710592 checkpoint.py:92] Config file found, reading.
I0316 15:41:00.649743 139858446710592 checkpoint.py:96] Will load checkpoint bert_custom
I0316 15:41:00.649785 139858446710592 checkpoint.py:98] Loads full paths and checks that files exists.
I0316 15:41:00.649821 139858446710592 checkpoint.py:102] ... name:bert_custom
I0316 15:41:00.649855 139858446710592 checkpoint.py:102] ... vocab_file:vocab.txt
I0316 15:41:00.649900 139858446710592 checkpoint.py:102] ... bert_config_file:bert_config.json
I0316 15:41:00.649946 139858446710592 checkpoint.py:102] ... do_lower_case:True
I0316 15:41:00.649982 139858446710592 checkpoint.py:102] ... max_seq_length:128
I0316 15:41:00.650019 139858446710592 score.py:168] Creating BLEURT scorer.
I0316 15:41:00.650053 139858446710592 tokenizers.py:40] Creating WordPiece tokenizer.
I0316 15:41:00.714641 139858446710592 tokenizers.py:45] WordPiece tokenizer instantiated.
I0316 15:41:00.714712 139858446710592 score.py:57] Creating Eager Mode predictor.
I0316 15:41:00.714751 139858446710592 score.py:62] Loading model.
2022-03-16 15:41:01.075217: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2022-03-16 15:41:01.099614: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:41:01.099885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: pciBusID: 0000:08:00.0 name: NVIDIA TITAN RTX computeCapability: 7.5 coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.65GiB deviceMemoryBandwidth: 625.94GiB/s
2022-03-16 15:41:01.100065: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2022-03-16 15:41:01.101612: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2022-03-16 15:41:01.103159: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2022-03-16 15:41:01.103421: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2022-03-16 15:41:01.104980: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2022-03-16 15:41:01.105824: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2022-03-16 15:41:01.107973: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2022-03-16 15:41:01.108065: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:41:01.108303: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:41:01.108475: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2022-03-16 15:41:01.108682: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2022-03-16 15:41:01.113914: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3792765000 Hz
2022-03-16 15:41:01.114642: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f31a4000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-03-16 15:41:01.114660: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2022-03-16 15:41:01.170013: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:41:01.170263: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x73c46a0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2022-03-16 15:41:01.170284: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA TITAN RTX, Compute Capability 7.5
2022-03-16 15:41:01.170470: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:41:01.170703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: pciBusID: 0000:08:00.0 name: NVIDIA TITAN RTX computeCapability: 7.5 coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.65GiB deviceMemoryBandwidth: 625.94GiB/s
2022-03-16 15:41:01.170738: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2022-03-16 15:41:01.170756: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2022-03-16 15:41:01.170774: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2022-03-16 15:41:01.170790: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2022-03-16 15:41:01.170805: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2022-03-16 15:41:01.170816: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2022-03-16 15:41:01.170828: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2022-03-16 15:41:01.170897: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:41:01.171147: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:41:01.171339: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2022-03-16 15:41:01.171368: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2022-03-16 15:41:01.172099: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-03-16 15:41:01.172110: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2022-03-16 15:41:01.172116: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2022-03-16 15:41:01.172219: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:41:01.172492: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:41:01.172725: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 923 MB memory) -> physical GPU (device: 0, name: NVIDIA TITAN RTX, pci bus id: 0000:08:00.0, compute capability: 7.5)
W0316 15:41:01.458086 139858446710592 deprecation.py:506] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py:1817: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating: If using Keras pass *_constraint arguments to layers.
I0316 15:41:02.578015 139858446710592 score.py:174] BLEURT initialized.
I0316 15:41:02.578154 139858446710592 score_files.py:133] Computing BLEURT scores...
2022-03-16 15:41:10.040301: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2022-03-16 15:41:13.784464: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2022-03-16 15:41:13.999250: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2022-03-16 15:41:14.004738: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2022-03-16 15:41:14.006644: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2022-03-16 15:41:14.011551: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2022-03-16 15:41:14.011578: W tensorflow/stream_executor/stream.cc:2041] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/app/bleurt/bleurt/score_files.py", line 168, in <module>
    tf.compat.v1.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/app/bleurt/bleurt/score_files.py", line 164, in main
    score_files(sentence_pairs_generator, FLAGS.bleurt_checkpoint)
  File "/app/bleurt/bleurt/score_files.py", line 138, in score_files
    _consume_buffer()
  File "/app/bleurt/bleurt/score_files.py", line 128, in _consume_buffer
    batch_size=FLAGS.bleurt_batch_size)
  File "/app/bleurt/bleurt/score.py", line 215, in score
    predict_out = self._predictor.predict(tf_input)
  File "/app/bleurt/bleurt/score.py", line 71, in predict
    input_dict["segment_ids"]))["predictions"].numpy()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1605, in __call__
    return self._call_impl(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1645, in _call_impl
    return self._call_flat(args, self.captured_inputs, cancellation_manager)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1746, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 598, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(8192, 2), b.shape=(2, 768), m=8192, n=768, k=2
 [[node bert/embeddings/MatMul (defined at app/bleurt/bleurt/score.py:63) ]] [Op:__inference_pruned_6660]
Function call stack:
pruned
```

As far as I can tell, this error stems from CUDA running out of memory. I'm running on an NVIDIA TITAN RTX with 23.65 GiB of memory, so this is quite surprising. Notably, the log shows two BLEURT scorers being initialized in parallel: the first TensorFlow process was given a device with 22611 MB of GPU memory, leaving only 923 MB for the second, which then fails with `CUBLAS_STATUS_NOT_INITIALIZED`. One possibility is that the submission file has very long inputs, but these outputs come from one of the baseline models, so other GEM participants would presumably hit the same problem.
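If the culprit is TensorFlow's default behaviour of reserving nearly the whole GPU per process, a possible workaround is to let it allocate memory on demand instead. This is an untested sketch using the standard `TF_FORCE_GPU_ALLOW_GROWTH` TensorFlow flag, and it assumes the BLEURT subprocesses inherit the caller's environment:

```shell
# Untested sketch: TF_FORCE_GPU_ALLOW_GROWTH makes each TensorFlow process
# grow its GPU allocation on demand rather than grabbing almost all of the
# card's memory up front, so two parallel BLEURT scorers can coexist.
export TF_FORCE_GPU_ALLOW_GROWTH=true
gem_metrics sample-submission.json --metric-list bleurt -o metrics.heavy.json
```

Restricting BLEURT to a single worker (if gem_metrics exposes such an option) would be an alternative way to avoid the two processes competing for memory.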

For context, I installed the library following the README instructions for the "heavy" metrics, plus some additional Docker configuration (logging in to Docker and installing the NVIDIA Container Toolkit).

cc @sebastianGehrmann @danieldeutsch

sample-submission.json.zip