facebookresearch / FBTT-Embedding

This is a Tensor Train (TT) based compression library for compressing the sparse embedding tables used in large-scale machine learning models such as recommendation and natural language processing. We showed that this library can reduce the total model size of Facebook’s open-sourced DLRM model by up to 100x while achieving the same model quality. Our implementation is faster than state-of-the-art implementations. Existing state-of-the-art libraries also decompress the whole embedding table on the fly, so they provide no memory reduction during training. Our library decompresses only the requested rows and can therefore reduce the memory footprint per embedding table by up to 10,000x. The library also includes a software cache that stores a portion of the table entries in decompressed form for faster lookup and processing.
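
As a rough illustration of where the compression comes from, the sketch below counts the parameters of a TT-factored embedding table and compares them with the dense table. The vocabulary size, factor shapes, and ranks are made-up illustrative numbers, not values taken from this library.

# Illustrative only: parameter count of a TT-factored embedding table.
# Shapes, ranks, and vocabulary size below are hypothetical, not FBTT-Embedding defaults.

def tt_num_params(p_shapes, q_shapes, ranks):
    # TT core i has shape (r_{i-1}, p_i, q_i, r_i); the boundary ranks are 1.
    full_ranks = [1] + list(ranks) + [1]
    return sum(
        full_ranks[i] * p * q * full_ranks[i + 1]
        for i, (p, q) in enumerate(zip(p_shapes, q_shapes))
    )

vocab, dim = 1_000_000, 64                        # dense table: vocab * dim parameters
p_shapes, q_shapes = [100, 100, 100], [4, 4, 4]   # prod(p) >= vocab, prod(q) == dim
ranks = [8, 8]

dense_params = vocab * dim
tt_params = tt_num_params(p_shapes, q_shapes, ranks)
print(dense_params, tt_params, dense_params / tt_params)  # 64000000 32000 2000.0

Because a lookup only needs the slices of the TT cores selected by the requested index, rows can be decompressed individually rather than materializing the whole table.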
MIT License

cudaErrorIllegalAddress occurs when using TTEmbeddingBag and nn.EmbeddingBag at the same time #16

Closed fumihwh closed 3 years ago

fumihwh commented 3 years ago

As the title says, a cudaErrorIllegalAddress occurs when using TTEmbeddingBag and nn.EmbeddingBag at the same time. I've added self.cache_populate() to the TableBatchedTTEmbeddingBag forward method, just after self.update_cache(indices).

    def forward(
        self, indices: torch.Tensor, offsets: torch.Tensor, warmup: bool = True
    ) -> torch.Tensor:
        (indices, offsets) = indices.long(), offsets.long()

        # update hash table and lfu state
        self.update_cache(indices)
        self.cache_populate()  # line added for this experiment
        # ... rest of the original forward unchanged ...
Case | nn.Emb class | Error
self.cache_populate() in TableBatchedTTEmbeddingBag forward | with nn.EmbeddingBag | cudaErrorIllegalAddress
Call cache_populate after backward is done | with nn.EmbeddingBag | RuntimeError: CUDA error: invalid device ordinal
self.cache_populate() in TableBatchedTTEmbeddingBag forward | without nn.EmbeddingBag | (no error)
Call cache_populate after backward is done | without nn.EmbeddingBag | (no error)
self.cache_populate() in TableBatchedTTEmbeddingBag forward | with nn.Embedding | (no error)
Call cache_populate after backward is done | with nn.Embedding | (no error)

Snippets

import torch
from torch import nn
from tt_embeddings_ops import TTEmbeddingBag, OptimType

vocabulary_size = 1000
embedding_dim = 4
TT_RANK = 8
NUM_TT_CORES = 3
tt_ranks = [TT_RANK] * (NUM_TT_CORES - 1)
batch_size = 100
device = 0
use_cache = True
cache_size = vocabulary_size

use_nn_emb = True

class MyModel(nn.Module):

  def __init__(self):
    super(MyModel, self).__init__()
    self.emb1 = TTEmbeddingBag(
        vocabulary_size,
        embedding_dim,
        tt_ranks,
        None,  # tt_p_shapes,
        None,  # tt_q_shapes,
        OptimType.EXACT_ADAGRAD,
        sparse=True,
        use_cache=use_cache,
        cache_size=cache_size,
        learning_rate=0.01,
    ).to(device)
    self.emb2 = TTEmbeddingBag(
        vocabulary_size,
        embedding_dim,
        tt_ranks,
        None,  # tt_p_shapes,
        None,  # tt_q_shapes,
        OptimType.EXACT_ADAGRAD,
        sparse=True,
        use_cache=use_cache,
        cache_size=cache_size,
        learning_rate=0.01,
    ).to(device)
    # plain dense embedding bag used together with the two TTEmbeddingBags
    self.emb3 = nn.EmbeddingBag(vocabulary_size, embedding_dim,
                                mode="sum").to(device)

    self.l = nn.Linear(embedding_dim * (3 if use_nn_emb else 2), 5).to(device)

  def forward(self, x, offsets):
    # one offset per bag for nn.EmbeddingBag (each bag contains a single index)
    offsets_ori = torch.arange(x.shape[0], dtype=torch.int64, device=device)
    rs = [
        self.emb1.forward(x[:, 0], offsets),
        self.emb2.forward(x[:, 1], offsets)
    ]
    if use_nn_emb:
      rs.append(self.emb3.forward(x[:, 2], offsets_ori))
    return self.l(torch.cat(rs, dim=1))

model = MyModel()
model.train()
for e in range(10):
  grad_output = torch.rand(batch_size, 5, device=device) * 0.1
  x = torch.randint(0,
                    vocabulary_size - 10, (batch_size, 3 if use_nn_emb else 2),
                    device=device)
  # B + 1 offsets for TTEmbeddingBag; each bag contains a single index
  offsets = torch.arange(x.shape[0] + 1, dtype=torch.int64, device=device)
  y = model(x, offsets)
  y.backward(grad_output)

Env

I use the Docker image pytorch/pytorch:1.9.0-cuda11.1-cudnn8-devel.

PyTorch version: 1.9.0
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.10

Python version: 3.7 (64-bit runtime)
Python platform: Linux-4.19.95-17-x86_64-with-debian-buster-sid
Is CUDA available: True
CUDA runtime version: 11.1.105
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB

Nvidia driver version: 460.27.04
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch==1.9.0
[pip3] torchelastic==0.2.0
[pip3] torchtext==0.10.0
[pip3] torchvision==0.10.0
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               11.1.74              h6bb024c_0    nvidia
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] mkl                       2021.2.0           h06a4308_296
[conda] mkl-service               2.3.0            py37h27cfd23_1
[conda] mkl_fft                   1.3.0            py37h42c9631_2
[conda] mkl_random                1.2.1            py37ha9443f7_2
[conda] numpy                     1.19.5                   pypi_0    pypi
[conda] pytorch                   1.9.0           py3.7_cuda11.1_cudnn8.0.5_0    pytorch
[conda] torchelastic              0.2.0                    pypi_0    pypi
[conda] torchtext                 0.10.0                     py37    pytorch
[conda] torchvision               0.10.0               py37_cu111    pytorch
fumihwh commented 3 years ago

~UPDATE: Should call cache_populate after backward.~

fumihwh commented 3 years ago

If I call cache_populate after backward, a RuntimeError: CUDA error: invalid device ordinal occurs.

fumihwh commented 3 years ago

Should use the same sparse param in TTEmbeddingBag and nn.EmbeddingBag.
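
For reference, a minimal sketch of that change, assuming the fix is to give nn.EmbeddingBag the same sparse setting as the TTEmbeddingBag instances in the snippet above (sparse=True):

    # Sketch of the workaround: match the sparse-gradient setting used by TTEmbeddingBag.
    self.emb3 = nn.EmbeddingBag(vocabulary_size, embedding_dim,
                                mode="sum", sparse=True).to(device)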