ChenRocks / UNITER

Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"
https://arxiv.org/abs/1909.11740

RuntimeError: CUDA error: device-side assert triggered #29

Closed: VictorCallejas closed this issue 4 years ago

VictorCallejas commented 4 years ago

Hi everyone,

I am creating a UNITER model for a classification task, but after a few steps of training it raises this error:

RuntimeError: CUDA error: device-side assert triggered

TRAINING...
  0%  0/15 [00:01<?, ?it/s]
  1%  4/563 [00:11<27:16, 2.93s/it]
0
/usr/local/lib/python3.6/dist-packages/apex/amp/_initialize.py:25: UserWarning: An input tensor was not cuda.
  warnings.warn("An input tensor was not cuda.")
1
2
3
4

RuntimeError                              Traceback (most recent call last)
<ipython-input-52-743c7191f227> in <module>()
     19       b_labels = batch['targets']
     20 
---> 21       b_logits = model(batch)
     22 
     23       logits.extend(b_logits)

11 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

/usr/local/lib/python3.6/dist-packages/apex/amp/_initialize.py in new_fwd(*args, **kwargs)
    195                 def new_fwd(*args, **kwargs):
    196                     output = old_fwd(*applier(args, input_caster),
--> 197                                      **applier(kwargs, input_caster))
    198                     return applier(output, output_caster)
    199                 return new_fwd

<ipython-input-33-52c750f75352> in forward(self, batch, compute_loss)
     34                                       img_feat, img_pos_feat,
     35                                       attn_masks, gather_index,
---> 36                                       output_all_encoded_layers=False)
     37         pooled_output = self.uniter.pooler(sequence_output)
     38         output = self.hateful_memes_output(pooled_output)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

<ipython-input-32-5e573da9f309> in forward(self, input_ids, position_ids, img_feat, img_pos_feat, attention_mask, gather_index, img_masks, output_all_encoded_layers, txt_type_ids, img_type_ids)
    354         encoded_layers = self.encoder(
    355             embedding_output, extended_attention_mask,
--> 356             output_all_encoded_layers=output_all_encoded_layers)
    357         if not output_all_encoded_layers:
    358             encoded_layers = encoded_layers[-1]

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

<ipython-input-32-5e573da9f309> in forward(self, input_, attention_mask, output_all_encoded_layers)
    277         hidden_states = input_
    278         for layer_module in self.layer:
--> 279             hidden_states = layer_module(hidden_states, attention_mask)
    280             if output_all_encoded_layers:
    281                 all_encoder_layers.append(hidden_states)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

<ipython-input-31-529775d24505> in forward(self, hidden_states, attention_mask)
    147     def forward(self, hidden_states, attention_mask):
    148         attention_output = self.attention(hidden_states, attention_mask)
--> 149         intermediate_output = self.intermediate(attention_output)
    150         layer_output = self.output(intermediate_output, attention_output)
    151         return layer_output

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

<ipython-input-31-529775d24505> in forward(self, hidden_states)
    120     def forward(self, hidden_states):
    121         hidden_states = self.dense(hidden_states)
--> 122         hidden_states = self.intermediate_act_fn(hidden_states)
    123         return hidden_states
    124 

<ipython-input-31-529775d24505> in gelu(x)
     16         Also see https://arxiv.org/abs/1606.08415
     17     """
---> 18     return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))
     19 
     20 

RuntimeError: CUDA error: device-side assert triggered

Have you encountered this error before?

If I make a forward pass before training, the outputs are correct:

model.train()
outputs = model(d)
torch.sigmoid(outputs).cpu().detach().numpy().tolist()
/usr/local/lib/python3.6/dist-packages/apex/amp/_initialize.py:25: UserWarning: An input tensor was not cuda.
  warnings.warn("An input tensor was not cuda.")
[[0.4417332112789154],
 [0.382718026638031],
 [0.46414244174957275],
 [0.5104507803916931],
 [0.4497249126434326],
 [0.5214864015579224],
 [0.5086051225662231],
 [0.4487886130809784],
 [0.5447408556938171],
 [0.48516517877578735],
 [0.45522886514663696],
 [0.5446500778198242],
 [0.5219737887382507],
 [0.4610774517059326],
 [0.49035000801086426],
 [0.5698526501655579]]

The model

class UniterCls(UniterPreTrainedModel):

    def __init__(self, config, img_dim):
        super().__init__(config)
        self.uniter = UniterModel(config, img_dim)
        self.output = nn.Sequential(
            nn.Linear(config.hidden_size, config.hidden_size*2),
            GELU(),
            LayerNorm(config.hidden_size*2, eps=1e-12),
            nn.Linear(config.hidden_size*2, 1)
        )
        self.apply(self.init_weights)

    def forward(self, batch):
        batch = defaultdict(lambda: None, batch)
        input_ids = batch['input_ids'].to(device)
        position_ids = batch['position_ids'].to(device)
        img_feat = batch['img_feat'].to(device)
        img_pos_feat = batch['img_pos_feat'].to(device)
        attn_masks = batch['attn_masks'].to(device)
        gather_index = batch['gather_index'].to(device)
        sequence_output = self.uniter(input_ids, position_ids,
                                      img_feat, img_pos_feat,
                                      attn_masks, gather_index,
                                      output_all_encoded_layers=False)
        pooled_output = self.uniter.pooler(sequence_output)
        output = self.output(pooled_output)

        return output
ChenRocks commented 4 years ago

Thanks for your question. Please try to provide a minimal reproducing example so that we can identify the issue better. It is unclear whether this error is caused by our code. From my personal experience, this type of error usually appears when you pass an invalid index into a tensor lookup, for example an embedding.
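For illustration (this is not UNITER code, and the sizes are made up): an out-of-range index into nn.Embedding raises a readable IndexError on CPU, but on GPU the assert fires inside the embedding kernel and, because kernel launches are asynchronous, the Python-level error only surfaces at a later op, which is why the traceback above ends in gelu.

import torch
import torch.nn as nn

# Hypothetical sizes, not taken from UNITER.
emb = nn.Embedding(num_embeddings=28996, embedding_dim=768)
ids = torch.tensor([[101, 30000]])   # 30000 >= 28996 -> invalid index

try:
    emb(ids)                         # CPU: "IndexError: index out of range in self"
except IndexError as e:
    print(e)

# On GPU the same lookup triggers the device-side assert, but the
# RuntimeError is typically raised by whichever kernel runs next:
# emb.cuda()(ids.cuda())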

VictorCallejas commented 4 years ago

Okay, so some ids from the text tokenizer are greater than the model's embedding size.

[screenshots: tokenizer ids exceeding the model's embedding size]

But I am using the same tokenizer and function:

from pytorch_pretrained_bert import BertTokenizer

def bert_tokenize(tokenizer, text):
    ids = []
    for word in text.strip().split():
        ws = tokenizer.tokenize(word)
        if not ws:
            # some special char
            continue
        ids.extend(tokenizer.convert_tokens_to_ids(ws))
    return ids

The solution is to modify the vocab size in the config file.
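Before touching the config, a quick check along these lines (a sketch; texts and tokenizer stand in for your own data and tokenizer, and 28996 is the bert-base-cased vocab size that the UNITER embeddings match, per the reply below) shows which ids overflow the embedding table:

# Sketch: flag token ids that would index past the embedding table.
vocab_size = 28996  # bert-base-cased vocab / UNITER embedding rows (assumed)

for text in texts:
    ids = bert_tokenize(tokenizer, text)
    bad = [i for i in ids if i >= vocab_size]
    if bad:
        print(f'out-of-range ids {bad} in: {text!r}')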

For the record, if you get a RuntimeError: CUDA error: device-side assert triggered, it's good to execute the code with CUDA_LAUNCH_BLOCKING=1 python ...., which makes kernel launches synchronous so the traceback points at the op that actually failed.
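In a notebook, where you can't prefix the launch command, the same effect can be had by setting the variable before CUDA is initialized (a sketch; it must run before the first CUDA call):

import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'  # must be set before the CUDA context is created

import torch  # only import torch (and touch the GPU) after setting it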

By the way, thanks for making these incredible models public, @ChenRocks!

ChenRocks commented 4 years ago

@VictorCallejas which tokenizer did you use? Our vocab size matches that of bert-base-cased and bert-large-cased (the two should be identical). If you do not use one of them, the token ids won't correctly match the embeddings in the UNITER pretrained models.
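For reference, a quick way to see the mismatch (a sketch; the counts come from the standard BERT vocab files, and even in-range uncased ids map to different tokens than the cased vocab UNITER was trained with):

from pytorch_pretrained_bert import BertTokenizer

cased = BertTokenizer.from_pretrained('bert-base-cased')
uncased = BertTokenizer.from_pretrained('bert-base-uncased')

print(len(cased.vocab))    # 28996 -> matches UNITER's embedding table
print(len(uncased.vocab))  # 30522 -> ids past 28995 overflow the table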

VictorCallejas commented 4 years ago

My bad, I was using 'bert-base-uncased':

!pip install pytorch_pretrained_bert

from pytorch_pretrained_bert import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

def bert_tokenize(tokenizer, text):
    ids = []
    for word in text.strip().split():
        ws = tokenizer.tokenize(word)
        if not ws:
            # some special char
            continue
        ids.extend(tokenizer.convert_tokens_to_ids(ws))
    return ids

Thank you very much @ChenRocks