allenai / longformer

Longformer: The Long-Document Transformer
https://arxiv.org/abs/2004.05150
Apache License 2.0

Unable to use longformer in contextualized topic models - CUDA error: device-side assert triggered #233

Open siames3 opened 2 years ago

siames3 commented 2 years ago

Hi, I'm trying to use Longformer in contextualized topic models (GitHub page). I replaced "paraphrase-distilroberta-base-v2" with "allenai/longformer-base-4096" and I'm getting the following errors:

WARNING:root:No sentence-transformers model found with name /home/extern/user/.cache/torch/sentence_transformers/allenai_longformer-base-4096. Creating a new one with MEAN pooling.
Some weights of the model checkpoint at /home/extern/user/.cache/torch/sentence_transformers/allenai_longformer-base-4096 were not used when initializing LongformerModel: ['lm_head.decoder.weight', 'lm_head.dense.bias', 'lm_head.dense.weight', 'lm_head.layer_norm.weight', 'lm_head.layer_norm.bias', 'lm_head.bias']
- This IS expected if you are initializing LongformerModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LongformerModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Batches:   0%|          | 0/1194 [00:00<?, ?it/s]
../aten/src/ATen/native/cuda/Indexing.cu:703: indexSelectLargeIndex: block: [259,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
../aten/src/ATen/native/cuda/Indexing.cu:703: indexSelectLargeIndex: block: [259,0,0], thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
../aten/src/ATen/native/cuda/Indexing.cu:703: indexSelectLargeIndex: block: [259,0,0], thread: [66,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
../aten/src/ATen/native/cuda/Indexing.cu:703: indexSelectLargeIndex: block: [259,0,0], thread: [67,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
../aten/src/ATen/native/cuda/Indexing.cu:703: indexSelectLargeIndex: block: [259,0,0], thread: [68,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
../aten/src/ATen/native/cuda/Indexing.cu:703: indexSelectLargeIndex: block: [259,0,0], thread: [69,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
../aten/src/ATen/native/cuda/Indexing.cu:703: indexSelectLargeIndex: block: [259,0,0], thread: [70,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
../aten/src/ATen/native/cuda/Indexing.cu:703: indexSelectLargeIndex: block: [259,0,0], thread: [71,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
../aten/src/ATen/native/cuda/Indexing.cu:703: indexSelectLargeIndex: block: [259,0,0], thread: [72,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
../aten/src/ATen/native/cuda/Indexing.cu:703: indexSelectLargeIndex: block: [259,0,0], thread: [73,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
../aten/src/ATen/native/cuda/Indexing.cu:703: indexSelectLargeIndex: block: [259,0,0], thread: [74,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
../aten/src/ATen/native/cuda/Indexing.cu:703: indexSelectLargeIndex: block: [259,0,0], thread: [75,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
[. . .]
../aten/src/ATen/native/cuda/Indexing.cu:703: indexSelectLargeIndex: block: [389,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
../aten/src/ATen/native/cuda/Indexing.cu:703: indexSelectLargeIndex: block: [389,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed. 
Batches:   0%|          | 0/1194 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "ctm/tm_preparation_4096.py", line 52, in <module>
    training_dataset = tp.fit(text_for_contextual=unpreprocessed_corpus, text_for_bow=preprocessed_documents)
  File "/home/extern/user/.conda/envs/ctm_env/lib/python3.8/site-packages/contextualized_topic_models/utils/data_preparation.py", line 69, in fit
    train_contextualized_embeddings = bert_embeddings_from_list(text_for_contextual, self.contextualized_model)
  File "/home/extern/user/.conda/envs/ctm_env/lib/python3.8/site-packages/contextualized_topic_models/utils/data_preparation.py", line 36, in bert_embeddings_from_list
    return np.array(model.encode(texts, show_progress_bar=True, batch_size=batch_size))
  File "/home/extern/user/.conda/envs/ctm_env/lib/python3.8/site-packages/sentence_transformers/SentenceTransformer.py", line 164, in encode
    out_features = self.forward(features)
  File "/home/extern/user/.conda/envs/ctm_env/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home/extern/user/.conda/envs/ctm_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/extern/user/.conda/envs/ctm_env/lib/python3.8/site-packages/sentence_transformers/models/Transformer.py", line 66, in forward
    output_states = self.auto_model(**trans_features, return_dict=False)
  File "/home/extern/user/.conda/envs/ctm_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/extern/user/.conda/envs/ctm_env/lib/python3.8/site-packages/transformers/models/longformer/modeling_longformer.py", line 1703, in forward
    embedding_output = self.embeddings(
  File "/home/extern/user/.conda/envs/ctm_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/extern/user/.conda/envs/ctm_env/lib/python3.8/site-packages/transformers/models/longformer/modeling_longformer.py", line 485, in forward
    position_embeddings = self.position_embeddings(position_ids)
  File "/home/extern/user/.conda/envs/ctm_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/extern/user/.conda/envs/ctm_env/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 158, in forward
    return F.embedding(
  File "/home/extern/user/.conda/envs/ctm_env/lib/python3.8/site-packages/torch/nn/functional.py", line 2183, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
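The last frames show the failure inside `self.position_embeddings(position_ids)`. The repeated `srcIndex < srcSelectDimSize` assertion is how a CUDA embedding lookup reports an index that is not smaller than the number of rows in the table. The pure-Python sketch below only illustrates this (it is not the library code). It assumes that Longformer builds position ids the RoBERTa way, numbering non-padding tokens from `padding_idx + 1`. It also assumes that `allenai/longformer-base-4096` uses `max_position_embeddings = 4098` and `pad_token_id = 1`; check the checkpoint's `config.json` to confirm. Under those assumptions, 4096 tokens fit in the table and 4097 do not:

```python
from typing import List


def create_position_ids(input_ids: List[int], padding_idx: int) -> List[int]:
    """Sketch of RoBERTa-style position ids for one sequence (assumed to
    match transformers' create_position_ids_from_input_ids).

    Non-padding tokens get padding_idx + 1, padding_idx + 2, ...;
    padding tokens get padding_idx itself.
    """
    position_ids = []
    count = 0
    for token in input_ids:
        if token != padding_idx:
            count += 1
            position_ids.append(padding_idx + count)
        else:
            position_ids.append(padding_idx)
    return position_ids


def check_positions(position_ids: List[int], max_position_embeddings: int) -> None:
    """CPU-side stand-in for the CUDA assert `srcIndex < srcSelectDimSize`."""
    bad = [p for p in position_ids if p >= max_position_embeddings]
    if bad:
        raise IndexError(
            f"position id {max(bad)} is out of range for an embedding table "
            f"with {max_position_embeddings} rows"
        )


# Assumed values for allenai/longformer-base-4096 (verify in its config.json).
MAX_POSITIONS = 4098
PAD_ID = 1

# 4096 tokens: <s> (0), 4094 arbitrary tokens (5), </s> (2); largest id is 4097.
ok = create_position_ids([0] + [5] * 4094 + [2], PAD_ID)
check_positions(ok, MAX_POSITIONS)

# 4097 tokens: largest id is 4098, one past the last row of the table.
too_long = create_position_ids([0] + [5] * 4095 + [2], PAD_ID)
try:
    check_positions(too_long, MAX_POSITIONS)
except IndexError as err:
    # prints: position id 4098 is out of range for an embedding table with 4098 rows
    print(err)
```

If this assumption holds, any input that reaches the model with more than 4096 tokens would trigger the assert, whatever the GPU setup.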

I am in a multi-GPU environment. Am I doing something wrong?
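CUDA reports device-side asserts asynchronously, so the Python frame that raises is not always the operation that failed. A common way to narrow this down is to make kernel launches synchronous with `CUDA_LAUNCH_BLOCKING=1` and to pin the process to a single GPU with `CUDA_VISIBLE_DEVICES`. The sketch below assumes it runs at the very top of the script, before `torch` is imported; device `0` is only an example. Another option is to load the model on the CPU (sentence-transformers accepts `device="cpu"`). There, an out-of-range lookup raises an ordinary `IndexError` instead of a CUDA assert.

```python
import os

# Set these before importing torch (or anything that imports it, such as
# sentence_transformers or contextualized_topic_models); they are read when
# CUDA is initialised, so setting them later has no effect.

# Synchronous kernel launches: the traceback then points at the op that failed.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Expose only one GPU to this process to rule out multi-GPU effects.
# "0" is an example index; use any GPU that exists on the machine.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```

Given that the traceback already ends in the position-embedding lookup, the multi-GPU setup is probably not the cause by itself. Running on one device makes that easy to confirm.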

djaym7 commented 8 months ago

Same error here. Any solution?
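A possible workaround rests on an unverified assumption: that the assert comes from inputs longer than the position table allows. If so, capping the input length should avoid it. In sentence-transformers, the loaded model exposes this limit as `max_seq_length`, so you could set `model.max_seq_length = 4096` before calling `encode`. Whether contextualized topic models lets you pass or set this value depends on the version in use. The helpers below are hypothetical and only illustrate the arithmetic. They assume RoBERTa-style position ids that start at `padding_idx + 1`, and they assume `</s>` has id `2`. The truncation keeps the closing `</s>` token.

```python
from typing import List


def safe_max_seq_length(max_position_embeddings: int, padding_idx: int,
                        tokenizer_max_length: int) -> int:
    """Hypothetical helper: longest sequence whose position ids
    (padding_idx + 1 ... padding_idx + L) stay inside the embedding table."""
    return min(tokenizer_max_length, max_position_embeddings - padding_idx - 1)


def truncate_ids(input_ids: List[int], max_len: int, eos_id: int) -> List[int]:
    """Hypothetical helper: cut a token-id list to max_len, keeping a final
    end-of-sequence token so the sequence still ends like tokenizer output."""
    if len(input_ids) <= max_len:
        return list(input_ids)
    return list(input_ids[: max_len - 1]) + [eos_id]


# Assumed Longformer values: 4098 position rows, pad id 1, tokenizer limit 4096.
limit = safe_max_seq_length(4098, 1, 4096)
ids = truncate_ids([0] + [5] * 5000 + [2], limit, eos_id=2)
```

With those values, the limit comes out at 4096 tokens even if the tokenizer reports a much larger maximum.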