Alibaba-NLP / ACE

[ACL-IJCNLP 2021] Automated Concatenation of Embeddings for Structured Prediction
Other

Segmentation Fault (core dumped) #56

Open akshayklr057 opened 1 year ago

akshayklr057 commented 1 year ago

Hi team, as you suggested here, I ran the command `CUDA_VISIBLE_DEVICES=0 python train.py --config config/conll_03_english.yaml --test` to evaluate the model on the CoNLL 2003 dataset and check that the code works. However, I get the error below:

[Screenshot: 2023-05-31, 10:12:29 AM]

I tried debugging it as well but couldn't find a way around it. My system configuration is:

- OS: Ubuntu 20.04
- RAM: 32 GB
- GPU: NVIDIA GeForce RTX 3080 Ti

wangxinyu0922 commented 1 year ago

I have not run into this kind of problem before. It seems the problem occurs while loading the embeddings; maybe there is not enough CPU memory.

akshayklr057 commented 1 year ago

After digging further, I found that the issue is that PyTorch cannot access CUDA. Also, the recommended PyTorch version (1.3.1) is not listed among the official releases on the PyTorch website, although it is available on PyPI. This is the snippet where torch fails to move a variable to CUDA:

[Screenshot: 2023-06-01, 9:34:11 AM]
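One plausible cause of the CUDA failure, given the hardware listed above, is an architecture mismatch: the RTX 3080 Ti has compute capability 8.6, CUDA toolkits older than 11.1 cannot build native kernels for it, and the PyTorch 1.3.1 wheels were built against CUDA 10.x or earlier. The Python sketch below checks that rule. The lookup table, `build_targets_gpu_natively`, and the other helper names are hypothetical and only cover the architectures listed; a `False` result is a strong hint rather than proof, since PTX JIT compilation can occasionally cover newer GPUs. `torch` is optional: the probe just reports if it is not installed.

```python
from typing import Optional, Tuple

# Assumed table: minimum CUDA toolkit (major, minor) that can build native
# kernels for each compute capability. The RTX 3080 Ti reports (8, 6).
MIN_CUDA_FOR_ARCH = {
    (7, 0): (9, 0),    # Volta, e.g. V100
    (7, 5): (10, 0),   # Turing, e.g. RTX 20xx
    (8, 0): (11, 0),   # Ampere A100
    (8, 6): (11, 1),   # Ampere GeForce RTX 30xx
}


def parse_cuda_version(text: str) -> Tuple[int, int]:
    """Turn a version string such as '10.1' or '11.3' into (major, minor)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def build_targets_gpu_natively(cuda_build: str, capability: Tuple[int, int]) -> Optional[bool]:
    """True/False if the CUDA build is new enough for the GPU; None if the GPU is not in the table."""
    required = MIN_CUDA_FOR_ARCH.get(capability)
    if required is None:
        return None
    return parse_cuda_version(cuda_build) >= required


def probe_torch() -> None:
    """Print what the installed torch sees; returns early if torch is missing."""
    try:
        import torch
    except ImportError:
        print("torch is not installed")
        return
    print("torch", torch.__version__, "built for CUDA", torch.version.cuda)
    if not torch.cuda.is_available():
        print("torch.cuda.is_available() is False")
        return
    capability = torch.cuda.get_device_capability(0)
    # torch.version.cuda is None for CPU-only builds.
    native = build_targets_gpu_natively(torch.version.cuda, capability) if torch.version.cuda else False
    print("GPU", torch.cuda.get_device_name(0), "capability", capability, "native build:", native)


if __name__ == "__main__":
    probe_torch()
```

With this table, a CUDA 10.1 build returns `False` for capability (8, 6), while a CUDA 11.3 build returns `True`.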

Moreover, transformers' `from_pretrained` cannot load the pre-trained models and crashes with a "Segmentation fault".

[Screenshot: 2023-06-01, 9:35:08 AM]
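A segmentation fault kills the interpreter without a Python traceback, which makes it hard to tell whether the CUDA transfer, `from_pretrained`, or the flair embedding constructor is the real trigger. One way to narrow it down is to run each suspect step in a child process with Python's standard `faulthandler` enabled (`-X faulthandler`), so the crash is reported instead of ending the session. This is a sketch; `run_isolated`, `describe`, and the example snippet in the comment are hypothetical and not part of the ACE code. The signal interpretation assumes a POSIX system such as the Ubuntu host above.

```python
import signal
import subprocess
import sys
from typing import Tuple


def run_isolated(code: str, timeout: float = 600.0) -> Tuple[int, str]:
    """Run `code` in a fresh interpreter and return (returncode, stderr).

    `-X faulthandler` makes the child dump a Python traceback on SIGSEGV,
    pointing at the Python line that triggered the native crash.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def describe(returncode: int) -> str:
    """On POSIX, a negative return code means the child was killed by that signal."""
    if returncode < 0:
        return "killed by " + signal.Signals(-returncode).name
    return "exited with code %d" % returncode


if __name__ == "__main__":
    # Pass snippets on the command line, cheapest first (hypothetical example):
    #   python isolate.py "import torch; torch.zeros(1).cuda()"
    for snippet in sys.argv[1:]:
        rc, err = run_isolated(snippet)
        print(snippet, "->", describe(rc))
        if rc != 0:
            print(err)
```

Testing the CUDA transfer first and the model load second shows whether the loader itself crashes or is only the first step to touch a broken CUDA setup.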

Apart from this, the flair code threw the same error in `embeddings.py`, in the constructor of `TransformerWordEmbeddings`, when it calls the parent transformer class. Below is a screenshot of the code where the issue happened:

[Screenshot: 2023-06-01, 9:37:42 AM]

Could you tell me which CUDA and NVIDIA driver versions you ran it with? I was just trying to set up the repository and run the evaluation to check whether the setup was successful.
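When comparing setups in a thread like this, it helps to report the same fields from every machine. PyTorch ships a fuller tool for this (`python -m torch.utils.collect_env`); the sketch below is a smaller, hypothetical helper that gathers the OS, Python version, NVIDIA driver version, and the CUDA version the installed torch wheel was built for. It assumes `nvidia-smi` is on `PATH` when a driver is installed, and every probe falls back to `None` instead of raising, so it also runs on machines without a GPU or without torch.

```python
import platform
import shutil
import subprocess
from typing import Dict, Optional


def driver_version() -> Optional[str]:
    """Return the NVIDIA driver version reported by nvidia-smi, or None."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=30,
        ).stdout
    except (subprocess.SubprocessError, OSError):
        return None
    lines = out.strip().splitlines()
    # One line per GPU; the driver version is the same for all of them.
    return lines[0].strip() if lines else None


def environment_report() -> Dict[str, Optional[str]]:
    """Collect the version fields worth pasting into a bug report."""
    report: Dict[str, Optional[str]] = {
        "os": platform.platform(),
        "python": platform.python_version(),
        "nvidia_driver": driver_version(),
        "torch": None,
        "torch_cuda_build": None,
    }
    try:
        import torch
        report["torch"] = str(torch.__version__)
        report["torch_cuda_build"] = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        pass
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```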

MintMerlot commented 1 year ago

I had the same problem. My CUDA version is 11.3, so I updated torch to 1.11.0, and that solved the problem.