ai-starthon / AI_Starthon2019


Getting RuntimeError: CUDA error: an illegal memory access was encountered #229

Open onacloud opened 5 years ago

onacloud commented 5 years ago

Information

CLI

WEB

What is your NSML login ID?

Question

When I submit, I get the error shown below. After the error, the session is automatically removed from the submit list, but it still exists if I open the URL I copied earlier: https://nipa.nsml.navercorp.com/overview/team_130/16_tcls_movie/95

When I run with the --test option, the values are printed normally, as follows: [(0.0, 10), (0.0, 8), (0.0, 10), (0.0, 10), (0.0, 10), (0.0, 8), (0.0, 7), (0.0, 10), (0.0, 7), (0.0, 8)]

Could you please share the error log?

(AI2019) C:\Dropbox\ws.py\AI2019\16_tcls_movie>..\nsml submit team_130/16_tcls_movie/88 9
.......
Building docker image. It might take for a while
..........Inference the test set. The inference should be completed within 3600 seconds.
.Error occurred while inference. You can check error 'nsml submit --test'
RuntimeError: CUDA error: an illegal memory access was encountered

..Error: Fail to get prediction result: team_130/16_tcls_movie/88/9
time="2019/07/31 08:20:19.571" level=fatal msg="Internal server error"

nsml-admin commented 5 years ago

Hello,

The error log is as follows.

Traceback (most recent call last):
  File "main.py", line 87, in infer
    preprocessed_data = tokenizer.sents2arr([twitter.morphs(s) for s in data])
  File "/app/elmo.py", line 245, in sents2arr
    a_arr2 = self.sents2elmo(sents, output_layer)
  File "/app/elmo.py", line 213, in sents2elmo
    output = self.model.forward(w, c, masks)
  File "/app/frontend.py", line 186, in forward
    (mask_package[0].size(0), mask_package[0].size(1)))
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/app/token_embedder.py", line 114, in forward
    convolved = self.convolutions[i](character_embedding)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 187, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: an illegal memory access was encountered
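
A note on reading this traceback: CUDA errors are reported asynchronously, so the frame in `conv.py` is not necessarily where the fault happened. One frequent cause of this error is an index outside the embedding table being passed to a lookup shortly before the failing call; that is only an assumption here, since the log does not confirm it. The sketch below shows two generic checks under that assumption: setting `CUDA_LAUNCH_BLOCKING=1` so the traceback points at the real failing operation, and scanning a batch of ids for out-of-range values. The helper `find_out_of_range`, the sample `char_ids`, and the vocabulary size of 50 are hypothetical and do not come from the submitted code.

```python
import os

# Make CUDA kernel launches synchronous so the traceback points at the
# operation that actually failed. This must be set before CUDA is
# initialised, i.e. before the framework touches the GPU.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def find_out_of_range(batch, vocab_size):
    """Return (row, col, idx) for every index outside [0, vocab_size)."""
    bad = []
    for r, row in enumerate(batch):
        for c, idx in enumerate(row):
            if not 0 <= idx < vocab_size:
                bad.append((r, c, idx))
    return bad


# Hypothetical character-id batch and vocabulary size, for illustration only.
char_ids = [[1, 5, 3], [2, 99, 0]]
print(find_out_of_range(char_ids, vocab_size=50))  # prints [(1, 1, 99)]
```

Running the same inference on CPU is another way to surface such a problem, because an out-of-range lookup there raises a plain `IndexError` at the offending line instead of a delayed CUDA error.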