Naver-AI-Hackathon / cs492I


Submission of the model #14

Closed erjui closed 4 years ago

erjui commented 4 years ago

Hi, sorry to ask such a simple question, but I'm having a hard time even running the baseline code. I ran the baseline code to train the model as follows.

nsml run -d fashion_eval -e main.py

And I got the following terminal log as a result.

image

First, I want to ask why a segmentation fault happens even when I run the unmodified baseline code. (I tried to fix it on my own but couldn't find the cause, and I thought it would be better to ask here because many teams have already succeeded in running the baseline code!)

Despite the segmentation fault, the model itself was saved successfully, so I also tried to submit it with the following command.

nsml submit -t kaist_9/fashion_eval/2 Res18baseMM_best

But I ran into the following error as well.

image

So I want to ask how to solve this problem.

Thanks in advance, and sorry for asking such a simple question.

nsml-admin commented 4 years ago

A segmentation-related error is usually a memory-related error.

When you run nsml run, allocate more memory with the --memory option, or change the code so that it uses less memory (for example, by reducing the batch size).
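One way to check whether memory is really the bottleneck before changing the batch size is to log the peak resident memory of the training process. The following is a minimal sketch using only the Python standard library. `peak_memory_mb` is a hypothetical helper name, not part of the baseline code or of NSML, and the `resource` module is only available on Unix-like systems.

```python
import resource
import sys


def peak_memory_mb():
    """Return the peak resident set size of this process in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


if __name__ == "__main__":
    # Hypothetical stand-in for a training step that allocates memory.
    data = [0] * 1_000_000
    print(f"peak memory: {peak_memory_mb():.1f} MB", flush=True)
```

Note that this reports host (CPU) memory only. GPU memory would need to be checked separately.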

erjui commented 4 years ago

> A segmentation-related error is usually a memory-related error.
>
> When you run nsml run, allocate more memory with the --memory option, or change the code so that it uses less memory (for example, by reducing the batch size).

Umm, I am not sure, but I don't think the problem is running out of memory. To check this, I slightly modified the source code so that it runs only 1 epoch with batch size 128, and I added a print message after the main function ends, as follows.

image

image

But the result was a segmentation fault again, as follows.

image

Since the "the program ends" log appeared in the terminal, I cautiously guess that my part of the code ran successfully.

In the end, the model was saved successfully, as the message shows, even though the segmentation fault happened. So I also tried to submit the model to the leaderboard. A segmentation fault happened then as well, as follows.
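If the crash happens after the final print, one way to narrow down where it occurs is Python's standard `faulthandler` module, which dumps the Python traceback when the process receives a fatal signal such as SIGSEGV. The following is a minimal sketch. `enable_crash_traceback` and the body of `main` are hypothetical placeholders, not the baseline code.

```python
import faulthandler


def enable_crash_traceback():
    """Ask Python to dump tracebacks to stderr on fatal signals such as SIGSEGV."""
    faulthandler.enable(all_threads=True)
    return faulthandler.is_enabled()


def main():
    # Hypothetical placeholder for the baseline training code.
    print("training finished", flush=True)


if __name__ == "__main__":
    enable_crash_traceback()
    main()
    # flush=True writes the message immediately, before any later crash.
    print("the program ends", flush=True)
```

If a traceback is printed, it shows which Python frame was active when the fault occurred. If nothing is printed, the crash may be happening outside the Python interpreter.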

image

image

Even testing with the successfully saved model failed with a segmentation fault...

Thanks in advance!

nsml-admin commented 4 years ago

It looks like the problem is caused by an older version of Python or Docker. Please refer to here, replace the default Docker image with nvcr.io/nvidia/pytorch:20.03-py3, and try again.
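To confirm which environment a session actually runs in after switching images, it can help to print the Python and package versions at the start of main.py. The following is a minimal sketch. `describe_environment` is a hypothetical helper, and the default package names are an assumption about what the baseline imports.

```python
import importlib
import platform


def describe_environment(packages=("torch", "torchvision")):
    """Collect the Python version and the versions of selected packages."""
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            module = importlib.import_module(name)
            info[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            # The package is not available in this image.
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for name, version in describe_environment().items():
        print(f"{name}: {version}", flush=True)
```

Comparing this output before and after changing the image shows whether the new image is really the one in use.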