Naver-AI-Hackathon / cs492I

Submitting and Training are rejected due to not enough resources #44

Open qbhan opened 3 years ago

qbhan commented 3 years ago

Hi, I would like to submit our latest model and train another model before the deadline. But it seems that the GPU resources are fully allocated (100% according to NSML), even though my team is training only one simple model on one GPU. So I can neither submit nor train a model. We would like to submit as soon as possible, since more teams (including mine) will be using the GPU resources as the deadline approaches. How can I at least submit the model? Thank you in advance!

EaststarKim commented 3 years ago

My team has been experiencing the same problem for a week, so I gave up on running multiple sessions. It is practically impossible to try a bigger neural net (even the Res50 in the baseline), and sometimes I cannot run even one session. We have tried to start a session over 700 times (going by the session number), but only about 20 sessions have actually run so far.

Is the goal of this course to learn how to spell "Internal server error", or to solve problems with deep learning? XD