ai-starthon / AI_Starthon2019


[Question] team_129/16_tcls_movie/40 submit error #218

Closed baikal-ai closed 5 years ago

baikal-ai commented 5 years ago

I get an error when I run submit.

```
root@baikal-dev:/home/hunbl/google-drive/nsml/16_tcls_movie# nsml submit team_129/16_tcls_movie/40 0
....... Building docker image. It might take for a while .......
An error occurred somewhere in your code. You can check error 'nsml submit --test'.
Traceback (most recent call last):
  File "/app/src/main.py", line 5, in <module>
    nsml.nsml(obj={}, prog_name='nsml')
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/app/src/nsml_cli/nsml.py", line 637, in submit
    submit_session(ctx, session, checkpoint, verbose, test, entry, realm)
  File "/app/src/nsml_cli/sessions/eval_service.py", line 115, in submit_session
    user_args=None, gpus=DEFAULT_GPU_FOR_INFER
  File "/app/src/nsml_cli/sessions/eval_service.py", line 472, in _infer
    raise raised
  File "/app/src/nsml_cli/sessions/eval_service.py", line 394, in build_and_launch
    'exec', session.name, 'ls', '/tmp', encrypted=ctx.obj
  File "/app/src/nsml_cli/cluster.py", line 93, in docker
    return docker(self.address, args, kwargs)
  File "/app/src/nsml_cli/docker.py", line 91, in docker
    return _run_docker_cmd(args, interactive, err, kwargs)
  File "/app/src/nsml_cli/docker.py", line 76, in _run_docker_cmd
    out = check_output(cmd, stderr=-2 if err else devnull)
  File "/usr/local/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/local/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['docker', '--config', '/tmp/team_129', '-H', 'tcp://10.41.69.199:2375', 'exec', '7465616D5F3132395F3138373136643665366535663439323839373663643634373230396134303131', 'ls', '/tmp']' returned non-zero exit status 1.
FATA[2019/07/30 17:24:16.989] Internal server error
```

Which part do I need to fix?

Information

CLI

WEB

What is your NSML login ID?

Question

nsml-admin commented 5 years ago

Hello.

If you run nsml submit team_129/16_tcls_movie/40 0 --test, a new test session is created. You should then be able to see the error log by checking that session's logs with nsml logs [session].

Thank you.

baikal-ai commented 5 years ago

Even with --test, I only get the same error log.

nsml-admin commented 5 years ago

It looks like one of the sessions created by running with --test is team_129/16_tcls_movie/46.

If you check that session's logs (nsml logs team_129/16_tcls_movie/46), you can see the log where the error occurred.

Fix the part of your code that raised the error and submit again. For reference.
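The log-checking step can be scripted. The sketch below is a hypothetical helper, not part of the NSML CLI: it shells out to the nsml logs [session] command quoted in this thread and then keeps only the last Python traceback, which is usually the part that explains a failed --test session. It assumes the nsml CLI is installed, on PATH, and already logged in; the function names and the "last traceback to end of log" heuristic are illustrative choices.

```python
import subprocess
from typing import List

# Header line that Python prints at the start of every traceback.
TRACEBACK_HEADER = "Traceback (most recent call last):"


def fetch_session_logs(session: str) -> str:
    """Run `nsml logs <session>` and return its combined stdout/stderr.

    Assumption: the nsml CLI is installed and logged in. Only the
    `nsml logs [session]` form mentioned in this thread is used.
    """
    result = subprocess.run(
        ["nsml", "logs", session],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so tracebacks are captured too
        universal_newlines=True,   # decode output to str (works on Python 3.6)
    )
    return result.stdout


def last_traceback(log_text: str) -> List[str]:
    """Return the lines from the last traceback header to the end of the log.

    Returns an empty list when the log contains no traceback.
    """
    lines = log_text.splitlines()
    starts = [i for i, line in enumerate(lines) if TRACEBACK_HEADER in line]
    if not starts:
        return []
    return lines[starts[-1]:]
```

For example, print("\n".join(last_traceback(fetch_session_logs("team_129/16_tcls_movie/46")))) would show only the final error from the test session named above. Taking the last traceback rather than the first matters when a session logs recovered errors before the fatal one.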

baikal-ai commented 5 years ago

Looking at the --test log, the problem was that the training data is not available at test time. I fixed the related code and the issue is resolved.
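The thread does not show the actual fix, but the failure it describes (code that reads the training split also runs during a test/inference session, where that split does not exist) is commonly avoided by loading training data only in train mode. A minimal sketch under stated assumptions: the --mode and --data_dir flags, the train/train_data file layout, and all function names are hypothetical and should be matched to your own entry script.

```python
import argparse
import os
from typing import List, Optional


def load_train_data(data_dir: str) -> List[str]:
    """Hypothetical loader: read one sample per line from <data_dir>/train/train_data."""
    path = os.path.join(data_dir, "train", "train_data")
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def maybe_load_train_data(mode: str, data_dir: str) -> Optional[List[str]]:
    """Load training data only in train mode; return None otherwise.

    In a test/inference run the training split may not exist, so reading it
    unconditionally (for example before checking the mode) raises an error.
    """
    if mode != "train":
        return None
    return load_train_data(data_dir)


def main(argv: Optional[List[str]] = None) -> None:
    parser = argparse.ArgumentParser()
    # Hypothetical flags for illustration only.
    parser.add_argument("--mode", type=str, default="train")
    parser.add_argument("--data_dir", type=str, default="./data")
    args = parser.parse_args(argv)

    train_data = maybe_load_train_data(args.mode, args.data_dir)
    if train_data is not None:
        print(f"Loaded {len(train_data)} training samples")
    else:
        print(f"mode={args.mode}: skipping training data")
```

Calling main(["--mode", "test"]) skips the training split entirely, so a missing train directory no longer breaks the test session; only a train-mode run touches the file.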