OS (Mac, Windows, Linux, etc.) and version: Windows 10
Client version (please show nsml --version):
Full command executed: nsml submit gazua/ir_ph1_v2/61 19
WEB
Browser (Chrome, Firefox, etc.):
URL:
What is your NSML login ID?
SeoGyuSik
Which session had the problem? (bug message or screenshot)
Building docker image. It might take for a while
..........Traceback (most recent call last):
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1127,64,224,224] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node block1_conv1/convolution}} = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](block1_conv1/convolution-0-TransposeNHWCToNCHW-LayoutOptimizer, block1_conv1/kernel/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
..Error: Fail to get prediction result: gazua/ir_ph1_v2/61/19
time="2019/01/15 21:40:15.713" level=fatal msg="Internal server error"
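For reference, the size of the tensor named in the OOM message can be worked out from its shape alone. The sketch below assumes only that `type float` means 4-byte float32 (TensorFlow's `DT_FLOAT`); the variable names are illustrative.

```python
# Shape reported in the OOM message: [batch, channels, height, width]
shape = (1127, 64, 224, 224)
bytes_per_float32 = 4  # DT_FLOAT is a 32-bit float

num_elements = 1
for dim in shape:
    num_elements *= dim

size_bytes = num_elements * bytes_per_float32
size_gib = size_bytes / 2**30

# Memory needed per image for this one layer output
per_image_mib = (size_bytes / shape[0]) / 2**20

print(f"{size_bytes} bytes = {size_gib:.2f} GiB")  # 14476378112 bytes = 13.48 GiB
print(f"{per_image_mib:.2f} MiB per image")        # 12.25 MiB per image
```

The leading dimension of 1127 suggests that all images reached the first convolution in a single call, so this one activation alone would need about 13.5 GiB of GPU memory.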
How can the problem be reproduced?
`def infer(queries, db):
    else:
        reference_vecs = get_feature_layer([reference_img, 0])[0]
        with open(db_output, 'wb') as f:
            pickle.dump(reference_vecs, f)`
What behavior did you expect? I expected that splitting the input into batches would resolve the OOM error.
Do you have a fix you would like to suggest? I modified the part under else:, but the competition is almost over, so it looks like it will end before I can submit. T_T
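The batching idea described above could look roughly like the following. This is a minimal sketch, not the code from the actual submission: `extract_in_batches`, `fake_extract`, and the default `batch_size` are hypothetical names and values chosen for illustration. It assumes the extractor accepts a slice of the images and returns one feature vector per image.

```python
def extract_in_batches(extract_fn, images, batch_size=32):
    """Call extract_fn on consecutive slices of images and collect the results.

    Only batch_size images are passed to the extractor at a time, so the
    intermediate activations never have to cover the whole dataset.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    features = []
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]
        # extract_fn returns one feature vector per image in the batch
        features.extend(extract_fn(batch))
    return features


# Stand-in extractor for demonstration only: doubles every value.
def fake_extract(batch):
    return [[2 * x for x in row] for row in batch]


demo = extract_in_batches(fake_extract, [[1], [2], [3], [4], [5]], batch_size=2)
print(demo)  # [[2], [4], [6], [8], [10]]
```

With the names from the report, the extractor would presumably be something like `lambda batch: get_feature_layer([batch, 0])[0]`, called on `reference_img`. The result is a plain list of per-image vectors, so it may need converting back to an array before `pickle.dump` if later code expects one.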