Naver-AI-Hackathon / cs492I

Error raised when running baseline #19

Open DonghyunAhn opened 4 years ago

DonghyunAhn commented 4 years ago

Hello, we are team kaist_8. We ran into an issue when running the baseline. Here is the error we are currently facing:

TensorFlow Version 1.13.1

Container image Copyright (c) 2019, NVIDIA CORPORATION.  All rights reserved.
Copyright 2017-2019 The TensorFlow Authors.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

NOTE: MOFED driver for multi-node communication was not detected.
      Multi-node communication performance may be reduced.

Traceback (most recent call last):
  File "main.py", line 22, in <module>
    import torchvision
  File "/usr/local/lib/python3.5/dist-packages/torchvision/__init__.py", line 3, in <module>
    from torchvision import models
  File "/usr/local/lib/python3.5/dist-packages/torchvision/models/__init__.py", line 12, in <module>
    from . import detection
  File "/usr/local/lib/python3.5/dist-packages/torchvision/models/detection/__init__.py", line 1, in <module>
    from .faster_rcnn import *
  File "/usr/local/lib/python3.5/dist-packages/torchvision/models/detection/faster_rcnn.py", line 7, in <module>
    from torchvision.ops import misc as misc_nn_ops
  File "/usr/local/lib/python3.5/dist-packages/torchvision/ops/__init__.py", line 13, in <module>
    _register_custom_op()
  File "/usr/local/lib/python3.5/dist-packages/torchvision/ops/_register_onnx_ops.py", line 51, in _register_custom_op
    register_custom_op_symbolic('torchvision::_new_empty_tensor_op', new_empty_tensor_op, _onnx_opset_version)
  File "/usr/local/lib/python3.5/dist-packages/torch/onnx/__init__.py", line 200, in register_custom_op_symbolic
    return utils.register_custom_op_symbolic(symbolic_name, symbolic_fn, opset_version)
  File "/usr/local/lib/python3.5/dist-packages/torch/onnx/utils.py", line 793, in register_custom_op_symbolic
    .format(symbolic_name))
RuntimeError: Failed to register operator torchvision::_new_empty_tensor_op.                            The symbolic name must match the format Domain::Name,                            and sould start with a letter and contain only                            alphanumerical characters
User session exited

When we searched for this error on Google, the results pointed to a torch version problem. We tried to change the version by modifying requirements.txt, but it still does not work. We would appreciate any suggestions for solving this problem.

nsml-admin commented 4 years ago

You can change the TensorFlow version or the Docker image by editing the requirements.txt file.

Alternatively, I recommend changing your code as shown in this PR.
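
Since the traceback fails inside `import torchvision` itself, one way to check whether a requirements.txt change actually took effect is to read the installed package versions without importing the packages. The following is only a minimal sketch, not part of the baseline: the helper names `installed_version` and `report` are hypothetical, and it assumes torch and torchvision were installed with pip so that their metadata is visible. The reported versions can then be compared against the compatibility table in the torchvision README.

```python
def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent.

    Reads package metadata only, so it works even when importing the
    package itself raises an error (as `import torchvision` does above).
    """
    try:
        # Available in the standard library from Python 3.8 onward.
        from importlib.metadata import version, PackageNotFoundError
    except ImportError:
        # Older interpreters (such as the Python 3.5 seen in the traceback)
        # need the setuptools-provided pkg_resources module instead.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def report(packages=("torch", "torchvision")):
    """Map each package name to its installed version (None if missing)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, ver in report().items():
        print(name, ver if ver is not None else "not installed")
```

If the printed versions do not change after editing requirements.txt, the edit was probably not picked up when the session image was built, which would be consistent with the error persisting.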