IntelLabs / distiller

Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distiller
Apache License 2.0

Error when running examples #267

Closed Bowenwu1 closed 5 years ago

Bowenwu1 commented 5 years ago

When I try to run one of your examples (https://github.com/NervanaSystems/distiller/blob/master/examples/pruning_filters_for_efficient_convnets/resnet56_cifar_filter_rank_v2.yaml) I encounter the following error (screenshot attached):

My CUDA and cuDNN versions, GPU, and NVIDIA driver version are shown in the attached screenshots.

Looking forward to your reply, thank you!

nzmora commented 5 years ago

Hi @Bowenwu1,

I am not able to reproduce this problem locally, so I need more information from you. The first few tens of lines in the log file contain information about your environment (incl. various SW package versions). Please post this information.

Another thing you can do is go to distiller/tests and invoke pytest:

distiller/tests$ pytest

This invokes some unit-tests and will help me understand the state of your code (the tests take a few minutes to run). They should all pass.

Thanks Neta

Bowenwu1 commented 5 years ago

The first few tens of lines of the log file:

2019-05-24 11:23:03,170 - Log file for this run: /home/dm/Documents/pytorch-cifar/distiller/examples/classifier_compression/logs/2019.05.24-112303/2019.05.24-112303.log
2019-05-24 11:23:03,170 - Number of CPUs: 12
2019-05-24 11:23:03,198 - Number of GPUs: 1
2019-05-24 11:23:03,198 - CUDA version: 9.0.176
2019-05-24 11:23:03,198 - CUDNN version: 7402
2019-05-24 11:23:03,198 - Kernel: 4.15.0-50-generic
2019-05-24 11:23:03,198 - Python: 3.7.3 (default, Mar 27 2019, 22:11:17) 
[GCC 7.3.0]
2019-05-24 11:23:03,198 - PyTorch: 1.0.1
2019-05-24 11:23:03,198 - Numpy: 1.16.3
2019-05-24 11:23:03,221 - Git is dirty
2019-05-24 11:23:03,222 - Active Git branch: master
2019-05-24 11:23:03,232 - Git commit: 1f48fa64131596b181ebd26a59d2679f7f877dee
2019-05-24 11:23:03,232 - Command line: compress_classifier.py -a=resnet56_cifar -p=50 ../../../data.cifar10 --epochs=70 --lr=0.1 --compress=../pruning_filters_for_efficient_convnets/resnet56_cifar_filter_rank_v2.yaml --resume-from=checkpoint.resnet56_cifar_baseline.pth.tar --reset-optimizer --vs=0
2019-05-24 11:23:03,252 - => creating resnet56_cifar model for CIFAR10
2019-05-24 11:23:05,674 - => loading checkpoint checkpoint.resnet56_cifar_baseline.pth.tar
2019-05-24 11:23:05,685 - => Checkpoint contents:
╒═══════════════════╤═════════════╤════════════════╕
│ Key               │ Type        │ Value          │
╞═══════════════════╪═════════════╪════════════════╡
│ arch              │ str         │ resnet56_cifar │
├───────────────────┼─────────────┼────────────────┤
│ best_top1         │ float       │ 92.92          │
├───────────────────┼─────────────┼────────────────┤
│ compression_sched │ dict        │                │
├───────────────────┼─────────────┼────────────────┤
│ epoch             │ int         │ 179            │
├───────────────────┼─────────────┼────────────────┤
│ optimizer         │ dict        │                │
├───────────────────┼─────────────┼────────────────┤
│ state_dict        │ OrderedDict │                │
╘═══════════════════╧═════════════╧════════════════╛

2019-05-24 11:23:05,686 - Loaded compression schedule from checkpoint (epoch 179)
2019-05-24 11:23:05,700 - Optimizer could not be loaded from checkpoint.
2019-05-24 11:23:05,700 - => loaded checkpoint 'checkpoint.resnet56_cifar_baseline.pth.tar' (epoch 179)
2019-05-24 11:23:05,701 - Optimizer Type: <class 'torch.optim.sgd.SGD'>
2019-05-24 11:23:05,701 - Optimizer Args: {'lr': 0.1, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0.0001, 'nesterov': False}
2019-05-24 11:23:06,869 - Dataset sizes:
    training=50000
    validation=10000
    test=10000
2019-05-24 11:23:06,870 - Reading compression schedule from: ../pruning_filters_for_efficient_convnets/resnet56_cifar_filter_rank_v2.yaml

output of pytest:

=========================================================================================== test session starts ============================================================================================
platform linux -- Python 3.7.3, pytest-3.5.1, py-1.8.0, pluggy-0.6.0
rootdir: /home/dm/Documents/pytorch-cifar/distiller, inifile:
collected 122 items                                                                                                                                                                                        

test_basic.py ...                                                                                                                                                                                    [  2%]
test_infra.py ..F.........                                                                                                                                                                           [ 12%]
test_learning_rate.py .                                                                                                                                                                              [ 13%]
test_loss.py ..                                                                                                                                                                                      [ 14%]
test_lstm_impl.py ..............                                                                                                                                                                     [ 26%]
test_model_summary.py F...                                                                                                                                                                           [ 29%]
test_post_train_quant.py ........................                                                                                                                                                    [ 49%]
test_pruning.py .............                                                                                                                                                                        [ 59%]
test_quant_utils.py .......                                                                                                                                                                          [ 65%]
test_quantizer.py ............................                                                                                                                                                       [ 88%]
test_ranking.py ..                                                                                                                                                                                   [ 90%]
test_summarygraph.py ..........                                                                                                                                                                      [ 98%]
test_thresholding.py ..                                                                                                                                                                              [100%]

================================================================================================= FAILURES =================================================================================================
____________________________________________________________________________________ test_create_model_pretrainedmodels ____________________________________________________________________________________

    def test_create_model_pretrainedmodels():
        premodel_name = 'resnext101_32x4d'
>       model = create_model(True, 'imagenet', premodel_name)

test_infra.py:53: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../distiller/models/__init__.py:74: in create_model
    pretrained=(dataset if pretrained else None))
../../cifar/lib/python3.7/site-packages/pretrainedmodels/models/resnext.py:85: in resnext101_32x4d
    model.load_state_dict(model_zoo.load_url(settings['url']))
../../cifar/lib/python3.7/site-packages/torch/utils/model_zoo.py:66: in load_url
    _download_url_to_file(url, cached_file, hash_prefix, progress=progress)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

url = 'http://data.lip6.fr/cadene/pretrainedmodels/resnext101_32x4d-29e315fa.pth', dst = '/home/dm/.torch/models/resnext101_32x4d-29e315fa.pth', hash_prefix = '29e315fa', progress = True

    def _download_url_to_file(url, dst, hash_prefix, progress):
        file_size = None
        if requests_available:
            u = urlopen(url, stream=True)
            if hasattr(u.headers, "Content-Length"):
                file_size = int(u.headers["Content-Length"])
            u = u.raw
        else:
            u = urlopen(url)
            meta = u.info()
            if hasattr(meta, 'getheaders'):
                content_length = meta.getheaders("Content-Length")
            else:
                content_length = meta.get_all("Content-Length")
            if content_length is not None and len(content_length) > 0:
                file_size = int(content_length[0])

        f = tempfile.NamedTemporaryFile(delete=False)
        try:
            if hash_prefix is not None:
                sha256 = hashlib.sha256()
            with tqdm(total=file_size, disable=not progress) as pbar:
                while True:
                    buffer = u.read(8192)
                    if len(buffer) == 0:
                        break
                    f.write(buffer)
                    if hash_prefix is not None:
                        sha256.update(buffer)
                    pbar.update(len(buffer))

            f.close()
            if hash_prefix is not None:
                digest = sha256.hexdigest()
                if digest[:len(hash_prefix)] != hash_prefix:
                    raise RuntimeError('invalid hash value (expected "{}", got "{}")'
>                                      .format(hash_prefix, digest))
E                                      RuntimeError: invalid hash value (expected "29e315fa", got "2be79b781dde89a798dddaa2f9d1a865b50ae6f714e87ae445f309acb7f91b27")

../../cifar/lib/python3.7/site-packages/torch/utils/model_zoo.py:106: RuntimeError
------------------------------------------------------------------------------------------- Captured stderr call -------------------------------------------------------------------------------------------
Downloading: "http://data.lip6.fr/cadene/pretrainedmodels/resnext101_32x4d-29e315fa.pth" to /home/dm/.torch/models/resnext101_32x4d-29e315fa.pth
19681944it [02:45, 118887.36it/s]
___________________________________________________________________________________________ test_png_generation ____________________________________________________________________________________________

self = <pydot.Dot object at 0x7f4a704f2550>, prog = 'dot', format = 'png', encoding = None

    def create(self, prog=None, format='ps', encoding=None):
        """Creates and returns a binary image for the graph.

            create will write the graph to a temporary dot file in the
            encoding specified by `encoding` and process it with the
            program given by 'prog' (which defaults to 'twopi'), reading
            the binary image output and return it as:

            - `str` of bytes in Python 2
            - `bytes` in Python 3

            There's also the preferred possibility of using:

                create_'format'(prog='program')

            which are automatically defined for all the supported formats,
            for example:

              - `create_ps()`
              - `create_gif()`
              - `create_dia()`

            If 'prog' is a list, instead of a string,
            then the fist item is expected to be the program name,
            followed by any optional command-line arguments for it:

                [ 'twopi', '-Tdot', '-s10' ]

            @param prog: either:

              - name of GraphViz executable that
                can be found in the `$PATH`, or

              - absolute path to GraphViz executable.

              If you have added GraphViz to the `$PATH` and
              use its executables as installed
              (without renaming any of them)
              then their names are:

                - `'dot'`
                - `'twopi'`
                - `'neato'`
                - `'circo'`
                - `'fdp'`
                - `'sfdp'`

              On Windows, these have the notorious ".exe" extension that,
              only for the above strings, will be added automatically.

              The `$PATH` is inherited from `os.env['PATH']` and
              passed to `subprocess.Popen` using the `env` argument.

              If you haven't added GraphViz to your `$PATH` on Windows,
              then you may want to give the absolute path to the
              executable (for example, to `dot.exe`) in `prog`.
            """
        default_names = set([
            'dot', 'twopi', 'neato',
            'circo', 'fdp', 'sfdp'])
        if prog is None:
            prog = self.prog
        assert prog is not None
        if isinstance(prog, (list, tuple)):
            prog, args = prog[0], prog[1:]
        else:
            args = []
        if os.name == 'nt' and prog in default_names:
            if not prog.endswith('.exe'):
                prog += '.exe'
        # temp file
        tmp_fd, tmp_name = tempfile.mkstemp()
        os.close(tmp_fd)
        self.write(tmp_name, encoding=encoding)
        tmp_dir = os.path.dirname(tmp_name)
        # For each of the image files...
        for img in self.shape_files:
            # Get its data
            f = open(img, 'rb')
            f_data = f.read()
            f.close()
            # And copy it under a file with the same name in
            # the temporary directory
            f = open(os.path.join(tmp_dir, os.path.basename(img)), 'wb')
            f.write(f_data)
            f.close()
        # explicitly inherit `$PATH`, on Windows too,
        # with `shell=False`
        env = dict()
        env['PATH'] = os.environ.get('PATH', '')
        env['LD_LIBRARY_PATH'] = os.environ.get('LD_LIBRARY_PATH', '')
        cmdline = [prog, '-T' + format] + args + [tmp_name]
        try:
            p = subprocess.Popen(
                cmdline,
                env=env,
                cwd=tmp_dir,
                shell=False,
>               stderr=subprocess.PIPE, stdout=subprocess.PIPE)

../../cifar/lib/python3.7/site-packages/pydot.py:1861: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <subprocess.Popen object at 0x7f4a5c5c9908>, args = ['dot', '-Tpng', '/tmp/tmp36lz76vx'], bufsize = -1, executable = None, stdin = None, stdout = -1, stderr = -1, preexec_fn = None
close_fds = True, shell = False, cwd = '/tmp'
env = {'LD_LIBRARY_PATH': '/usr/local/cuda-9.0/lib64', 'PATH': '/home/dm/Documents/pytorch-cifar/cifar/bin:/usr/local/cuda-9...home/dm/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin'}
universal_newlines = None, startupinfo = None, creationflags = 0, restore_signals = True, start_new_session = False, pass_fds = ()

    def __init__(self, args, bufsize=-1, executable=None,
                 stdin=None, stdout=None, stderr=None,
                 preexec_fn=None, close_fds=True,
                 shell=False, cwd=None, env=None, universal_newlines=None,
                 startupinfo=None, creationflags=0,
                 restore_signals=True, start_new_session=False,
                 pass_fds=(), *, encoding=None, errors=None, text=None):
        """Create new Popen instance."""
        _cleanup()
        # Held while anything is calling waitpid before returncode has been
        # updated to prevent clobbering returncode if wait() or poll() are
        # called from multiple threads at once.  After acquiring the lock,
        # code must re-check self.returncode to see if another thread just
        # finished a waitpid() call.
        self._waitpid_lock = threading.Lock()

        self._input = None
        self._communication_started = False
        if bufsize is None:
            bufsize = -1  # Restore default
        if not isinstance(bufsize, int):
            raise TypeError("bufsize must be an integer")

        if _mswindows:
            if preexec_fn is not None:
                raise ValueError("preexec_fn is not supported on Windows "
                                 "platforms")
        else:
            # POSIX
            if pass_fds and not close_fds:
                warnings.warn("pass_fds overriding close_fds.", RuntimeWarning)
                close_fds = True
            if startupinfo is not None:
                raise ValueError("startupinfo is only supported on Windows "
                                 "platforms")
            if creationflags != 0:
                raise ValueError("creationflags is only supported on Windows "
                                 "platforms")

        self.args = args
        self.stdin = None
        self.stdout = None
        self.stderr = None
        self.pid = None
        self.returncode = None
        self.encoding = encoding
        self.errors = errors

        # Validate the combinations of text and universal_newlines
        if (text is not None and universal_newlines is not None
            and bool(universal_newlines) != bool(text)):
            raise SubprocessError('Cannot disambiguate when both text '
                                  'and universal_newlines are supplied but '
                                  'different. Pass one or the other.')

        # Input and output objects. The general principle is like
        # this:
        #
        # Parent                   Child
        # ------                   -----
        # p2cwrite   ---stdin--->  p2cread
        # c2pread    <--stdout---  c2pwrite
        # errread    <--stderr---  errwrite
        #
        # On POSIX, the child objects are file descriptors.  On
        # Windows, these are Windows file handles.  The parent objects
        # are file descriptors on both platforms.  The parent objects
        # are -1 when not using PIPEs. The child objects are -1
        # when not redirecting.

        (p2cread, p2cwrite,
         c2pread, c2pwrite,
         errread, errwrite) = self._get_handles(stdin, stdout, stderr)

        # We wrap OS handles *before* launching the child, otherwise a
        # quickly terminating child could make our fds unwrappable
        # (see #8458).

        if _mswindows:
            if p2cwrite != -1:
                p2cwrite = msvcrt.open_osfhandle(p2cwrite.Detach(), 0)
            if c2pread != -1:
                c2pread = msvcrt.open_osfhandle(c2pread.Detach(), 0)
            if errread != -1:
                errread = msvcrt.open_osfhandle(errread.Detach(), 0)

        self.text_mode = encoding or errors or text or universal_newlines

        # How long to resume waiting on a child after the first ^C.
        # There is no right value for this.  The purpose is to be polite
        # yet remain good for interactive users trying to exit a tool.
        self._sigint_wait_secs = 0.25  # 1/xkcd221.getRandomNumber()

        self._closed_child_pipe_fds = False

        try:
            if p2cwrite != -1:
                self.stdin = io.open(p2cwrite, 'wb', bufsize)
                if self.text_mode:
                    self.stdin = io.TextIOWrapper(self.stdin, write_through=True,
                            line_buffering=(bufsize == 1),
                            encoding=encoding, errors=errors)
            if c2pread != -1:
                self.stdout = io.open(c2pread, 'rb', bufsize)
                if self.text_mode:
                    self.stdout = io.TextIOWrapper(self.stdout,
                            encoding=encoding, errors=errors)
            if errread != -1:
                self.stderr = io.open(errread, 'rb', bufsize)
                if self.text_mode:
                    self.stderr = io.TextIOWrapper(self.stderr,
                            encoding=encoding, errors=errors)

            self._execute_child(args, executable, preexec_fn, close_fds,
                                pass_fds, cwd, env,
                                startupinfo, creationflags, shell,
                                p2cread, p2cwrite,
                                c2pread, c2pwrite,
                                errread, errwrite,
>                               restore_signals, start_new_session)

/home/dm/miniconda3/lib/python3.7/subprocess.py:775: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <subprocess.Popen object at 0x7f4a5c5c9908>, args = ['dot', '-Tpng', '/tmp/tmp36lz76vx'], executable = b'dot', preexec_fn = None, close_fds = True, pass_fds = (), cwd = '/tmp'
env = {'LD_LIBRARY_PATH': '/usr/local/cuda-9.0/lib64', 'PATH': '/home/dm/Documents/pytorch-cifar/cifar/bin:/usr/local/cuda-9...home/dm/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin'}
startupinfo = None, creationflags = 0, shell = False, p2cread = -1, p2cwrite = -1, c2pread = 35, c2pwrite = 37, errread = 38, errwrite = 39, restore_signals = True, start_new_session = False

    def _execute_child(self, args, executable, preexec_fn, close_fds,
                       pass_fds, cwd, env,
                       startupinfo, creationflags, shell,
                       p2cread, p2cwrite,
                       c2pread, c2pwrite,
                       errread, errwrite,
                       restore_signals, start_new_session):
        """Execute program (POSIX version)"""

        if isinstance(args, (str, bytes)):
            args = [args]
        else:
            args = list(args)

        if shell:
            # On Android the default shell is at '/system/bin/sh'.
            unix_shell = ('/system/bin/sh' if
                      hasattr(sys, 'getandroidapilevel') else '/bin/sh')
            args = [unix_shell, "-c"] + args
            if executable:
                args[0] = executable

        if executable is None:
            executable = args[0]
        orig_executable = executable

        # For transferring possible exec failure from child to parent.
        # Data format: "exception name:hex errno:description"
        # Pickle is not used; it is complex and involves memory allocation.
        errpipe_read, errpipe_write = os.pipe()
        # errpipe_write must not be in the standard io 0, 1, or 2 fd range.
        low_fds_to_close = []
        while errpipe_write < 3:
            low_fds_to_close.append(errpipe_write)
            errpipe_write = os.dup(errpipe_write)
        for low_fd in low_fds_to_close:
            os.close(low_fd)
        try:
            try:
                # We must avoid complex work that could involve
                # malloc or free in the child process to avoid
                # potential deadlocks, thus we do all this here.
                # and pass it to fork_exec()

                if env is not None:
                    env_list = []
                    for k, v in env.items():
                        k = os.fsencode(k)
                        if b'=' in k:
                            raise ValueError("illegal environment variable name")
                        env_list.append(k + b'=' + os.fsencode(v))
                else:
                    env_list = None  # Use execv instead of execve.
                executable = os.fsencode(executable)
                if os.path.dirname(executable):
                    executable_list = (executable,)
                else:
                    # This matches the behavior of os._execvpe().
                    executable_list = tuple(
                        os.path.join(os.fsencode(dir), executable)
                        for dir in os.get_exec_path(env))
                fds_to_keep = set(pass_fds)
                fds_to_keep.add(errpipe_write)
                self.pid = _posixsubprocess.fork_exec(
                        args, executable_list,
                        close_fds, tuple(sorted(map(int, fds_to_keep))),
                        cwd, env_list,
                        p2cread, p2cwrite, c2pread, c2pwrite,
                        errread, errwrite,
                        errpipe_read, errpipe_write,
                        restore_signals, start_new_session, preexec_fn)
                self._child_created = True
            finally:
                # be sure the FD is closed no matter what
                os.close(errpipe_write)

            # self._devnull is not always defined.
            devnull_fd = getattr(self, '_devnull', None)
            if p2cread != -1 and p2cwrite != -1 and p2cread != devnull_fd:
                os.close(p2cread)
            if c2pwrite != -1 and c2pread != -1 and c2pwrite != devnull_fd:
                os.close(c2pwrite)
            if errwrite != -1 and errread != -1 and errwrite != devnull_fd:
                os.close(errwrite)
            if devnull_fd is not None:
                os.close(devnull_fd)
            # Prevent a double close of these fds from __init__ on error.
            self._closed_child_pipe_fds = True

            # Wait for exec to fail or succeed; possibly raising an
            # exception (limited in size)
            errpipe_data = bytearray()
            while True:
                part = os.read(errpipe_read, 50000)
                errpipe_data += part
                if not part or len(errpipe_data) > 50000:
                    break
        finally:
            # be sure the FD is closed no matter what
            os.close(errpipe_read)

        if errpipe_data:
            try:
                pid, sts = os.waitpid(self.pid, 0)
                if pid == self.pid:
                    self._handle_exitstatus(sts)
                else:
                    self.returncode = sys.maxsize
            except ChildProcessError:
                pass

            try:
                exception_name, hex_errno, err_msg = (
                        errpipe_data.split(b':', 2))
                # The encoding here should match the encoding
                # written in by the subprocess implementations
                # like _posixsubprocess
                err_msg = err_msg.decode()
            except ValueError:
                exception_name = b'SubprocessError'
                hex_errno = b'0'
                err_msg = 'Bad exception data from child: {!r}'.format(
                              bytes(errpipe_data))
            child_exception_type = getattr(
                    builtins, exception_name.decode('ascii'),
                    SubprocessError)
            if issubclass(child_exception_type, OSError) and hex_errno:
                errno_num = int(hex_errno, 16)
                child_exec_never_called = (err_msg == "noexec")
                if child_exec_never_called:
                    err_msg = ""
                    # The error must be from chdir(cwd).
                    err_filename = cwd
                else:
                    err_filename = orig_executable
                if errno_num != 0:
                    err_msg = os.strerror(errno_num)
                    if errno_num == errno.ENOENT:
                        err_msg += ': ' + repr(err_filename)
>               raise child_exception_type(errno_num, err_msg, err_filename)
E               FileNotFoundError: [Errno 2] No such file or directory: 'dot': 'dot'

/home/dm/miniconda3/lib/python3.7/subprocess.py:1522: FileNotFoundError

During handling of the above exception, another exception occurred:

    def test_png_generation():
        dataset = "cifar10"
        arch = "resnet20_cifar"
        model, _ = common.setup_test(arch, dataset, parallel=True)
        # 2 different ways to create a PNG
>       distiller.draw_img_classifier_to_file(model, 'model.png', dataset, True)

test_model_summary.py:35: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../distiller/model_summaries.py:439: in draw_img_classifier_to_file
    draw_model_to_file(g, png_fname, display_param_nodes, rankdir, styles)
../distiller/model_summaries.py:411: in draw_model_to_file
    png = create_png(sgraph, display_param_nodes=display_param_nodes)
../distiller/model_summaries.py:391: in create_png
    png = pydot_graph.create_png()
../../cifar/lib/python3.7/site-packages/pydot.py:1662: in new_method
    format=f, prog=prog, encoding=encoding)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pydot.Dot object at 0x7f4a704f2550>, prog = 'dot', format = 'png', encoding = None

    def create(self, prog=None, format='ps', encoding=None):
        """Creates and returns a binary image for the graph.

            create will write the graph to a temporary dot file in the
            encoding specified by `encoding` and process it with the
            program given by 'prog' (which defaults to 'twopi'), reading
            the binary image output and return it as:

            - `str` of bytes in Python 2
            - `bytes` in Python 3

            There's also the preferred possibility of using:

                create_'format'(prog='program')

            which are automatically defined for all the supported formats,
            for example:

              - `create_ps()`
              - `create_gif()`
              - `create_dia()`

            If 'prog' is a list, instead of a string,
            then the fist item is expected to be the program name,
            followed by any optional command-line arguments for it:

                [ 'twopi', '-Tdot', '-s10' ]

            @param prog: either:

              - name of GraphViz executable that
                can be found in the `$PATH`, or

              - absolute path to GraphViz executable.

              If you have added GraphViz to the `$PATH` and
              use its executables as installed
              (without renaming any of them)
              then their names are:

                - `'dot'`
                - `'twopi'`
                - `'neato'`
                - `'circo'`
                - `'fdp'`
                - `'sfdp'`

              On Windows, these have the notorious ".exe" extension that,
              only for the above strings, will be added automatically.

              The `$PATH` is inherited from `os.env['PATH']` and
              passed to `subprocess.Popen` using the `env` argument.

              If you haven't added GraphViz to your `$PATH` on Windows,
              then you may want to give the absolute path to the
              executable (for example, to `dot.exe`) in `prog`.
            """
        default_names = set([
            'dot', 'twopi', 'neato',
            'circo', 'fdp', 'sfdp'])
        if prog is None:
            prog = self.prog
        assert prog is not None
        if isinstance(prog, (list, tuple)):
            prog, args = prog[0], prog[1:]
        else:
            args = []
        if os.name == 'nt' and prog in default_names:
            if not prog.endswith('.exe'):
                prog += '.exe'
        # temp file
        tmp_fd, tmp_name = tempfile.mkstemp()
        os.close(tmp_fd)
        self.write(tmp_name, encoding=encoding)
        tmp_dir = os.path.dirname(tmp_name)
        # For each of the image files...
        for img in self.shape_files:
            # Get its data
            f = open(img, 'rb')
            f_data = f.read()
            f.close()
            # And copy it under a file with the same name in
            # the temporary directory
            f = open(os.path.join(tmp_dir, os.path.basename(img)), 'wb')
            f.write(f_data)
            f.close()
        # explicitly inherit `$PATH`, on Windows too,
        # with `shell=False`
        env = dict()
        env['PATH'] = os.environ.get('PATH', '')
        env['LD_LIBRARY_PATH'] = os.environ.get('LD_LIBRARY_PATH', '')
        cmdline = [prog, '-T' + format] + args + [tmp_name]
        try:
            p = subprocess.Popen(
                cmdline,
                env=env,
                cwd=tmp_dir,
                shell=False,
                stderr=subprocess.PIPE, stdout=subprocess.PIPE)
        except OSError as e:
>           if e.errno == os.errno.ENOENT:
E           AttributeError: module 'os' has no attribute 'errno'

../../cifar/lib/python3.7/site-packages/pydot.py:1863: AttributeError
------------------------------------------------------------------------------------------- Captured stderr call -------------------------------------------------------------------------------------------
INFO:root:=> creating resnet20_cifar model for CIFAR10
-------------------------------------------------------------------------------------------- Captured log call ---------------------------------------------------------------------------------------------
__init__.py                 92 INFO     => creating resnet20_cifar model for CIFAR10
================================================================================== 2 failed, 120 passed in 259.18 seconds ==================================================================================
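For reference, the `AttributeError: module 'os' has no attribute 'errno'` in `test_png_generation` is a known incompatibility between older pydot releases and Python 3.7: `os.errno` was removed in 3.7, so pydot's `except` handler itself crashes before it can report the real problem. The check pydot intended looks roughly like this (a sketch of the fix, not pydot's actual patched source):

```python
import errno
import subprocess


def run_graphviz(cmdline):
    """Launch a GraphViz program, reporting a clear error if it is missing."""
    try:
        return subprocess.Popen(
            cmdline, stderr=subprocess.PIPE, stdout=subprocess.PIPE)
    except OSError as e:
        # `os.errno` no longer exists on Python 3.7+; use the `errno`
        # module directly instead.
        if e.errno == errno.ENOENT:
            raise RuntimeError(
                '"{}" not found in path.'.format(cmdline[0])) from e
        raise
```

Note that the underlying cause here is simply that GraphViz's `dot` binary is not installed (hence the earlier `FileNotFoundError: [Errno 2] No such file or directory: 'dot'`); installing GraphViz (e.g. `sudo apt-get install graphviz` on Ubuntu) makes the test pass, while upgrading pydot fixes the misleading `AttributeError`.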
nzmora commented 5 years ago

Hi @Bowenwu1, I see 3 seemingly unrelated errors: the original problem, and the 2 failures in the unit tests (one in pydot; the other due to a wrong hash of a downloaded model). I don't see any connection between these errors.

The log file tells me you are at commit 1f48fa64131596b181ebd26a59d2679f7f877dee with local changes (Git is dirty). I checked out this commit and ran the unit tests without any problems.

The only two clues I found are that you are using Python 3.7.3, while we only support 3.6.x and 3.5.x (AFAIK we've never tried 3.7.x), and that you have local changes to the code. Local changes would not explain 3 unrelated issues, so I suspect the Python version (it's a wide-enough change to touch multiple areas, although I can't explain the hash failure).

Please try using Python 3.6.7.

Cheers, Neta
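On the hash failure above: `torch.utils.model_zoo` embeds the first 8 hex characters of the file's SHA-256 digest in the filename (`resnext101_32x4d-29e315fa.pth`) and compares them against the digest of the downloaded bytes; a truncated or corrupted download (or a server returning an error page instead of the file) yields a different digest and raises the `RuntimeError` shown in the traceback. A minimal sketch of that check, assuming the prefix has already been parsed out of the filename:

```python
import hashlib


def verify_hash_prefix(data: bytes, hash_prefix: str) -> bool:
    """Mimic model_zoo's integrity check: compare the leading hex
    characters of the SHA-256 digest of `data` against the prefix
    embedded in the file name (e.g. '...-29e315fa.pth' -> '29e315fa')."""
    digest = hashlib.sha256(data).hexdigest()
    return digest[:len(hash_prefix)] == hash_prefix
```

So a mismatch like the one in the test log usually means the download itself went wrong, not that the code is broken; deleting the cached file under `~/.torch/models/` and re-downloading is the usual remedy.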

barrh commented 5 years ago

hey @Bowenwu1 ,

Your GPU is a relatively new product, and I believe your driver and CUDA toolkit don't support it. According to https://en.wikipedia.org/wiki/CUDA#GPUs_supported, your device is supported only since CUDA 10.0, while you have version 9.0.

I think you should upgrade to the latest CUDA version. After that, you'll have to explicitly reinstall PyTorch v1.0.1 built for CUDA 10:

pip install --force https://download.pytorch.org/whl/cu100/torch-1.0.1-cp37-cp37m-linux_x86_64.whl

Also, please rebase your branch on the current Distiller master and pip install.
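The compatibility rule barrh describes can be sketched as follows. This is an illustrative, partial mapping from GPU compute capability to the minimum CUDA toolkit version that can target it (with a live install you could query the capability via `torch.cuda.get_device_capability()`); the table entries are assumptions drawn from the CUDA support matrix linked above, not an exhaustive list:

```python
# Partial mapping: (major, minor) compute capability -> minimum CUDA version.
MIN_CUDA_FOR_CAPABILITY = {
    (6, 1): 8.0,   # Pascal (e.g. GTX 1080 Ti)
    (7, 0): 9.0,   # Volta
    (7, 5): 10.0,  # Turing (e.g. RTX 2070)
}


def cuda_supports(capability, cuda_version):
    """Return True if a toolkit of `cuda_version` can target a GPU
    with the given compute `capability`."""
    return cuda_version >= MIN_CUDA_FOR_CAPABILITY[capability]


# An RTX 2070 (sm_75, Turing) is not usable with CUDA 9.0,
# while a GTX 1080 Ti (sm_61, Pascal) is:
print(cuda_supports((7, 5), 9.0))   # False
print(cuda_supports((6, 1), 9.0))   # True
```

This is why the thread resolves both by upgrading to CUDA 10 (plus a matching PyTorch build) and, alternatively, by swapping in an older Pascal GPU.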

Bowenwu1 commented 5 years ago

Thanks @nzmora @barrh! After several tests, I believe this is a PyTorch/CUDA/GPU compatibility issue. My RTX 2070 cannot work with CUDA 9; it requires CUDA 10. I tried installing CUDA 10 on my PC, but I forgot to replace my PyTorch build as well, so that attempt failed. After that, I asked IT to bring me a GTX 1080 Ti and the problem was solved.

Conclusion: