Closed: Bowenwu1 closed this issue 5 years ago
Hi @Bowenwu1,
I am not able to reproduce this problem locally, so I need more information from you. The first few dozen lines of the log file contain information about your environment (incl. various SW package versions). Please post this information.
Another thing you can do is go to distiller/tests and invoke pytest:
distiller/tests$ pytest
This invokes some unit-tests and will help me understand the state of your code (the tests take a few minutes to run). They should all pass.
Thanks,
Neta
First few dozen lines of the log file:
2019-05-24 11:23:03,170 - Log file for this run: /home/dm/Documents/pytorch-cifar/distiller/examples/classifier_compression/logs/2019.05.24-112303/2019.05.24-112303.log
2019-05-24 11:23:03,170 - Number of CPUs: 12
2019-05-24 11:23:03,198 - Number of GPUs: 1
2019-05-24 11:23:03,198 - CUDA version: 9.0.176
2019-05-24 11:23:03,198 - CUDNN version: 7402
2019-05-24 11:23:03,198 - Kernel: 4.15.0-50-generic
2019-05-24 11:23:03,198 - Python: 3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 7.3.0]
2019-05-24 11:23:03,198 - PyTorch: 1.0.1
2019-05-24 11:23:03,198 - Numpy: 1.16.3
2019-05-24 11:23:03,221 - Git is dirty
2019-05-24 11:23:03,222 - Active Git branch: master
2019-05-24 11:23:03,232 - Git commit: 1f48fa64131596b181ebd26a59d2679f7f877dee
2019-05-24 11:23:03,232 - Command line: compress_classifier.py -a=resnet56_cifar -p=50 ../../../data.cifar10 --epochs=70 --lr=0.1 --compress=../pruning_filters_for_efficient_convnets/resnet56_cifar_filter_rank_v2.yaml --resume-from=checkpoint.resnet56_cifar_baseline.pth.tar --reset-optimizer --vs=0
2019-05-24 11:23:03,252 - => creating resnet56_cifar model for CIFAR10
2019-05-24 11:23:05,674 - => loading checkpoint checkpoint.resnet56_cifar_baseline.pth.tar
2019-05-24 11:23:05,685 - => Checkpoint contents:
╒═══════════════════╤═════════════╤════════════════╕
│ Key │ Type │ Value │
╞═══════════════════╪═════════════╪════════════════╡
│ arch │ str │ resnet56_cifar │
├───────────────────┼─────────────┼────────────────┤
│ best_top1 │ float │ 92.92 │
├───────────────────┼─────────────┼────────────────┤
│ compression_sched │ dict │ │
├───────────────────┼─────────────┼────────────────┤
│ epoch │ int │ 179 │
├───────────────────┼─────────────┼────────────────┤
│ optimizer │ dict │ │
├───────────────────┼─────────────┼────────────────┤
│ state_dict │ OrderedDict │ │
╘═══════════════════╧═════════════╧════════════════╛
2019-05-24 11:23:05,686 - Loaded compression schedule from checkpoint (epoch 179)
2019-05-24 11:23:05,700 - Optimizer could not be loaded from checkpoint.
2019-05-24 11:23:05,700 - => loaded checkpoint 'checkpoint.resnet56_cifar_baseline.pth.tar' (epoch 179)
2019-05-24 11:23:05,701 - Optimizer Type: <class 'torch.optim.sgd.SGD'>
2019-05-24 11:23:05,701 - Optimizer Args: {'lr': 0.1, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0.0001, 'nesterov': False}
2019-05-24 11:23:06,869 - Dataset sizes:
training=50000
validation=10000
test=10000
2019-05-24 11:23:06,870 - Reading compression schedule from: ../pruning_filters_for_efficient_convnets/resnet56_cifar_filter_rank_v2.yaml
Output of pytest:
=========================================================================================== test session starts ============================================================================================
platform linux -- Python 3.7.3, pytest-3.5.1, py-1.8.0, pluggy-0.6.0
rootdir: /home/dm/Documents/pytorch-cifar/distiller, inifile:
collected 122 items
test_basic.py ... [ 2%]
test_infra.py ..F......... [ 12%]
test_learning_rate.py . [ 13%]
test_loss.py .. [ 14%]
test_lstm_impl.py .............. [ 26%]
test_model_summary.py F... [ 29%]
test_post_train_quant.py ........................ [ 49%]
test_pruning.py ............. [ 59%]
test_quant_utils.py ....... [ 65%]
test_quantizer.py ............................ [ 88%]
test_ranking.py .. [ 90%]
test_summarygraph.py .......... [ 98%]
test_thresholding.py .. [100%]
================================================================================================= FAILURES =================================================================================================
____________________________________________________________________________________ test_create_model_pretrainedmodels ____________________________________________________________________________________
def test_create_model_pretrainedmodels():
premodel_name = 'resnext101_32x4d'
> model = create_model(True, 'imagenet', premodel_name)
test_infra.py:53:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../distiller/models/__init__.py:74: in create_model
pretrained=(dataset if pretrained else None))
../../cifar/lib/python3.7/site-packages/pretrainedmodels/models/resnext.py:85: in resnext101_32x4d
model.load_state_dict(model_zoo.load_url(settings['url']))
../../cifar/lib/python3.7/site-packages/torch/utils/model_zoo.py:66: in load_url
_download_url_to_file(url, cached_file, hash_prefix, progress=progress)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
url = 'http://data.lip6.fr/cadene/pretrainedmodels/resnext101_32x4d-29e315fa.pth', dst = '/home/dm/.torch/models/resnext101_32x4d-29e315fa.pth', hash_prefix = '29e315fa', progress = True
def _download_url_to_file(url, dst, hash_prefix, progress):
file_size = None
if requests_available:
u = urlopen(url, stream=True)
if hasattr(u.headers, "Content-Length"):
file_size = int(u.headers["Content-Length"])
u = u.raw
else:
u = urlopen(url)
meta = u.info()
if hasattr(meta, 'getheaders'):
content_length = meta.getheaders("Content-Length")
else:
content_length = meta.get_all("Content-Length")
if content_length is not None and len(content_length) > 0:
file_size = int(content_length[0])
f = tempfile.NamedTemporaryFile(delete=False)
try:
if hash_prefix is not None:
sha256 = hashlib.sha256()
with tqdm(total=file_size, disable=not progress) as pbar:
while True:
buffer = u.read(8192)
if len(buffer) == 0:
break
f.write(buffer)
if hash_prefix is not None:
sha256.update(buffer)
pbar.update(len(buffer))
f.close()
if hash_prefix is not None:
digest = sha256.hexdigest()
if digest[:len(hash_prefix)] != hash_prefix:
raise RuntimeError('invalid hash value (expected "{}", got "{}")'
> .format(hash_prefix, digest))
E RuntimeError: invalid hash value (expected "29e315fa", got "2be79b781dde89a798dddaa2f9d1a865b50ae6f714e87ae445f309acb7f91b27")
../../cifar/lib/python3.7/site-packages/torch/utils/model_zoo.py:106: RuntimeError
------------------------------------------------------------------------------------------- Captured stderr call -------------------------------------------------------------------------------------------
Downloading: "http://data.lip6.fr/cadene/pretrainedmodels/resnext101_32x4d-29e315fa.pth" to /home/dm/.torch/models/resnext101_32x4d-29e315fa.pth
19681944it [02:45, 118887.36it/s]
___________________________________________________________________________________________ test_png_generation ____________________________________________________________________________________________
self = <pydot.Dot object at 0x7f4a704f2550>, prog = 'dot', format = 'png', encoding = None
def create(self, prog=None, format='ps', encoding=None):
"""Creates and returns a binary image for the graph.
create will write the graph to a temporary dot file in the
encoding specified by `encoding` and process it with the
program given by 'prog' (which defaults to 'twopi'), reading
the binary image output and return it as:
- `str` of bytes in Python 2
- `bytes` in Python 3
There's also the preferred possibility of using:
create_'format'(prog='program')
which are automatically defined for all the supported formats,
for example:
- `create_ps()`
- `create_gif()`
- `create_dia()`
If 'prog' is a list, instead of a string,
then the fist item is expected to be the program name,
followed by any optional command-line arguments for it:
[ 'twopi', '-Tdot', '-s10' ]
@param prog: either:
- name of GraphViz executable that
can be found in the `$PATH`, or
- absolute path to GraphViz executable.
If you have added GraphViz to the `$PATH` and
use its executables as installed
(without renaming any of them)
then their names are:
- `'dot'`
- `'twopi'`
- `'neato'`
- `'circo'`
- `'fdp'`
- `'sfdp'`
On Windows, these have the notorious ".exe" extension that,
only for the above strings, will be added automatically.
The `$PATH` is inherited from `os.env['PATH']` and
passed to `subprocess.Popen` using the `env` argument.
If you haven't added GraphViz to your `$PATH` on Windows,
then you may want to give the absolute path to the
executable (for example, to `dot.exe`) in `prog`.
"""
default_names = set([
'dot', 'twopi', 'neato',
'circo', 'fdp', 'sfdp'])
if prog is None:
prog = self.prog
assert prog is not None
if isinstance(prog, (list, tuple)):
prog, args = prog[0], prog[1:]
else:
args = []
if os.name == 'nt' and prog in default_names:
if not prog.endswith('.exe'):
prog += '.exe'
# temp file
tmp_fd, tmp_name = tempfile.mkstemp()
os.close(tmp_fd)
self.write(tmp_name, encoding=encoding)
tmp_dir = os.path.dirname(tmp_name)
# For each of the image files...
for img in self.shape_files:
# Get its data
f = open(img, 'rb')
f_data = f.read()
f.close()
# And copy it under a file with the same name in
# the temporary directory
f = open(os.path.join(tmp_dir, os.path.basename(img)), 'wb')
f.write(f_data)
f.close()
# explicitly inherit `$PATH`, on Windows too,
# with `shell=False`
env = dict()
env['PATH'] = os.environ.get('PATH', '')
env['LD_LIBRARY_PATH'] = os.environ.get('LD_LIBRARY_PATH', '')
cmdline = [prog, '-T' + format] + args + [tmp_name]
try:
p = subprocess.Popen(
cmdline,
env=env,
cwd=tmp_dir,
shell=False,
> stderr=subprocess.PIPE, stdout=subprocess.PIPE)
../../cifar/lib/python3.7/site-packages/pydot.py:1861:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <subprocess.Popen object at 0x7f4a5c5c9908>, args = ['dot', '-Tpng', '/tmp/tmp36lz76vx'], bufsize = -1, executable = None, stdin = None, stdout = -1, stderr = -1, preexec_fn = None
close_fds = True, shell = False, cwd = '/tmp'
env = {'LD_LIBRARY_PATH': '/usr/local/cuda-9.0/lib64', 'PATH': '/home/dm/Documents/pytorch-cifar/cifar/bin:/usr/local/cuda-9...home/dm/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin'}
universal_newlines = None, startupinfo = None, creationflags = 0, restore_signals = True, start_new_session = False, pass_fds = ()
def __init__(self, args, bufsize=-1, executable=None,
stdin=None, stdout=None, stderr=None,
preexec_fn=None, close_fds=True,
shell=False, cwd=None, env=None, universal_newlines=None,
startupinfo=None, creationflags=0,
restore_signals=True, start_new_session=False,
pass_fds=(), *, encoding=None, errors=None, text=None):
"""Create new Popen instance."""
_cleanup()
# Held while anything is calling waitpid before returncode has been
# updated to prevent clobbering returncode if wait() or poll() are
# called from multiple threads at once. After acquiring the lock,
# code must re-check self.returncode to see if another thread just
# finished a waitpid() call.
self._waitpid_lock = threading.Lock()
self._input = None
self._communication_started = False
if bufsize is None:
bufsize = -1 # Restore default
if not isinstance(bufsize, int):
raise TypeError("bufsize must be an integer")
if _mswindows:
if preexec_fn is not None:
raise ValueError("preexec_fn is not supported on Windows "
"platforms")
else:
# POSIX
if pass_fds and not close_fds:
warnings.warn("pass_fds overriding close_fds.", RuntimeWarning)
close_fds = True
if startupinfo is not None:
raise ValueError("startupinfo is only supported on Windows "
"platforms")
if creationflags != 0:
raise ValueError("creationflags is only supported on Windows "
"platforms")
self.args = args
self.stdin = None
self.stdout = None
self.stderr = None
self.pid = None
self.returncode = None
self.encoding = encoding
self.errors = errors
# Validate the combinations of text and universal_newlines
if (text is not None and universal_newlines is not None
and bool(universal_newlines) != bool(text)):
raise SubprocessError('Cannot disambiguate when both text '
'and universal_newlines are supplied but '
'different. Pass one or the other.')
# Input and output objects. The general principle is like
# this:
#
# Parent Child
# ------ -----
# p2cwrite ---stdin---> p2cread
# c2pread <--stdout--- c2pwrite
# errread <--stderr--- errwrite
#
# On POSIX, the child objects are file descriptors. On
# Windows, these are Windows file handles. The parent objects
# are file descriptors on both platforms. The parent objects
# are -1 when not using PIPEs. The child objects are -1
# when not redirecting.
(p2cread, p2cwrite,
c2pread, c2pwrite,
errread, errwrite) = self._get_handles(stdin, stdout, stderr)
# We wrap OS handles *before* launching the child, otherwise a
# quickly terminating child could make our fds unwrappable
# (see #8458).
if _mswindows:
if p2cwrite != -1:
p2cwrite = msvcrt.open_osfhandle(p2cwrite.Detach(), 0)
if c2pread != -1:
c2pread = msvcrt.open_osfhandle(c2pread.Detach(), 0)
if errread != -1:
errread = msvcrt.open_osfhandle(errread.Detach(), 0)
self.text_mode = encoding or errors or text or universal_newlines
# How long to resume waiting on a child after the first ^C.
# There is no right value for this. The purpose is to be polite
# yet remain good for interactive users trying to exit a tool.
self._sigint_wait_secs = 0.25 # 1/xkcd221.getRandomNumber()
self._closed_child_pipe_fds = False
try:
if p2cwrite != -1:
self.stdin = io.open(p2cwrite, 'wb', bufsize)
if self.text_mode:
self.stdin = io.TextIOWrapper(self.stdin, write_through=True,
line_buffering=(bufsize == 1),
encoding=encoding, errors=errors)
if c2pread != -1:
self.stdout = io.open(c2pread, 'rb', bufsize)
if self.text_mode:
self.stdout = io.TextIOWrapper(self.stdout,
encoding=encoding, errors=errors)
if errread != -1:
self.stderr = io.open(errread, 'rb', bufsize)
if self.text_mode:
self.stderr = io.TextIOWrapper(self.stderr,
encoding=encoding, errors=errors)
self._execute_child(args, executable, preexec_fn, close_fds,
pass_fds, cwd, env,
startupinfo, creationflags, shell,
p2cread, p2cwrite,
c2pread, c2pwrite,
errread, errwrite,
> restore_signals, start_new_session)
/home/dm/miniconda3/lib/python3.7/subprocess.py:775:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <subprocess.Popen object at 0x7f4a5c5c9908>, args = ['dot', '-Tpng', '/tmp/tmp36lz76vx'], executable = b'dot', preexec_fn = None, close_fds = True, pass_fds = (), cwd = '/tmp'
env = {'LD_LIBRARY_PATH': '/usr/local/cuda-9.0/lib64', 'PATH': '/home/dm/Documents/pytorch-cifar/cifar/bin:/usr/local/cuda-9...home/dm/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin'}
startupinfo = None, creationflags = 0, shell = False, p2cread = -1, p2cwrite = -1, c2pread = 35, c2pwrite = 37, errread = 38, errwrite = 39, restore_signals = True, start_new_session = False
def _execute_child(self, args, executable, preexec_fn, close_fds,
pass_fds, cwd, env,
startupinfo, creationflags, shell,
p2cread, p2cwrite,
c2pread, c2pwrite,
errread, errwrite,
restore_signals, start_new_session):
"""Execute program (POSIX version)"""
if isinstance(args, (str, bytes)):
args = [args]
else:
args = list(args)
if shell:
# On Android the default shell is at '/system/bin/sh'.
unix_shell = ('/system/bin/sh' if
hasattr(sys, 'getandroidapilevel') else '/bin/sh')
args = [unix_shell, "-c"] + args
if executable:
args[0] = executable
if executable is None:
executable = args[0]
orig_executable = executable
# For transferring possible exec failure from child to parent.
# Data format: "exception name:hex errno:description"
# Pickle is not used; it is complex and involves memory allocation.
errpipe_read, errpipe_write = os.pipe()
# errpipe_write must not be in the standard io 0, 1, or 2 fd range.
low_fds_to_close = []
while errpipe_write < 3:
low_fds_to_close.append(errpipe_write)
errpipe_write = os.dup(errpipe_write)
for low_fd in low_fds_to_close:
os.close(low_fd)
try:
try:
# We must avoid complex work that could involve
# malloc or free in the child process to avoid
# potential deadlocks, thus we do all this here.
# and pass it to fork_exec()
if env is not None:
env_list = []
for k, v in env.items():
k = os.fsencode(k)
if b'=' in k:
raise ValueError("illegal environment variable name")
env_list.append(k + b'=' + os.fsencode(v))
else:
env_list = None # Use execv instead of execve.
executable = os.fsencode(executable)
if os.path.dirname(executable):
executable_list = (executable,)
else:
# This matches the behavior of os._execvpe().
executable_list = tuple(
os.path.join(os.fsencode(dir), executable)
for dir in os.get_exec_path(env))
fds_to_keep = set(pass_fds)
fds_to_keep.add(errpipe_write)
self.pid = _posixsubprocess.fork_exec(
args, executable_list,
close_fds, tuple(sorted(map(int, fds_to_keep))),
cwd, env_list,
p2cread, p2cwrite, c2pread, c2pwrite,
errread, errwrite,
errpipe_read, errpipe_write,
restore_signals, start_new_session, preexec_fn)
self._child_created = True
finally:
# be sure the FD is closed no matter what
os.close(errpipe_write)
# self._devnull is not always defined.
devnull_fd = getattr(self, '_devnull', None)
if p2cread != -1 and p2cwrite != -1 and p2cread != devnull_fd:
os.close(p2cread)
if c2pwrite != -1 and c2pread != -1 and c2pwrite != devnull_fd:
os.close(c2pwrite)
if errwrite != -1 and errread != -1 and errwrite != devnull_fd:
os.close(errwrite)
if devnull_fd is not None:
os.close(devnull_fd)
# Prevent a double close of these fds from __init__ on error.
self._closed_child_pipe_fds = True
# Wait for exec to fail or succeed; possibly raising an
# exception (limited in size)
errpipe_data = bytearray()
while True:
part = os.read(errpipe_read, 50000)
errpipe_data += part
if not part or len(errpipe_data) > 50000:
break
finally:
# be sure the FD is closed no matter what
os.close(errpipe_read)
if errpipe_data:
try:
pid, sts = os.waitpid(self.pid, 0)
if pid == self.pid:
self._handle_exitstatus(sts)
else:
self.returncode = sys.maxsize
except ChildProcessError:
pass
try:
exception_name, hex_errno, err_msg = (
errpipe_data.split(b':', 2))
# The encoding here should match the encoding
# written in by the subprocess implementations
# like _posixsubprocess
err_msg = err_msg.decode()
except ValueError:
exception_name = b'SubprocessError'
hex_errno = b'0'
err_msg = 'Bad exception data from child: {!r}'.format(
bytes(errpipe_data))
child_exception_type = getattr(
builtins, exception_name.decode('ascii'),
SubprocessError)
if issubclass(child_exception_type, OSError) and hex_errno:
errno_num = int(hex_errno, 16)
child_exec_never_called = (err_msg == "noexec")
if child_exec_never_called:
err_msg = ""
# The error must be from chdir(cwd).
err_filename = cwd
else:
err_filename = orig_executable
if errno_num != 0:
err_msg = os.strerror(errno_num)
if errno_num == errno.ENOENT:
err_msg += ': ' + repr(err_filename)
> raise child_exception_type(errno_num, err_msg, err_filename)
E FileNotFoundError: [Errno 2] No such file or directory: 'dot': 'dot'
/home/dm/miniconda3/lib/python3.7/subprocess.py:1522: FileNotFoundError
During handling of the above exception, another exception occurred:
def test_png_generation():
dataset = "cifar10"
arch = "resnet20_cifar"
model, _ = common.setup_test(arch, dataset, parallel=True)
# 2 different ways to create a PNG
> distiller.draw_img_classifier_to_file(model, 'model.png', dataset, True)
test_model_summary.py:35:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../distiller/model_summaries.py:439: in draw_img_classifier_to_file
draw_model_to_file(g, png_fname, display_param_nodes, rankdir, styles)
../distiller/model_summaries.py:411: in draw_model_to_file
png = create_png(sgraph, display_param_nodes=display_param_nodes)
../distiller/model_summaries.py:391: in create_png
png = pydot_graph.create_png()
../../cifar/lib/python3.7/site-packages/pydot.py:1662: in new_method
format=f, prog=prog, encoding=encoding)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <pydot.Dot object at 0x7f4a704f2550>, prog = 'dot', format = 'png', encoding = None
def create(self, prog=None, format='ps', encoding=None):
"""Creates and returns a binary image for the graph.
create will write the graph to a temporary dot file in the
encoding specified by `encoding` and process it with the
program given by 'prog' (which defaults to 'twopi'), reading
the binary image output and return it as:
- `str` of bytes in Python 2
- `bytes` in Python 3
There's also the preferred possibility of using:
create_'format'(prog='program')
which are automatically defined for all the supported formats,
for example:
- `create_ps()`
- `create_gif()`
- `create_dia()`
If 'prog' is a list, instead of a string,
then the fist item is expected to be the program name,
followed by any optional command-line arguments for it:
[ 'twopi', '-Tdot', '-s10' ]
@param prog: either:
- name of GraphViz executable that
can be found in the `$PATH`, or
- absolute path to GraphViz executable.
If you have added GraphViz to the `$PATH` and
use its executables as installed
(without renaming any of them)
then their names are:
- `'dot'`
- `'twopi'`
- `'neato'`
- `'circo'`
- `'fdp'`
- `'sfdp'`
On Windows, these have the notorious ".exe" extension that,
only for the above strings, will be added automatically.
The `$PATH` is inherited from `os.env['PATH']` and
passed to `subprocess.Popen` using the `env` argument.
If you haven't added GraphViz to your `$PATH` on Windows,
then you may want to give the absolute path to the
executable (for example, to `dot.exe`) in `prog`.
"""
default_names = set([
'dot', 'twopi', 'neato',
'circo', 'fdp', 'sfdp'])
if prog is None:
prog = self.prog
assert prog is not None
if isinstance(prog, (list, tuple)):
prog, args = prog[0], prog[1:]
else:
args = []
if os.name == 'nt' and prog in default_names:
if not prog.endswith('.exe'):
prog += '.exe'
# temp file
tmp_fd, tmp_name = tempfile.mkstemp()
os.close(tmp_fd)
self.write(tmp_name, encoding=encoding)
tmp_dir = os.path.dirname(tmp_name)
# For each of the image files...
for img in self.shape_files:
# Get its data
f = open(img, 'rb')
f_data = f.read()
f.close()
# And copy it under a file with the same name in
# the temporary directory
f = open(os.path.join(tmp_dir, os.path.basename(img)), 'wb')
f.write(f_data)
f.close()
# explicitly inherit `$PATH`, on Windows too,
# with `shell=False`
env = dict()
env['PATH'] = os.environ.get('PATH', '')
env['LD_LIBRARY_PATH'] = os.environ.get('LD_LIBRARY_PATH', '')
cmdline = [prog, '-T' + format] + args + [tmp_name]
try:
p = subprocess.Popen(
cmdline,
env=env,
cwd=tmp_dir,
shell=False,
stderr=subprocess.PIPE, stdout=subprocess.PIPE)
except OSError as e:
> if e.errno == os.errno.ENOENT:
E AttributeError: module 'os' has no attribute 'errno'
../../cifar/lib/python3.7/site-packages/pydot.py:1863: AttributeError
------------------------------------------------------------------------------------------- Captured stderr call -------------------------------------------------------------------------------------------
INFO:root:=> creating resnet20_cifar model for CIFAR10
-------------------------------------------------------------------------------------------- Captured log call ---------------------------------------------------------------------------------------------
__init__.py 92 INFO => creating resnet20_cifar model for CIFAR10
================================================================================== 2 failed, 120 passed in 259.18 seconds ==================================================================================
Hi @Bowenwu1, I see 3 seemingly unrelated errors: the original problem, and the 2 failures in the unit-tests (one in pydot; the other is due to a wrong hash of a downloaded model). I don't see any connection between these errors.
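In the meantime, a quick way to rule out a corrupted download is to recompute the cached checkpoint's SHA-256 yourself. A minimal sketch (plain hashlib, mirroring the prefix check model_zoo performs; the expected prefix 29e315fa comes from the checkpoint filename):

```python
import hashlib

def sha256_prefix(path, prefix):
    """Return True if the file's SHA-256 digest starts with `prefix`."""
    sha256 = hashlib.sha256()
    with open(path, 'rb') as f:
        # Hash in 8 KiB chunks, like model_zoo's download loop does.
        for chunk in iter(lambda: f.read(8192), b''):
            sha256.update(chunk)
    return sha256.hexdigest().startswith(prefix)

# e.g. sha256_prefix('/home/dm/.torch/models/resnext101_32x4d-29e315fa.pth',
#                    '29e315fa')
```

If the prefix doesn't match, deleting the cached file and re-downloading is worth a try; a truncated or proxied download would explain the mismatch.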
The log file tells me you are at commit 1f48fa64131596b181ebd26a59d2679f7f877dee with local changes (Git is dirty). I moved my git history to this commit and ran the unit tests w/o any problems.
The only two clues I found were that you are using Python 3.7.3, while we only support 3.6.x and 3.5.x (AFAIK we've never tried 3.7.x); and that you have local changes to the code. Local changes would not explain 3 unrelated issues, so I suspect the Python version (it's a wide enough change to touch multiple areas, although I can't explain the hash failure).
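For what it's worth, the pydot failure is consistent with the Python-version theory: your traceback shows pydot's `except` handler dying on `os.errno`, an undocumented alias that was removed in Python 3.7. (Separately, the root FileNotFoundError just means the Graphviz `dot` executable isn't on your PATH; on Ubuntu, `sudo apt-get install graphviz` usually provides it.) A small sketch of the failure mode, using a deliberately bogus executable name:

```python
import errno
import subprocess

# What happens under pydot's hood: launching a missing executable raises
# FileNotFoundError, a subclass of OSError, with errno set to ENOENT.
try:
    subprocess.Popen(['no-such-binary-for-demo'])
except OSError as e:
    # The portable check: compare against the errno module directly.
    assert e.errno == errno.ENOENT
    # The installed pydot instead evaluates 'e.errno == os.errno.ENOENT';
    # on Python 3.7 'os.errno' no longer exists, hence the AttributeError
    # masking the real error in your traceback.
```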
Please try using Python 3.6.7.
Cheers, Neta
hey @Bowenwu1,
Your GPU is a relatively new product, and I believe your driver stack doesn't support it. According to https://en.wikipedia.org/wiki/CUDA#GPUs_supported, your device is supported only since CUDA 10.0, while you have version 9.0.
I think you should upgrade to the latest CUDA version. After that, you'll have to explicitly reinstall PyTorch v1.0.1 built for CUDA 10 with pip install --force https://download.pytorch.org/whl/cu100/torch-1.0.1-cp37-cp37m-linux_x86_64.whl
Also, please rebase your branch on the current Distiller master, and pip install it.
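A hedged sketch of the mismatch check, encoding only the one fact from this thread (Turing GPUs such as the RTX 2070 report compute capability (7, 5) and are supported only from CUDA 10.0 onward; with PyTorch installed, `torch.cuda.get_device_capability(0)` returns the tuple and `torch.version.cuda` the toolkit version PyTorch was built against):

```python
# Flag a device that a CUDA 9 build of PyTorch cannot drive.
# Threshold (7, 5) = Turing (e.g. RTX 2070), which needs CUDA >= 10.0.
def needs_cuda10(capability):
    """capability: (major, minor), e.g. torch.cuda.get_device_capability(0)."""
    return capability >= (7, 5)

# With PyTorch available you could check the real device, e.g.:
#   import torch
#   if needs_cuda10(torch.cuda.get_device_capability(0)) \
#           and torch.version.cuda.startswith('9.'):
#       print('upgrade CUDA and install the matching cu100 wheel')
```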
Thanks to @nzmora @barrh!
Conclusion: after several tests, I believe this is a PyTorch/CUDA/GPU issue. My RTX 2070 cannot work with CUDA 9; it requires CUDA 10. I tried to install CUDA 10 on my PC, but I forgot to replace my PyTorch build as well, so that did not solve it. After that, I asked IT to bring me a GTX 1080 Ti, and the problem was solved.
When I tried to run one of your examples (https://github.com/NervanaSystems/distiller/blob/master/examples/pruning_filters_for_efficient_convnets/resnet56_cifar_filter_rank_v2.yaml), I encountered the following error: [error screenshot not captured]
My CUDA and cuDNN versions: [screenshot not captured]
My GPU and NVIDIA driver version: [screenshot not captured]
Looking forward to your reply, thank you!