fastai / fastai

The fastai deep learning library
http://docs.fast.ai
Apache License 2.0

utils.mod_display causes BrokenPipeError on Windows 10 #2225

Closed nathanielmhld closed 5 years ago

nathanielmhld commented 5 years ago

Describe the bug

On my Windows VM, when I use the pattern

    with progress_disabled_ctx(learn) as learn:
        learn.fit(1)

I get an error that, to my non-expert eyes, appears to be caused by the progress bar. The error does not occur with plain

    learn.fit(1)

nor does it occur on my Mac. I have tried the current version of fastai from git, to no avail.

Provide your installation details

The issue first occurred on the regular (non-dev) version of fastai, but I have since installed the dev version, which is reflected in this printout.

=== Software === 
python       : 3.6.8
fastai       : 1.0.56.dev0
fastprogress : 0.1.21
torch        : 1.1.0
torch cuda   : 9.0 / is **Not available** 

=== Hardware === 
No GPUs available 

=== Environment === 
platform     : Windows-10-10.0.17763-SP0
conda env    : Unknown
python       : C:\ProgramData\Anaconda3\python.exe
sys.path     : C:\ProgramData\Anaconda3\python36.zip
C:\ProgramData\Anaconda3\DLLs
C:\ProgramData\Anaconda3\lib
C:\ProgramData\Anaconda3

C:\Users\nathaniel\AppData\Roaming\Python\Python36\site-packages
C:\ProgramData\Anaconda3\lib\site-packages
C:\ProgramData\Anaconda3\lib\site-packages\win32
C:\ProgramData\Anaconda3\lib\site-packages\win32\lib
C:\ProgramData\Anaconda3\lib\site-packages\Pythonwin
C:\ProgramData\Anaconda3\lib\site-packages\IPython\extensions
C:\Users\nathaniel\.ipython
no supported gpus found on this system

To Reproduce

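from fastai.tabular import *
from fastai.utils.mod_display import progress_disabled_ctx

# work_dir, df_train_and_validate, target, valid_idx, procs, layers,
# numEpochs and lr are assumed to be defined elsewhere.
data = TabularDataBunch.from_df(
    work_dir,
    df_train_and_validate,
    target,
    valid_idx=valid_idx,
    procs=procs
)

learn = tabular_learner(
    data,
    layers=layers,
    metrics=accuracy
)

with progress_disabled_ctx(learn) as learn:
    learn.fit_one_cycle(numEpochs, lr)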

Expected behavior

I expect the learner to run without the progress bar showing.

Screenshots

---------------------------------------------------------------------------
BrokenPipeError                           Traceback (most recent call last)

     64 
     65         with progress_disabled_ctx(learn) as learn:
---> 66             learn.fit_one_cycle(numEpochs, lr)
     67 
     68         #Save the model

C:\ProgramData\Anaconda3\lib\site-packages\fastai\train.py in fit_one_cycle(learn, cyc_len, max_lr, moms, div_factor, pct_start, final_div, wd, callbacks, tot_epochs, start_epoch)
     20     callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor, pct_start=pct_start,
     21                                        final_div=final_div, tot_epochs=tot_epochs, start_epoch=start_epoch))
---> 22     learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
     23 
     24 def lr_find(learn:Learner, start_lr:Floats=1e-7, end_lr:Floats=10, num_it:int=100, stop_div:bool=True, wd:float=None):

C:\ProgramData\Anaconda3\lib\site-packages\fastai\basic_train.py in fit(self, epochs, lr, wd, callbacks)
    198         callbacks = [cb(self) for cb in self.callback_fns + listify(defaults.extra_callback_fns)] + listify(callbacks)
    199         if defaults.extra_callbacks is not None: callbacks += defaults.extra_callbacks
--> 200         fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
    201 
    202     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

C:\ProgramData\Anaconda3\lib\site-packages\fastai\basic_train.py in fit(epochs, learn, callbacks, metrics)
     97             cb_handler.set_dl(learn.data.train_dl)
     98             cb_handler.on_epoch_begin()
---> 99             for xb,yb in progress_bar(learn.data.train_dl, parent=pbar):
    100                 xb, yb = cb_handler.on_batch_begin(xb, yb)
    101                 loss = loss_batch(learn.model, xb, yb, learn.loss_func, learn.opt, cb_handler)

C:\ProgramData\Anaconda3\lib\site-packages\fastprogress\fastprogress.py in __iter__(self)
     70         self.update(0)
     71         try:
---> 72             for i,o in enumerate(self._gen):
     73                 if i >= self.total: break
     74                 yield o

C:\ProgramData\Anaconda3\lib\site-packages\fastai\basic_data.py in __iter__(self)
     73     def __iter__(self):
     74         "Process and returns items from `DataLoader`."
---> 75         for b in self.dl: yield self.proc_batch(b)
     76 
     77     @classmethod

C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py in __iter__(self)
    191 
    192     def __iter__(self):
--> 193         return _DataLoaderIter(self)
    194 
    195     def __len__(self):

C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py in __init__(self, loader)
    467                 #     before it starts, and __del__ tries to join but will get:
    468                 #     AssertionError: can only join a started process.
--> 469                 w.start()
    470                 self.index_queues.append(index_queue)
    471                 self.workers.append(w)

C:\ProgramData\Anaconda3\lib\multiprocessing\process.py in start(self)
    103                'daemonic processes are not allowed to have children'
    104         _cleanup()
--> 105         self._popen = self._Popen(self)
    106         self._sentinel = self._popen.sentinel
    107         # Avoid a refcycle if the target function holds an indirect

C:\ProgramData\Anaconda3\lib\multiprocessing\context.py in _Popen(process_obj)
    221     @staticmethod
    222     def _Popen(process_obj):
--> 223         return _default_context.get_context().Process._Popen(process_obj)
    224 
    225 class DefaultContext(BaseContext):

C:\ProgramData\Anaconda3\lib\multiprocessing\context.py in _Popen(process_obj)
    320         def _Popen(process_obj):
    321             from .popen_spawn_win32 import Popen
--> 322             return Popen(process_obj)
    323 
    324     class SpawnContext(BaseContext):

C:\ProgramData\Anaconda3\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
     63             try:
     64                 reduction.dump(prep_data, to_child)
---> 65                 reduction.dump(process_obj, to_child)
     66             finally:
     67                 set_spawning_popen(None)

C:\ProgramData\Anaconda3\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
     58 def dump(obj, file, protocol=None):
     59     '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60     ForkingPickler(file, protocol).dump(obj)
     61 
     62 #

BrokenPipeError: [Errno 32] Broken pipe
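
The traceback above fails at `w.start()`, where the PyTorch `DataLoader` starts a worker process; on Windows this goes through `popen_spawn_win32`, which pickles the worker to the child over a pipe. One possible way to sidestep that code path, not confirmed as a fix for this issue, is to load data in the main process on Windows. A minimal sketch follows; `pick_num_workers` is a hypothetical helper, and passing its result as `num_workers` to `TabularDataBunch.from_df` is an assumption about that function's keyword arguments.

```python
import platform

def pick_num_workers(requested, system=None):
    """Hypothetical helper: return 0 workers on Windows, else `requested`.

    With 0 workers a PyTorch DataLoader loads batches in the main process,
    so no worker process is spawned.
    """
    # Allow the platform name to be overridden, mainly for testing.
    system = platform.system() if system is None else system
    return 0 if system == "Windows" else requested

# Assumed usage (num_workers as a keyword argument is an assumption here):
# data = TabularDataBunch.from_df(work_dir, df_train_and_validate, target,
#                                 valid_idx=valid_idx, procs=procs,
#                                 num_workers=pick_num_workers(4))
```

If the error disappears with zero workers, that would suggest the problem lies in starting the worker process rather than in the progress bar itself.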

Additional context

I know the option to disable the progress bar is new, which made me more confident that this issue has not been reported before. This is the first issue I have ever filed on an open source project, so please let me know if I've done something wrong.

Additionally, I need the progress bar disabled because, as currently implemented, when the progress bar finishes, the output of that cell (in a Jupyter notebook) is completely cleared before the results are printed. This is a serious problem for our workflow, because we need the text printed earlier in the process to stay visible. If whoever reviews this thinks it appropriate, please let me know and I'd be happy to file a separate feature request or bug report for that issue.

sgugger commented 5 years ago

That is very weird. I don't see how this could break the dataloader. I haven't tested on Windows, but your example runs fine on my machine. I will try on Windows when I have access to a proper install there.

As for the output being cleared, I can add an option to fastprogress to turn that off. I will work on it when I have a bit of time.

sgugger commented 5 years ago

Added the option in fastprogress to not automatically clear the outputs. To use it, you have to do an editable install (until the next release), then:

from fastprogress import fastprogress
fastprogress.CLEAR_OUTPUT = False
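
Because the flag is only available from an editable install until the next release, a guarded version of the two lines above may be safer in notebooks shared across environments. This is a minimal sketch: `keep_cell_output` is a hypothetical helper name, and it relies only on the `fastprogress.fastprogress` module and the `CLEAR_OUTPUT` flag mentioned above.

```python
import importlib

def keep_cell_output():
    """Hypothetical helper: set fastprogress.fastprogress.CLEAR_OUTPUT = False.

    Returns True if the flag was set, False if fastprogress is not
    installed or the installed version does not define the flag.
    """
    try:
        fp = importlib.import_module("fastprogress.fastprogress")
    except ImportError:
        return False  # fastprogress is not installed
    if not hasattr(fp, "CLEAR_OUTPUT"):
        return False  # installed version does not have the option
    fp.CLEAR_OUTPUT = False
    return True

flag_was_set = keep_cell_output()
```

Checking the return value makes it clear whether the installed fastprogress actually honors the option, instead of silently creating an attribute that nothing reads.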