Kaggle / docker-python

Kaggle Python docker image
Apache License 2.0

Fast Ai not working from kaggle #340

Closed jaideep11061982 closed 5 years ago

jaideep11061982 commented 6 years ago

Hi Vincent, I still face the same issue. I know fastai was upgraded, but I think that must have been days ago. Last night (IST) I suddenly started getting "module not found" errors: at about 8 pm I was working with it, so it was already loaded in RAM, but at 11 pm, when I committed my code, I started getting this error and I continue to get it. Kindly look into this urgently; my work is stuck.

Previous quote: Fastai was upgraded from 0.7.x to 1.0.x in the latest release. I confirmed that I can call fastai from a Kaggle Kernel.

Please reopen this issue if you are still having trouble with the fastai library on Kaggle Kernels.

Thank you

Originally posted by @rosbo in https://github.com/Kaggle/docker-python/pull/338#issuecomment-429448539

jaideep11061982 commented 6 years ago

FYI, I even ran the pip install in the first block and restarted the kernel, but I still got the same issue.

jaideep11061982 commented 6 years ago

from fastai import *
from fastai.imports import *
from fastai.torch_imports import *
from fastai.transforms import *
from fastai.conv_learner import *
from fastai.model import *
from fastai.dataset import *
from fastai.sgdr import *
from fastai.plots import *
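
A quick way to see which of these modules the installed fastai actually provides is to probe each name before importing it. The following is only a sketch: `module_exists` is a hypothetical helper (not part of fastai), and `importlib.util.find_spec` imports the parent package as a side effect when given a dotted name.

```python
import importlib.util


def module_exists(name):
    """Return True if `name` can be located, without importing the submodule itself."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when the parent package is missing or broken, or the name is malformed.
        return False


# Module names taken from the import list in this comment.
legacy_modules = [
    "fastai.imports",
    "fastai.torch_imports",
    "fastai.transforms",
    "fastai.conv_learner",
    "fastai.model",
    "fastai.dataset",
    "fastai.sgdr",
    "fastai.plots",
]

for name in legacy_modules:
    print(name, "found" if module_exists(name) else "missing")
```

If most of the names are reported as missing, the installed fastai is probably not the version the imports were written for.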

jaideep11061982 commented 6 years ago

ModuleNotFoundError Traceback (most recent call last)

in
      2 from fastai import *
      3 from fastai.imports import *
----> 4 from fastai.torch_imports import *
      5 from fastai.transforms import *
      6 from fastai.conv_learner import *

ModuleNotFoundError: No module named 'fastai.torch_imports'

diskshima commented 6 years ago

@jaideep11061982 I believe the new version 1.0 of fastai is vastly different from the previous 0.7, and most of those imports do not exist in 1.0. I think you have two options:

Option 1: Downgrade to fastai 0.7

You can follow the steps stated here.

Option 2: Upgrade your code to use fastai 1.0

One caveat with this is that Kaggle's Docker image is still using PyTorch 0.4 (fastai 1.0 relies on PyTorch 1.0), so there's a high chance that your code may fail. I worked around it by adding the line below to the top of the code (or Notebook).

!pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cu90/torch_nightly.html

:warning: Big Caveat (for both Option 1 & 2) :warning:

One (more) caveat with both options is that they replace an already loaded Python package, so I had to rely on the fact that the kernel dies and restarts after installing packages, which forces it to load the new packages. I can confirm this is the case for option 2, but I am not sure about option 1 (though logically I think it should behave the same way).

You can run the code below to confirm you have the right PyTorch version (this is for option 2).

import torch
print(torch.__version__)
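
Because the caveat above is about a package that is already loaded, it can help to compare the version installed on disk with the version the running session has in memory. This is a minimal sketch, assuming Python 3.8+ for `importlib.metadata` (the Kaggle image in this thread used Python 3.6); `installed_vs_loaded` is a hypothetical helper name.

```python
import sys
from importlib import metadata


def installed_vs_loaded(dist_name, module_name=None):
    """Return (version on disk, version already imported), using None for 'not available'."""
    module_name = module_name or dist_name
    try:
        on_disk = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        on_disk = None
    # Look only at what is already imported; do not trigger a new import.
    module = sys.modules.get(module_name)
    in_memory = getattr(module, "__version__", None) if module is not None else None
    return on_disk, in_memory


# The pip distribution name can differ from the import name
# (for example torch_nightly is imported as torch).
print(installed_vs_loaded("torch_nightly", "torch"))
print(installed_vs_loaded("fastai"))
```

If the two values differ, the session is still using the old copy and needs a restart before the new version takes effect.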

rosbo commented 6 years ago

We decided to revert back to using fastai 0.7 for now and keep pytorch 0.4.1. Let us know if you are still experiencing issues with fastai on kaggle.

jazoom commented 6 years ago

!pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cu90/torch_nightly.html
import torch
print(torch.__version__)

3 days later and this doesn't seem to work anymore. I can't get it to pick up the updated version of Pytorch. Using the refresh in the console completely resets the notebook, so that doesn't help.

I even tried using !pip uninstall -y torch first, but even though the operation seems to succeed, Pytorch remains installed at version 0.4.1.
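
The behaviour described here is consistent with how Python caches imports: once a module is in `sys.modules`, changing the files on disk does not affect the running process. The sketch below illustrates this with a throwaway module named `fakelib_demo` (a hypothetical stand-in for torch, created in a temporary directory); it is an illustration of the caching, not a reproduction of the Kaggle setup.

```python
import importlib
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def demo_stale_module():
    """Return (version seen by this process, version seen by a fresh interpreter)."""
    old_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # keep stale .pyc files out of the demo
    with tempfile.TemporaryDirectory() as tmp:
        mod_file = Path(tmp) / "fakelib_demo.py"
        mod_file.write_text('__version__ = "0.4.1"\n')
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            import fakelib_demo  # first import: loaded from disk

            # Simulate "pip install" replacing the files on disk.
            mod_file.write_text('__version__ = "1.0.0.dev"\n')

            import fakelib_demo as again  # served from sys.modules, not from disk
            in_process = again.__version__

            # A fresh interpreter (comparable to a restarted kernel) reads the new file.
            fresh = subprocess.run(
                [sys.executable, "-c",
                 "import fakelib_demo; print(fakelib_demo.__version__)"],
                env={**os.environ, "PYTHONPATH": tmp},
                capture_output=True, text=True, check=True,
            ).stdout.strip()
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("fakelib_demo", None)
            sys.dont_write_bytecode = old_flag
    return in_process, fresh


print(demo_stale_module())
```

The running process should keep reporting the old version while the fresh interpreter reports the new one, which is why only a kernel restart picks up an upgraded package.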

diskshima commented 6 years ago

@jazoom

I even tried using !pip uninstall -y torch first, but even though the operation seems to succeed, Pytorch remains installed at version 0.4.1.

Yeah, I had to rely on the fact that the "die & restart" dialog pops up after package installs but seems like Kaggle has stabilized and it doesn't come up anymore (it was a "hack" anyways).

I think you'll have to wait for either:

jazoom commented 6 years ago

@diskshima thanks for the confirmation. I'll probably just wait. Otherwise I'll spend more time hacking around dependencies than actually doing ML.

rosbo commented 6 years ago

The rollout of the new Docker image to our GPU workers is completed.

Don't hesitate to reopen this issue if you are still having trouble with the fastai library.

miwojc commented 6 years ago

We decided to revert back to using fastai 0.7 for now and keep pytorch 0.4.1. Let us know if you are still experiencing issues with fastai on kaggle.

The new fastai course starts soon (Oct 22) and will be using the new library. Ideally we need a way to choose between 0.7 and 1 so that people can use both codebases (old and new course). Is this possible?

Is there a way to upgrade to library v1 for kernels? Would the below work?

pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cu92/torch_nightly.html
pip install fastai

https://github.com/fastai/fastai#pypi-install

rosbo commented 6 years ago

We are working on a solution to make sure the pip install command will work as you would expect. Stay tuned.

jazoom commented 6 years ago

Will that prevent use of the GPU?

rosbo commented 6 years ago

Once implemented, that solution should work on both GPU and CPU sessions.

miwojc commented 6 years ago

I just tried running lesson 1 from the new course (v1 library).

I ran the commands below first:

!pip install --upgrade pip
!pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cu92/torch_nightly.html
!pip install fastai==1.0.6

then the fastai imports:

%reload_ext autoreload
%autoreload 2
%matplotlib inline

from fastai import *
from fastai.vision import *

and got error message:

ImportError                               Traceback (most recent call last)
<ipython-input-2-500c5388d4ff> in <module>
      3 get_ipython().run_line_magic('matplotlib', 'inline')
      4 
----> 5 from fastai import *
      6 from fastai.vision import *

/opt/conda/lib/python3.6/site-packages/fastai/__init__.py in <module>
----> 1 from .basic_train import *
      2 from .callback import *
      3 from .callbacks import *
      4 from .core import *
      5 from .data import *

/opt/conda/lib/python3.6/site-packages/fastai/basic_train.py in <module>
      1 "Provides basic training and validation with `Learner`"
----> 2 from .torch_core import *
      3 from .data import *
      4 from .callback import *
      5 

/opt/conda/lib/python3.6/site-packages/fastai/torch_core.py in <module>
      1 "Utility functions to help deal with tensors"
----> 2 from .imports.torch import *
      3 from .core import *
      4 
      5 AffineMatrix = Tensor

/opt/conda/lib/python3.6/site-packages/fastai/imports/__init__.py in <module>
      1 from .core import *
----> 2 from .torch import *

/opt/conda/lib/python3.6/site-packages/fastai/imports/torch.py in <module>
      2 from torch import ByteTensor, DoubleTensor, FloatTensor, HalfTensor, LongTensor, ShortTensor, Tensor
      3 from torch import nn, optim, as_tensor
----> 4 from torch.utils.data import BatchSampler, DataLoader, Dataset, Sampler, TensorDataset

/opt/conda/lib/python3.6/site-packages/torch/utils/data/__init__.py in <module>
      3 from .distributed import DistributedSampler
      4 from .dataset import Dataset, TensorDataset, ConcatDataset, Subset, random_split
----> 5 from .dataloader import DataLoader

/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py in <module>
      7 import signal
      8 import functools
----> 9 from torch._six import container_abcs
     10 import re
     11 import sys

ImportError: cannot import name 'container_abcs'

link to the new lesson 1: https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson1-pets.ipynb

diskshima commented 6 years ago

@miwojc I believe the Kaggle kernels still use the Pytorch 0.4.1 & fastai 0.7. You'll have to wait as per the comment by rosbo.

We are working on a solution to make sure the pip install command will work as you would expect. Stay tuned.

miwojc commented 6 years ago

@diskshima I have tried to update PyTorch and fastai with pip, like this:

!pip install --upgrade pip
!pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cu92/torch_nightly.html
!pip install fastai==1.0.6

diskshima commented 6 years ago

@miwojc Like my comment here, this used to "work" because the kernel would error and restart automatically forcing a reload of the libraries. The kernel has since stabilized and that hardly happens anymore. You'll have to wait for Kaggle to officially start supporting multiple versions of libraries (e.g. Pytorch, fastai).

miwojc commented 6 years ago

@diskshima How long do I have to wait? I'm just considering options for running the course that starts this Monday, and at the moment Kaggle kernels don't seem to be a viable option. Thanks!

rosbo commented 6 years ago

@d1jang is working on this. I will let him answer.

miwojc commented 6 years ago

Thanks! Other platforms, like Paperspace, Salamander, SageMaker, Google Cloud Platform and AWS, offer working solutions for the fastai course and library v1 with PyTorch v1 (with easy install via an image), so I am assuming it's possible...

d1jang commented 6 years ago

Thanks for your user feedback. We're actively working on fixing this issue, but it is related to the core of Jupyter Notebook, so it takes time and effort to fix.

The main issue here is that the Jupyter Notebook Kernel preloads torch and caches it indefinitely. So no matter what you install afterwards, the Notebook session keeps using the preloaded version.

Therefore, a workaround to this problem is to restart the Jupyter Notebook Kernel after installing the libraries (note that this workaround does not work when committing a Kaggle Kernel, because there is no chance to restart the Jupyter Kernel instance):

  1. Enable the Internet & GPU.

  2. Execute:

     !pip install --upgrade pip
     !pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cu92/torch_nightly.html
     !pip install fastai==1.0.6

  3. Press Ctrl+Shift+P and select "confirm restart kernel". This will restart the Jupyter Kernel instance and reload the installed libraries.

  4. Wait until the Jupyter Kernel finishes the restart.

  5. import fastai now shows the correct version.
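
The install-then-restart steps can also be approximated from plain Python. This is a rough sketch under stated assumptions: `pip_install_command` and `modules_needing_restart` are hypothetical helper names, and the code only builds the pip command and reports which modules are already loaded; it cannot restart the kernel for you.

```python
import sys


def pip_install_command(*args):
    """Build a pip command that targets the interpreter running this session."""
    return [sys.executable, "-m", "pip", "install", *args]


def modules_needing_restart(module_names):
    """Return the modules that are already imported and so will not be reloaded."""
    return [name for name in module_names if name in sys.modules]


# Package spec taken from step 2; run it with subprocess.check_call(cmd) if desired.
cmd = pip_install_command("fastai==1.0.6")
print(" ".join(cmd))

stale = modules_needing_restart(["torch", "fastai"])
if stale:
    print("Restart the kernel before importing:", ", ".join(stale))
else:
    print("Nothing preloaded; a plain import should pick up the new versions.")
```

Calling pip through `sys.executable -m pip` avoids installing into a different Python environment than the one the notebook is using.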

I'm working on fixing this issue fundamentally for Kaggle Kernels, and will update this thread with any progress.

Thanks!

miwojc commented 6 years ago

Thank you @d1jang for the workaround. I tried to run lesson 1 using it, but when running learn.fit_one_cycle(4) I got this error: RuntimeError: DataLoader worker (pid 92) is killed by signal: Bus error.

Is there any solution for that, please?

found this solution on forums: https://forums.fast.ai/t/runtimeerror-dataloader-worker-pid-137-is-killed-by-signal-bus-error/27095

but it's not ideal

can this issue be fixed?
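
This thread does not diagnose the Bus error. One commonly reported cause (an assumption here, not confirmed by anyone in this issue) is that PyTorch DataLoader worker processes exchange batches through shared memory, and containers often have a small /dev/shm. The sketch below only inspects the available shared memory; `shm_free_mib` is a hypothetical helper name.

```python
import os
import shutil


def shm_free_mib(path="/dev/shm"):
    """Return free space in MiB on the shared-memory mount, or None if it does not exist."""
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).free / 2**20


free = shm_free_mib()
if free is None:
    print("/dev/shm not found on this system")
else:
    print(f"/dev/shm has {free:.0f} MiB free")
```

If the value is small, loading data in the main process (the DataLoader `num_workers=0` setting) avoids the worker processes at the cost of slower data loading.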

noklam commented 5 years ago

@d1jang Thank you for this temporary solution, appreciate your help! I have this error. The current Nvidia driver is 390.25 on Kaggle.
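
For reading the "found version 9010" message in this error: that number is a CUDA version reported by the driver, encoded as 1000 * major + 10 * minor, not the 390.25 driver release. The sketch below decodes it and compares it with a wheel tag such as cu92 from the install command used earlier in the thread. The helper names are hypothetical, and the assumption that a "cuXY" tag means CUDA X.Y follows the wheel naming seen in this thread.

```python
def decode_cuda_version(raw):
    """Decode a value such as 9010 into (major, minor)."""
    major, rest = divmod(raw, 1000)
    return major, rest // 10


def wheel_tag_to_version(tag):
    """Assumed convention: 'cu92' -> (9, 2), 'cu100' -> (10, 0)."""
    digits = tag[2:]
    return int(digits[:-1]), int(digits[-1])


def driver_supports(raw_driver_version, wheel_tag):
    """True if the driver's CUDA version is at least the one the wheel targets."""
    return decode_cuda_version(raw_driver_version) >= wheel_tag_to_version(wheel_tag)


print(decode_cuda_version(9010))
print(driver_supports(9010, "cu92"))
print(driver_supports(9010, "cu90"))
```

Under these assumptions, a driver reporting 9010 is too old for a cu92 build but not for a cu90 build.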


AssertionError Traceback (most recent call last)

in
----> 1 learn = Learner.create_unet(databunch, models.resnet34, metrics=dice)

/opt/conda/lib/python3.6/site-packages/fastai/train.py in Learner_create_unet(cls, data, arch, pretrained, split_on, **kwargs)
     81     meta = cnn_config(arch)
     82     body = create_body(arch(pretrained), meta['cut'])
---> 83     model = models.unet.DynamicUnet(body, n_classes=data.c).cuda()
     84     learn = cls(data, model, **kwargs)
     85     learn.split(ifnone(split_on,meta['split']))

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in cuda(self, device)
    256             Module: self
    257         """
--> 258         return self._apply(lambda t: t.cuda(device))
    259
    260     def cpu(self):

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    183     def _apply(self, fn):
    184         for module in self.children():
--> 185             module._apply(fn)
    186
    187         for param in self._parameters.values():

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    183     def _apply(self, fn):
    184         for module in self.children():
--> 185             module._apply(fn)
    186
    187         for param in self._parameters.values():

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    189                 # Tensors stored in modules are graph leaves, and we don't
    190                 # want to create copy nodes, so we have to unpack the data.
--> 191                 param.data = fn(param.data)
    192                 if param._grad is not None:
    193                     param._grad.data = fn(param._grad.data)

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in (t)
    256             Module: self
    257         """
--> 258         return self._apply(lambda t: t.cuda(device))
    259
    260     def cpu(self):

/opt/conda/lib/python3.6/site-packages/torch/cuda/__init__.py in _lazy_init()
    159         raise RuntimeError(
    160             "Cannot re-initialize CUDA in forked subprocess. " + msg)
--> 161     _check_driver()
    162     torch._C._cuda_init()
    163     _cudart = _load_cudart()

/opt/conda/lib/python3.6/site-packages/torch/cuda/__init__.py in _check_driver()
     89 Alternatively, go to: https://pytorch.org to install
     90 a PyTorch version that has been compiled with your version
---> 91 of the CUDA driver.""".format(str(torch._C._cuda_getDriverVersion())))
     92
     93

AssertionError: The NVIDIA driver on your system is too old (found version 9010). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.

bruce-yang22 commented 5 years ago

Is it possible to make fastai v1 the default version instead of 0.7? It's the version the fastai community is actively developing and maintaining. The current competition https://www.kaggle.com/c/quora-insincere-questions-classification simply doesn't allow an Internet connection, which means we cannot pip install fastai v1 and PyTorch 1.0 even if the package-reloading issue after pip install gets fixed. fastai 0.7 just doesn't work with PyTorch 0.4.1 for NLP tasks, as the embedding part in fit() throws NotImplementedError.

rosbo commented 5 years ago

Hi @lordbruce,

pytorch 1.0 is still in "Preview" mode. We will wait until it becomes the "Stable" release before updating it within Kaggle Kernels. Thank you

mohanksriram commented 5 years ago

PyTorch 1.0 is now "Stable". Any update on when we will switch to 1.0?

rosbo commented 5 years ago

We are working on it and should release a new image soon. Closing in favor of #397

KumarArindam commented 5 years ago

It has been a couple of days since the release of PyTorch v1. How long exactly will it take for the issue to be resolved in kernels?

rosbo commented 5 years ago

We have already released the latest version of fastai on Dec 12th along with PyTorch v1.

KumarArindam commented 5 years ago

@rosbo please tell us the procedure to downgrade fastai to 0.7 and PyTorch to 0.4 on Kaggle kernels. I have been trying to downgrade lately, but the kernel dies and restarts with the updated version. Please look into this matter.