Running on older GPU - Githubissues

bmaltais commented 6 years ago

Great work! I am trying to install the application, and I have been running into issues because my GPU is too old for the latest PyTorch version.

Which versions of Python and PyTorch are you using?

I am starting from a Docker image built from this Dockerfile:

FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu14.04
ENV MINICONDA /opt/miniconda
ENV PATH ${MINICONDA}/bin:$PATH
RUN apt-get update && apt-get install -y wget git
RUN wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -P /tmp
RUN bash /tmp/Miniconda3-latest-Linux-x86_64.sh -b -p $MINICONDA
RUN rm /tmp/Miniconda3-latest-Linux-x86_64.sh
RUN conda install -y pytorch=0.3.0.0 torchvision -c pytorch
RUN mkdir /app 
WORKDIR /app
RUN git clone https://github.com/ProGamerGov/neural-style-pt.git 
WORKDIR /app/neural-style-pt
RUN python models/download_models.py

but I get this error when I run: python neural_style.py -gpu 0 -backend cudnn -print_iter 1

VGG-19 Architecture Detected
Successfully loaded models/vgg19-d01eb7cb.pth
conv1_1: 64 3 3 3
conv1_2: 64 64 3 3
conv2_1: 128 64 3 3
conv2_2: 128 128 3 3
conv3_1: 256 128 3 3
conv3_2: 256 256 3 3
conv3_3: 256 256 3 3
conv3_4: 256 256 3 3
conv4_1: 512 256 3 3
conv4_2: 512 512 3 3
conv4_3: 512 512 3 3
conv4_4: 512 512 3 3
conv5_1: 512 512 3 3
conv5_2: 512 512 3 3
conv5_3: 512 512 3 3
conv5_4: 512 512 3 3
Setting up style layer 2: relu1_1
Setting up style layer 7: relu2_1
Setting up style layer 12: relu3_1
Setting up style layer 21: relu4_1
Setting up content layer 23: relu4_2
Setting up style layer 30: relu5_1
Capturing content targets
nn.Sequential (
  [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> (27) -> (28) -> (29) -> (30) -> (31) -> (32) -> (33) -> (34) -> (35) -> (36) -> (37) -> output]
  (1): nn.TVLoss
  (2): nn.Conv2d (3 -> 64, 3x3, 1,1, 1,1)
  (3): nn.ReLU
  (4): nn.StyleLoss
  (5): nn.Conv2d (64 -> 64, 3x3, 1,1, 1,1)
  (6): nn.ReLU
  (7): nn.MaxPool2d(2x2, 2,2)
  (8): nn.Conv2d (64 -> 128, 3x3, 1,1, 1,1)
  (9): nn.ReLU
  (10): nn.StyleLoss
  (11): nn.Conv2d (128 -> 128, 3x3, 1,1, 1,1)
  (12): nn.ReLU
  (13): nn.MaxPool2d(2x2, 2,2)
  (14): nn.Conv2d (128 -> 256, 3x3, 1,1, 1,1)
  (15): nn.ReLU
  (16): nn.StyleLoss
  (17): nn.Conv2d (256 -> 256, 3x3, 1,1, 1,1)
  (18): nn.ReLU
  (19): nn.Conv2d (256 -> 256, 3x3, 1,1, 1,1)
  (20): nn.ReLU
  (21): nn.Conv2d (256 -> 256, 3x3, 1,1, 1,1)
  (22): nn.ReLU
  (23): nn.MaxPool2d(2x2, 2,2)
  (24): nn.Conv2d (256 -> 512, 3x3, 1,1, 1,1)
  (25): nn.ReLU
  (26): nn.StyleLoss
  (27): nn.Conv2d (512 -> 512, 3x3, 1,1, 1,1)
  (28): nn.ReLU
  (29): nn.ContentLoss
  (30): nn.Conv2d (512 -> 512, 3x3, 1,1, 1,1)
  (31): nn.ReLU
  (32): nn.Conv2d (512 -> 512, 3x3, 1,1, 1,1)
  (33): nn.ReLU
  (34): nn.MaxPool2d(2x2, 2,2)
  (35): nn.Conv2d (512 -> 512, 3x3, 1,1, 1,1)
  (36): nn.ReLU
  (37): nn.StyleLoss
)
Traceback (most recent call last):
  File "neural_style.py", line 409, in <module>
    main()
  File "neural_style.py", line 149, in main
    net(content_image)
  File "/opt/miniconda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/miniconda/lib/python3.6/site-packages/torch/nn/modules/container.py", line 67, in forward
    input = module(input)
  File "/opt/miniconda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/miniconda/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 277, in forward
    self.padding, self.dilation, self.groups)
  File "/opt/miniconda/lib/python3.6/site-packages/torch/nn/functional.py", line 90, in conv2d
    return f(input, weight, bias)
TypeError: argument 0 is not a Variable

Any ideas? Is there a hard dependency on CUDA 9.1 and cuDNN 7.1?

bmaltais commented 6 years ago

I am trying to build a Docker container. If I get it working, I will share it so others don't have to go through the pain of the manual setup ;-)

ProGamerGov commented 6 years ago

> Great work! I am trying to install the application and I have been running into issue
>
> TypeError: argument 0 is not a Variable

@bmaltais Thanks! I think the issue is that you are using an outdated version of PyTorch, v0.3. In version 0.4, the Tensor and Variable classes were merged together, and that's what the code was tested with. When 0.4 was released, I removed references to Variable, and started using .item(), which
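A quick way to check this before running neural_style.py is to compare the installed version against 0.4. The following is only a minimal sketch: the helper names parse_version and has_merged_tensor_variable are made up for illustration and are not part of neural-style-pt or PyTorch. The only PyTorch attribute it relies on is torch.__version__.

```python
import re


def parse_version(version):
    """Return (major, minor) from a version string such as '0.3.0.post4'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def has_merged_tensor_variable(version):
    """True for 0.4 and later, where Tensor and Variable are a single class."""
    return parse_version(version) >= (0, 4)


if __name__ == "__main__":
    # The import is guarded so the sketch also runs where PyTorch is missing.
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        if has_merged_tensor_variable(torch.__version__):
            print("PyTorch %s: Tensor and Variable are merged" % torch.__version__)
        else:
            print("PyTorch %s predates the Tensor/Variable merge" % torch.__version__)
```

With the 0.3 install from the first Dockerfile, this should report that the version predates the merge, which matches the "argument 0 is not a Variable" error above.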

> Any idea? Is there a hard dependency on cuda 9.1 cudnn 7.1?

Any CUDA or cuDNN version that PyTorch supports should work.

As for Python, I have tested and made sure that all the Python scripts can run with both Python 2 and 3.

> I have been running into issue because my GPU is too old for the latest pytorch version.

As per the installation guide, you will likely need to install from source in order to use your GPU:

> Note that in order to reduce their size, the pre-packaged binary releases (pip, Conda, etc...) have removed support for some older GPUs, and thus you will have to install from source in order to use these GPUs.

Once you have installed from source, I have found that you can rerun python setup.py install (or possibly the python3 equivalent) if the GPU has changed, and the rebuild will finish much more quickly. This will also make sure that the appropriate GPU binaries are used. Note that I have only tested this across different AWS instances.

Also, you don't have to install the Torchvision package from source. You can likely just use pip or Conda.
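If you are not sure whether your GPU counts as "older", you can look up its compute capability from Python. This is a rough sketch under a few assumptions: capability_supported is a hypothetical helper, and ASSUMED_MINIMUM is a placeholder value that you would need to replace with the minimum from the release notes of the binary you installed. The calls torch.cuda.is_available, torch.cuda.get_device_capability, and torch.cuda.get_device_name are real PyTorch functions.

```python
# Placeholder only: this value is NOT taken from this thread or from any
# PyTorch release notes. Replace it with the documented minimum.
ASSUMED_MINIMUM = (3, 5)


def capability_supported(capability, minimum=ASSUMED_MINIMUM):
    """Compare (major, minor) compute capability tuples."""
    return tuple(capability) >= tuple(minimum)


if __name__ == "__main__":
    # The import is guarded so the sketch also runs where PyTorch is missing.
    try:
        import torch
    except ImportError:
        torch = None

    if torch is None:
        print("PyTorch is not installed")
    elif not torch.cuda.is_available():
        print("No usable CUDA device found")
    else:
        capability = torch.cuda.get_device_capability(0)
        print("%s has compute capability %d.%d"
              % ((torch.cuda.get_device_name(0),) + tuple(capability)))
        if not capability_supported(capability):
            print("Below the assumed minimum; a source build may be needed")
```

If the reported capability is below what the binary release supports, building from source as described above is the likely fix.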

bmaltais commented 6 years ago

It worked from source. In case others are interested in building a container that is ready to run neural-style-pt on a system with an older NVIDIA GPU, you can use this Dockerfile to build it:

FROM nvidia/cuda:8.0-cudnn7-devel-ubuntu16.04
ENV ANACONDA /opt/anaconda2
ENV CUDA_PATH /usr/local/cuda
ENV PATH ${ANACONDA}/bin:${CUDA_PATH}/bin:$PATH
ENV LD_LIBRARY_PATH ${ANACONDA}/lib:${CUDA_PATH}/lib64:$LD_LIBRARY_PATH
ENV C_INCLUDE_PATH ${CUDA_PATH}/include
ENV CMAKE_PREFIX_PATH ${ANACONDA}/

RUN apt-get update && \
    apt-get install -y wget build-essential git && \
    apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

RUN wget https://repo.continuum.io/archive/Anaconda3-5.2.0-Linux-x86_64.sh -P /tmp && \
    bash /tmp/Anaconda3-5.2.0-Linux-x86_64.sh -b -p $ANACONDA && \
    rm -f /tmp/Anaconda3-5.2.0-Linux-x86_64.sh

# Install basic dependencies
RUN conda install -y numpy pyyaml mkl mkl-include setuptools cmake cffi typing && \
    conda install -y -c mingfeima mkldnn && \
    conda install -y -c pytorch magma-cuda80 \
    && conda clean -ya

# Build PyTorch and torchvision from source
RUN mkdir /app && \
    cd /app && \
    git clone --recursive https://github.com/pytorch/pytorch && \
    cd /app/pytorch && \
    python setup.py install && \
    cd /app && \
    git clone --recursive https://github.com/pytorch/vision && \
    cd /app/vision && \
    python setup.py install

Build the Docker image with:

docker build -t pytorch-cuda8.0-cudnn7-devel-ubuntu16.04 .

It will take about an hour to put everything together if you have a fast internet connection.

Here is the next Dockerfile to actually install neural-style-pt:

FROM pytorch-cuda8.0-cudnn7-devel-ubuntu16.04

RUN cd /app && \
    git clone https://github.com/ProGamerGov/neural-style-pt.git && \
    cd /app/neural-style-pt && \
    python models/download_models.py

WORKDIR /app/neural-style-pt

Build the Docker image with:

docker build -t neural-style-pt .

Then run the code with:

nvidia-docker run -it neural-style-pt

ProGamerGov / neural-style-pt

Running on older GPU #1