facebookresearch / TensorComprehensions

A domain specific language to express machine learning workloads.
https://facebookresearch.github.io/TensorComprehensions/
Apache License 2.0

PyTorch 0.4 compatibility #439

Closed: donovanr closed this issue 6 years ago

donovanr commented 6 years ago

I installed TC into my Docker image with

conda install -y -c tensorcomp tensor_comprehensions

and my version of PyTorch was downgraded from 0.4 to 0.3.1.

Is there a way to avoid this?

My install info:

seongwook-ham commented 6 years ago

If you want to use PyTorch 0.4, you should build TC from source. See https://github.com/facebookresearch/TensorComprehensions/blob/master/docs/source/installation_docker_image.rst
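For reference, the from-source build boils down to something like the command below. This is only a sketch that mirrors the build.sh invocation quoted in the error report later in this thread; the repository path, BUILD_TYPE, WITH_CAFFE2 and CLANG_PREFIX settings may differ for your setup, so follow the linked docs for the authoritative steps.

# Sketch of a from-source build, mirroring the invocation quoted later in this thread.
# Assumes a conda environment (CONDA_PREFIX) and an llvm-config on the PATH.
cd /opt/TensorComprehensions
BUILD_TYPE=Release INSTALL_PREFIX=$CONDA_PREFIX WITH_CAFFE2=OFF \
  CLANG_PREFIX=$(llvm-config --prefix) ./build.sh --all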

donovanr commented 6 years ago

I tried building from source and got an error. The errors differ slightly depending on which version of the LLVM toolchain I install, but they all have to do with disabled exceptions. I'm building inside a Docker container (Ubuntu 16.04); other possibly relevant info: Python 3.6.4, CUDA 9.0.176 / cuDNN 7.1.

/opt/TensorComprehensions/third-party/halide/src/Error.cpp:121:9: error: cannot use 'throw' with exceptions disabled
        throw err;
        ^
/opt/TensorComprehensions/third-party/halide/src/Error.cpp:124:9: error: cannot use 'throw' with exceptions disabled
        throw err;
        ^
/opt/TensorComprehensions/third-party/halide/src/Error.cpp:127:9: error: cannot use 'throw' with exceptions disabled
        throw err;
        ^
3 errors generated.
make: *** [bin/build/Error.o] Error 1
make: *** Waiting for unfinished jobs....
../Makefile:850: recipe for target 'bin/build/Error.o' failed
/usr/bin/clang++-4.0 -Wall -Werror -Wno-unused-function -Wcast-qual -Wignored-qualifiers -Wno-comment -Wsign-compare -Wno-unknown-warning-option -Wno-psabi   -Woverloaded-virtual -fPIC -O3 -fno-omit-frame-pointer -DCOMPILING_HALIDE -std=c++11  -I/usr/lib/llvm-4.0/include -std=c++0x -gsplit-dwarf -fPIC -fvisibility-inlines-hidden -std=c++11 -ffunction-sections -fdata-sections -DNDEBUG -fno-exceptions -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -DLLVM_VERSION=40  -DWITH_PTX=1  -DWITH_ARM=1  -DWITH_HEXAGON=1  -DWITH_AARCH64=1  -DWITH_X86=1        -DWITH_MIPS=1  -DWITH_POWERPC=1  -DWITH_INTROSPECTION  -DWITH_EXCEPTIONS  -DWITH_AMDGPU=1 -funwind-tables -c /opt/TensorComprehensions/third-party/halide/src/FindCalls.cpp -o bin/build/FindCalls.o -MMD -MP -MF bin/build/FindCalls.d -MT bin/build/FindCalls.o
make: *** wait: No child processes.  Stop.
The command '/bin/sh -c cd /opt/TensorComprehensions &&     BUILD_TYPE=Release INSTALL_PREFIX=$CONDA_PREFIX WITH_CAFFE2=OFF CLANG_PREFIX=$(llvm-config --prefix) ./build.sh --all' returned a non-zero code: 1
ftynse commented 6 years ago

Your compile invocation contains -fno-exceptions (along with -DWITH_EXCEPTIONS), so the Halide submodule cannot be built. This is most likely due to an incompatible LLVM build: Halide takes its compilation flags from llvm-config, and most distributions ship LLVM built without exception support, which leads to exactly this error.

Please follow these instructions to build LLVM with exception support:
https://facebookresearch.github.io/TensorComprehensions/installation_non_conda.html#step-3-install-clang-llvm
If you build LLVM yourself, make sure -DLLVM_ENABLE_EH=ON is passed when invoking LLVM's cmake.
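As a rough sketch of what that looks like (the source path, install prefix and the RTTI flag below are assumptions for illustration; the linked instructions have the exact steps):

# Build LLVM/Clang with exception support; -DLLVM_ENABLE_EH is the flag named above,
# the paths and -DLLVM_ENABLE_RTTI are assumptions for illustration only.
mkdir -p /tmp/llvm_build && cd /tmp/llvm_build
cmake /path/to/llvm/sources \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_INSTALL_PREFIX=$HOME/clang+llvm \
  -DLLVM_ENABLE_EH=ON \
  -DLLVM_ENABLE_RTTI=ON
make -j"$(nproc)" && make install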

nicolasvasilache commented 6 years ago

@donovanr @seongwook-ham @ftynse I am redoing our build system from scratch, which should significantly improve things (and solve your issues as a byproduct). I'll push an experimental branch later today so you can try it.

The catch is that I will be out until next Wednesday after that, so hopefully it will be usable or modifiable enough to get you unblocked.

donovanr commented 6 years ago

Thanks, those instructions pretty much worked. In case it’s useful to anyone else, I also had to install libxml2-dev and create a soft link from /clang+llvm-tapir5.0/bin/llvm-config to /usr/local/bin/llvm-config to get things to compile.
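(For anyone following along, those two extra steps amount to roughly the following; the tapir-LLVM path is the one given in this comment and may differ for your install.)

# Extra steps on Ubuntu 16.04, as described above; adjust the LLVM path to your install.
apt-get update && apt-get install -y libxml2-dev
ln -s /clang+llvm-tapir5.0/bin/llvm-config /usr/local/bin/llvm-config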

donovanr commented 6 years ago

@nicolasvasilache sorry, I didn't see that before I commented -- that's great! Let me know when you have something I can try out. I'd also be interested to know whether there's some compilation detritus I can remove after the fact so that my Docker image isn't quite so huge.

ftynse commented 6 years ago

and create a soft link from /clang+llvm-tapir5.0/bin/llvm-config to /usr/local/bin/llvm-config to get things to compile.

It's odd that there is no prefix before /clang+llvm-tapir5.0; there should be the contents of $HOME (normally /home/<username> or /root). I don't know where your compiled LLVM ended up, but symlinks are rarely a good solution. The TC build script uses the CLANG_PREFIX environment variable to find the right version of LLVM and expects llvm-config to be in $CLANG_PREFIX/bin/. You might as well change CLANG_PREFIX to /usr/local instead of fiddling with the system.
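A sketch of that alternative, reusing the build invocation from earlier in the thread; the LLVM prefix below is an assumption about where the clang+llvm-tapir5.0 install actually lives, so substitute your real path:

# Point the TC build at the LLVM install directly instead of symlinking llvm-config;
# /clang+llvm-tapir5.0 is an assumed install prefix, use wherever bin/llvm-config lives.
cd /opt/TensorComprehensions
BUILD_TYPE=Release INSTALL_PREFIX=$CONDA_PREFIX WITH_CAFFE2=OFF \
  CLANG_PREFIX=/clang+llvm-tapir5.0 ./build.sh --all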

Anyway, if it works for you, let's close the issue. The build system will soon be updated so we can publish PyTorch-0.4-compatible binaries.

nicolasvasilache commented 6 years ago

@donovanr #451 finally landed today; see BUILD.md for the new build instructions. We now rely exclusively on conda to get our external dependencies. This moves us towards a more future-proof integration with PyTorch, as we will increasingly use their conda packages even for our dev and CI builds. The scope is still quite limited (we need to ship our own Caffe2 for now), but hopefully in the near future we can use the pytorch packages only. Please let us know if you have questions.