ctongfei / nexus

Experimental tensor-typed deep learning
https://tongfei.me/nexus/
MIT License
257 stars, 15 forks

`fatal error: cuda_runtime_api.h` (linux) #27

Status: Closed (danyaljj closed this issue 5 years ago)

danyaljj commented 5 years ago

Similar error to #24 (identical?)

$ ./build.sh 
PyTorch 1.0.0 at /shared/shelley/khashab2/nexus/env3.6/lib/python3.6/site-packages/torch detected
Operating system is linux-gnu
Copying include files...
Preprocessing all header files for SWIG to parse...
Preprocessing C headers...
In file included from torch.h:1:0:
TH/TH.h:4:26: fatal error: TH/THGeneral.h: No such file or directory
compilation terminated.
Generating SWIG bindings...
Language subdirectory: java
Search paths:
   ./
   ./swig_lib/java/
   /usr/share/swig3.0/java/
   ./swig_lib/
   /usr/share/swig3.0/
Preprocessing...
Starting language-specific parse...
Processing types...
C++ analysis...
Processing nested classes...
Generating wrappers...
Compiling SWIG generated JNI wrapper code...
Compiling using Java: /usr/lib/jvm/java-8-openjdk-amd64
In file included from /shared/shelley/khashab2/nexus/env3.6/lib/python3.6/site-packages/torch/lib/include/THC/THCGeneral.h:12:0,
                 from /shared/shelley/khashab2/nexus/env3.6/lib/python3.6/site-packages/torch/lib/include/THC/THC.h:4,
                 from torch_wrap_fixed.cxx:236:
/shared/shelley/khashab2/nexus/env3.6/lib/python3.6/site-packages/torch/lib/include/ATen/cuda/CUDAStream.h:6:30: fatal error: cuda_runtime_api.h: No such file or directory
compilation terminated.

Btw, I have CUDA installed on my machine:

$ cuda
cuda                         cudafe                       cudafe++                     cuda-gdb                     cuda-gdbserver               cuda-install-samples-9.1.sh  cuda-memcheck   

ctongfei commented 5 years ago

Very strange. Could you confirm that you can find TH/THGeneral.h in /shared/shelley/khashab2/nexus/env3.6/lib/python3.6/site-packages/torch/lib ?

danyaljj commented 5 years ago

There is one under torch/lib/include/TH/THGeneral.h:

(env3.6) khashab2@gissing:/shared/shelley/khashab2/nexus/env3.6/lib/python3.6/site-packages/torch/lib$ tree -L 3
.
├── include
│   ├── ATen
│   │   ├── AccumulateType.h
│   │   ├── Allocator.h
│   │   ├── ArrayRef.h
│   │   ├── ATen.h
│   │   ├── Backend.h
│   │   ├── Backtrace.h
│   │   ├── CheckGenerator.h
│   │   ├── Config.h
│   │   ├── Context.h
│   │   ├── core
│   │   ├── cpu
│   │   ├── CPUApplyUtils.h
│   │   ├── CPUByteType.h
│   │   ├── CPUCharType.h
│   │   ├── CPUDoubleType.h
│   │   ├── CPUFixedAllocator.h
│   │   ├── CPUFloatType.h
│   │   ├── CPUGeneral.h
│   │   ├── CPUGenerator.h
│   │   ├── CPUHalfType.h
│   │   ├── CPUIntType.h
│   │   ├── CPULongType.h
│   │   ├── CPUShortType.h
│   │   ├── CPUTypeDefault.h
│   │   ├── cuda
│   │   ├── CUDAByteType.h
│   │   ├── CUDACharType.h
│   │   ├── CUDADoubleType.h
│   │   ├── CUDAFloatType.h
│   │   ├── CUDAGenerator.h
│   │   ├── CUDAHalfType.h
│   │   ├── CUDAIntType.h
│   │   ├── CUDALongType.h
│   │   ├── CUDAShortType.h
│   │   ├── cudnn
│   │   ├── detail
│   │   ├── DeviceGuard.h
│   │   ├── Device.h
│   │   ├── DimVector.h
│   │   ├── Dispatch.h
│   │   ├── div_rtn.h
│   │   ├── DLConvertor.h
│   │   ├── dlpack.h
│   │   ├── ExpandUtils.h
│   │   ├── Formatting.h
│   │   ├── Functions.h
│   │   ├── Generator.h
│   │   ├── Half.h
│   │   ├── InferSize.h
│   │   ├── InitialTensorOptions.h
│   │   ├── Layout.h
│   │   ├── LegacyTHDispatcher.h
│   │   ├── LegacyTHDispatch.h
│   │   ├── MatrixRef.h
│   │   ├── NativeFunctions.h
│   │   ├── Parallel.h
│   │   ├── RegisterCPU.h
│   │   ├── RegisterCUDA.h
│   │   ├── Scalar.h
│   │   ├── ScalarOps.h
│   │   ├── ScalarType.h
│   │   ├── SmallVector.h
│   │   ├── SparseCPUByteType.h
│   │   ├── SparseCPUCharType.h
│   │   ├── SparseCPUDoubleType.h
│   │   ├── SparseCPUFloatType.h
│   │   ├── SparseCPUIntType.h
│   │   ├── SparseCPULongType.h
│   │   ├── SparseCPUShortType.h
│   │   ├── SparseCUDAByteType.h
│   │   ├── SparseCUDACharType.h
│   │   ├── SparseCUDADoubleType.h
│   │   ├── SparseCUDAFloatType.h
│   │   ├── SparseCUDAIntType.h
│   │   ├── SparseCUDALongType.h
│   │   ├── SparseCUDAShortType.h
│   │   ├── SparseTensorImpl.h
│   │   ├── SparseTensorUtils.h
│   │   ├── Storage.h
│   │   ├── TensorAccessor.h
│   │   ├── TensorGeometry.h
│   │   ├── Tensor.h
│   │   ├── TensorOperators.h
│   │   ├── TensorOptions.h
│   │   ├── TensorUtils.h
│   │   ├── TypeDefault.h
│   │   ├── TypeExtendedInterface.h
│   │   ├── Type.h
│   │   ├── UndefinedType.h
│   │   ├── Utils.h
│   │   ├── WrapDimUtils.h
│   │   └── WrapDimUtilsMulti.h
│   ├── c10
│   │   ├── core
│   │   ├── cuda
│   │   ├── DeviceGuard.h
│   │   ├── Device.h
│   │   ├── DeviceType.h
│   │   ├── Half.h
│   │   ├── Half-inl.h
│   │   ├── impl
│   │   ├── macros
│   │   ├── StreamGuard.h
│   │   ├── Stream.h
│   │   └── util
│   ├── caffe2
│   │   ├── core
│   │   ├── proto
│   │   └── utils
│   ├── pybind11
│   │   ├── attr.h
│   │   ├── buffer_info.h
│   │   ├── cast.h
│   │   ├── chrono.h
│   │   ├── common.h
│   │   ├── complex.h
│   │   ├── detail
│   │   ├── eigen.h
│   │   ├── embed.h
│   │   ├── eval.h
│   │   ├── functional.h
│   │   ├── iostream.h
│   │   ├── numpy.h
│   │   ├── operators.h
│   │   ├── options.h
│   │   ├── pybind11.h
│   │   ├── pytypes.h
│   │   ├── stl_bind.h
│   │   └── stl.h
│   ├── TH
│   │   ├── generic
│   │   ├── THAllocator.h
│   │   ├── THBlas.h
│   │   ├── THDiskFile.h
│   │   ├── THFile.h
│   │   ├── THFilePrivate.h
│   │   ├── THGeneral.h
│   │   ├── THGenerateAllTypes.h
│   │   ├── THGenerateByteType.h
│   │   ├── THGenerateCharType.h
│   │   ├── THGenerateDoubleType.h
│   │   ├── THGenerateFloatType.h
│   │   ├── THGenerateFloatTypes.h
│   │   ├── THGenerateHalfType.h
│   │   ├── THGenerateIntType.h
│   │   ├── THGenerateIntTypes.h
│   │   ├── THGenerateLongType.h
│   │   ├── THGenerateShortType.h
│   │   ├── THGenerator.hpp
│   │   ├── TH.h
│   │   ├── THHalf.h
│   │   ├── THLapack.h
│   │   ├── THLogAdd.h
│   │   ├── THMath.h
│   │   ├── THMemoryFile.h
│   │   ├── THRandom.h
│   │   ├── THSize.h
│   │   ├── THStorageFunctions.h
│   │   ├── THStorageFunctions.hpp
│   │   ├── THStorage.h
│   │   ├── THTensorApply.h
│   │   ├── THTensorDimApply.h
│   │   ├── THTensor.h
│   │   ├── THTensor.hpp
│   │   └── THVector.h
│   ├── THC
│   │   ├── generic
│   │   ├── THCAllocator.h
│   │   ├── THCApply.cuh
│   │   ├── THCAsmUtils.cuh
│   │   ├── THCAtomics.cuh
│   │   ├── THCBlas.h
│   │   ├── THCCachingAllocator.h
│   │   ├── THCCachingHostAllocator.h
│   │   ├── THCDeviceTensor.cuh
│   │   ├── THCDeviceTensor-inl.cuh
│   │   ├── THCDeviceTensorUtils.cuh
│   │   ├── THCDeviceTensorUtils-inl.cuh
│   │   ├── THCDeviceUtils.cuh
│   │   ├── THCGeneral.h
│   │   ├── THCGeneral.hpp
│   │   ├── THCGenerateAllTypes.h
│   │   ├── THCGenerateByteType.h
│   │   ├── THCGenerateCharType.h
│   │   ├── THCGenerateDoubleType.h
│   │   ├── THCGenerateFloatType.h
│   │   ├── THCGenerateFloatTypes.h
│   │   ├── THCGenerateHalfType.h
│   │   ├── THCGenerateIntType.h
│   │   ├── THCGenerateLongType.h
│   │   ├── THCGenerateShortType.h
│   │   ├── THCGenerator.hpp
│   │   ├── THC.h
│   │   ├── THCIntegerDivider.cuh
│   │   ├── THCNumerics.cuh
│   │   ├── THCReduceAll.cuh
│   │   ├── THCReduceApplyUtils.cuh
│   │   ├── THCReduce.cuh
│   │   ├── THCScanUtils.cuh
│   │   ├── THCSleep.h
│   │   ├── THCSortUtils.cuh
│   │   ├── THCStorageCopy.h
│   │   ├── THCStorage.h
│   │   ├── THCStorage.hpp
│   │   ├── THCTensorCopy.h
│   │   ├── THCTensorCopy.hpp
│   │   ├── THCTensor.h
│   │   ├── THCTensor.hpp
│   │   ├── THCTensorInfo.cuh
│   │   ├── THCTensorMath.h
│   │   ├── THCTensorMathMagma.cuh
│   │   ├── THCTensorMathPointwise.cuh
│   │   ├── THCTensorMathReduce.cuh
│   │   ├── THCTensorMode.cuh
│   │   ├── THCTensorRandom.cuh
│   │   ├── THCTensorRandom.h
│   │   ├── THCTensorSort.cuh
│   │   ├── THCTensorTopK.cuh
│   │   ├── THCTensorTypeUtils.cuh
│   │   └── THCThrustAllocator.cuh
│   ├── THCUNN
│   │   ├── SharedMem.cuh
│   │   └── THCHalfAutoNumerics.cuh
│   └── torch
│       ├── csrc
│       ├── extension.h
│       └── script.h
├── libc10_cuda.so
├── libc10.so
├── libcaffe2_detectron_ops_gpu.so
├── libcaffe2_gpu.so
├── libcaffe2_module_test_dynamic.so
├── libcaffe2_observers.so
├── libcaffe2.so
├── libcudart-f7fdd8d7.so.9.0
├── libgomp-7bcb08ae.so.1
├── libmkldnn.so
├── libmkldnn.so.0
├── libmkldnn.so.0.14.0
├── libnvrtc-007d19c9.so.9.0
├── libnvrtc-builtins.so
├── libnvToolsExt-3965bdd0.so.1
├── libonnxifi_dummy.so
├── libonnxifi.so
├── libshm.so
├── libtorch_python.so
├── libtorch.so
├── libtorch.so.1
├── THCUNN.h
├── THNN.h
└── torch_shm_manager

ctongfei commented 5 years ago

Seems identical to #24. Try going into the include-swig directory and running this line directly: https://github.com/ctongfei/nexus/blob/master/torch/build.sh#L71 (without all the post-processing needed to make SWIG work):

g++ -P -E -I TH -I THNN -I THC -I THCUNN torch.h

danyaljj commented 5 years ago

Here is what I get:

(env3.6) khashab2@gissing:/shared/shelley/khashab2/nexus/torch/include-swig$ g++ -P -E -I TH -I THNN -I THC -I THCUNN torch.h
In file included from torch.h:1:0:
TH/TH.h:4:26: fatal error: TH/THGeneral.h: No such file or directory
compilation terminated.

ctongfei commented 5 years ago

Hmm... it cannot find that file. Try adding -I . to the g++ -P -E command (that is, the current working directory, since TH/THGeneral.h seems to be located under it). I cannot reproduce this on my machine.
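
To illustrate why -I . should help, here is a minimal sketch that emulates the include search in plain shell, so it runs without a compiler. It assumes the header is requested as TH/THGeneral.h (as the error message suggests), which means the parent directory of TH must be on the search path. The helper resolve_header and the throwaway directory layout are hypothetical and only stand in for what g++ does with its -I list.

```shell
#!/bin/sh
# Sketch: emulate how an include such as TH/THGeneral.h is resolved
# against the -I directories, in the order they are given.
# resolve_header is a hypothetical helper, not part of nexus or g++.
resolve_header() {
    header=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$header" ]; then
            printf '%s\n' "$dir/$header"
            return 0
        fi
    done
    return 1
}

# Throwaway layout: TH/THGeneral.h sits one level below the working directory.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/TH"
: > "$demo_dir/TH/THGeneral.h"
cd "$demo_dir" || exit 1

# Original search path: TH/TH/THGeneral.h does not exist, so nothing resolves.
without_dot=$(resolve_header TH/THGeneral.h TH THNN THC THCUNN) || without_dot="not found"

# With "." on the search path, ./TH/THGeneral.h resolves.
with_dot=$(resolve_header TH/THGeneral.h . TH THNN THC THCUNN) || with_dot="not found"

echo "without -I . : $without_dot"
echo "with -I .    : $with_dot"
```

Applied to the real command, the same idea would read g++ -P -E -I . -I TH -I THNN -I THC -I THCUNN torch.h.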

ctongfei commented 5 years ago

Additionally, cuda_runtime_api.h should be found at $CUDA_ROOT/include/. Mine is located there.
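
A quick way to check this is to probe a few candidate CUDA prefixes for the header. The sketch below is only an illustration: find_cuda_header is a hypothetical helper, and /usr/local/cuda is a common default install location that is assumed here, not something confirmed for this machine.

```shell
#!/bin/sh
# Sketch: print the include directory of the first candidate CUDA root
# that contains cuda_runtime_api.h. find_cuda_header is a hypothetical helper.
find_cuda_header() {
    for root in "$@"; do
        # Skip empty entries, e.g. when CUDA_ROOT is unset.
        [ -n "$root" ] || continue
        if [ -f "$root/include/cuda_runtime_api.h" ]; then
            printf '%s\n' "$root/include"
            return 0
        fi
    done
    return 1
}

# Candidates: $CUDA_ROOT first, then /usr/local/cuda (assumed default).
if cuda_include=$(find_cuda_header "${CUDA_ROOT:-}" /usr/local/cuda); then
    echo "Pass to g++: -I $cuda_include"
else
    echo "cuda_runtime_api.h not found; set CUDA_ROOT to the CUDA install prefix"
fi
```

Having the CUDA command-line tools on the PATH does not by itself put the headers on the compiler's include path, so the directory still has to be passed with -I.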

bitstormFA commented 5 years ago

Hi, I had the same error; adding -I . to the g++ call fixed it for me (running a current Arch Linux with gcc 8.2.1 and CUDA 10.0).

ctongfei commented 5 years ago

@bitstormGER Thanks for confirming my proposed fix for the header problem. It should be fixed now?

danyaljj commented 5 years ago

I'll give it a try tonight.

ctongfei commented 5 years ago

@danyaljj The build is now refactored into a Makefile. This should make it easier to debug (you'll know which step it failed at).
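
For context, the first log in this thread shows the old script carrying on to "Generating SWIG bindings..." after the preprocessing step had already hit a fatal error, whereas make stops at the first failing recipe by default. The sketch below imitates that stop-at-first-failure behaviour in plain shell. The run_step helper and the placeholder steps (true and false standing in for real commands) are hypothetical and do not reflect the actual Makefile targets.

```shell
#!/bin/sh
# Sketch: run named build steps in order and stop at the first failure.
# run_step is a hypothetical helper; the steps below are placeholders.
run_step() {
    step_name=$1
    shift
    echo "==> $step_name"
    if ! "$@"; then
        echo "FAILED at step: $step_name" >&2
        return 1
    fi
}

build() {
    run_step "preprocess headers" true &&
    run_step "generate SWIG bindings" false &&
    run_step "compile JNI wrapper" true
}

# Capture the log and the exit status of the whole chain.
build_log=$(build 2>&1) && build_status=0 || build_status=$?
echo "$build_log"
echo "exit status: $build_status"
```

Because the steps are chained with &&, the third step never runs once the second one fails, and the log names the step that broke.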

danyaljj commented 5 years ago

Make finished with no errors. Looks like it's all good! 🎆

The tail of the output, for completeness:

torch-preprocessed.h:559: Warning 315: Nothing known about 'at::Allocator'.
torch-preprocessed.h:559: Warning 315: Nothing known about 'at::Allocator'.
torch-preprocessed.h:559: Warning 315: Nothing known about 'at::Allocator'.
torch-preprocessed.h:559: Warning 315: Nothing known about 'at::Allocator'.
torch-preprocessed.h:559: Warning 315: Nothing known about 'at::Allocator'.
torch-preprocessed.h:559: Warning 315: Nothing known about 'at::Allocator'.
cat torch_wrap.cxx \
      | python fix_cuda_stream_dereferencing.py \
      > torch_wrap_fixed.cxx
Problematic definition found.
Fixed.
Problematic definition found.
Fixed.
g++ -std=c++11 -fPIC -static -c torch_wrap_fixed.cxx \
-I /usr/lib/jvm/java-8-openjdk-amd64/include \
-I /usr/lib/jvm/java-8-openjdk-amd64/include/linux \
-I /usr/local/cuda/include \
-I /shared/shelley/khashab2/nexus/env3.6/lib/python3.6/site-packages/torch/lib \
    -I /shared/shelley/khashab2/nexus/env3.6/lib/python3.6/site-packages/torch/lib/include \
    -I /shared/shelley/khashab2/nexus/env3.6/lib/python3.6/site-packages/torch/lib/include/TH \
    -I /shared/shelley/khashab2/nexus/env3.6/lib/python3.6/site-packages/torch/lib/include/THC
mkdir -p jni/src/main/resources;
g++ -shared -fPIC torch_wrap_fixed.o -o jni/src/main/resources/libjnitorch.so -L /shared/shelley/khashab2/nexus/env3.6/lib/python3.6/site-packages/torch/lib -l caffe2 -l caffe2_gpu
(env3.6) khashab2@gissing:/shared/shelley/khashab2/nexus/torch$ 

ctongfei commented 5 years ago

Great! You should be able to run stuff now, and contribute :-)