amazon-archives / amazon-dsstne

Deep Scalable Sparse Tensor Network Engine (DSSTNE) is an Amazon developed library for building Deep Learning (DL) machine learning (ML) models
Apache License 2.0

build error due to legacy shuffle API #227

Open jeng1220 opened 4 years ago

jeng1220 commented 4 years ago

The build threw:

nvcc -O3 -std=c++11 --compiler-options=-fPIC -use_fast_math --ptxas-options="-v" -gencode arch=compute_70,code=sm_70 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_30,code=sm_30 -DOMPI_SKIP_MPICXX --keep-dir /test/workspace/ctr/amazon-dsstne/build/tmp/engine/cuda -I/usr/local/include -isystem /usr/local/cuda/include -isystem /usr/lib/openmpi/include -isystem /usr/include/jsoncpp -IB40C -IB40C/KernelCommon -I/test/workspace/ctr/amazon-dsstne/build/include -I../utils  -c kernels.cu -o /test/workspace/ctr/amazon-dsstne/build/tmp/engine/cuda/kernels.o
ptxas /tmp/tmpxft_00000a17_00000000-8_kernels.compute_70.ptx, line 61962; error   : Instruction 'shfl' without '.sync' is not supported on .target sm_70 and higher from PTX ISA version 6.4

The old CUDA shuffle API (shfl without .sync) has been deprecated since CUDA 9.0 and is no longer available after CUDA 10 (link).
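For illustration, here is a minimal sketch of what migrating a legacy shuffle call looks like. It is not taken from DSSTNE or CUB: the kernel `warpSumKernel` and the host helper `warpSumOfFirst32` are hypothetical names, and the sketch assumes CUDA 9.0 or newer and a launch in which all 32 lanes of the warp are active (hence the full mask).

```cuda
#include <cuda_runtime.h>

// Sum the 32 lane values of each warp with the synchronized shuffle API.
// Assumes blockDim.x is a multiple of 32 and every lane reaches the shuffle.
__global__ void warpSumKernel(const int* in, int* out)
{
    int value = in[threadIdx.x];
    const unsigned fullMask = 0xffffffffu;  // all 32 lanes participate

    // Butterfly reduction: after the loop every lane holds the warp-wide sum.
    for (int offset = 16; offset > 0; offset >>= 1) {
        // Legacy form that ptxas rejects for sm_70+:
        //   value += __shfl_xor(value, offset);
        value += __shfl_xor_sync(fullMask, value, offset);
    }

    if (threadIdx.x % 32 == 0) {
        out[threadIdx.x / 32] = value;
    }
}

// Hypothetical host helper: runs one warp over the values 1..32 and
// returns the reduced sum, or -1 if a CUDA call fails.
int warpSumOfFirst32()
{
    int hostIn[32];
    for (int i = 0; i < 32; ++i) {
        hostIn[i] = i + 1;
    }

    int* devIn = nullptr;
    int* devOut = nullptr;
    int result = -1;

    if (cudaMalloc(&devIn, sizeof(hostIn)) != cudaSuccess) {
        return -1;
    }
    if (cudaMalloc(&devOut, sizeof(int)) != cudaSuccess) {
        cudaFree(devIn);
        return -1;
    }

    cudaMemcpy(devIn, hostIn, sizeof(hostIn), cudaMemcpyHostToDevice);
    warpSumKernel<<<1, 32>>>(devIn, devOut);

    // cudaMemcpy synchronizes with the kernel before copying the result back.
    if (cudaMemcpy(&result, devOut, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess) {
        result = -1;
    }

    cudaFree(devIn);
    cudaFree(devOut);
    return result;
}
```

The only change to the device code is the added mask argument and the `_sync` suffix. On a working GPU the helper should return 528, the sum of 1 through 32.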

jeng1220 commented 4 years ago

After investigation, this problem is caused by the old version of CUB referenced by the setup instructions (https://github.com/amzn/amazon-dsstne/blob/master/docs/getting_started/setup.md#cub-setup).

Version 1.5.2 is too old for Tesla V100. Could anyone update the setup instructions?

scottlegrand commented 4 years ago

So first problem is that I do not believe anyone is left at Amazon to accept pull requests. But since you seem to work at NVIDIA and I do too, we could at least maintain my fork of it. I am surprised it's not working under CUDA 10 because I have had no issues building for Turing. But I thought we already handled this correctly because it was an issue as of CUDA 9? Also where is the CUB version in the source? I am betting my local version is newer than 1.5.2 and that might have obscured this for some time.

jeng1220 commented 4 years ago

This line also uses the old shuffle API and needs to be fixed: https://github.com/amzn/amazon-dsstne/blob/master/src/amazon/dsstne/knn/topk.cu#L141
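One possible way to fix call sites like this while still building with older toolkits is a version-guarded wrapper. The sketch below is an assumption, not the actual contents of topk.cu: the macro name `WARP_SHFL`, the kernel `broadcastLaneZero`, and the helper `broadcastMatchesLaneZero` are all hypothetical. The full mask is only valid when every lane of the warp executes the call.

```cuda
#include <cuda_runtime.h>

// Hypothetical wrapper: choose the shuffle intrinsic by toolkit version.
// CUDART_VERSION is provided by the CUDA runtime headers (9000 == CUDA 9.0).
#if defined(CUDART_VERSION) && (CUDART_VERSION >= 9000)
#define WARP_SHFL(value, srcLane) __shfl_sync(0xffffffffu, (value), (srcLane))
#else
#define WARP_SHFL(value, srcLane) __shfl((value), (srcLane))
#endif

// Broadcast lane 0's value to every lane of the warp.
__global__ void broadcastLaneZero(const float* in, float* out)
{
    float value = in[threadIdx.x];
    out[threadIdx.x] = WARP_SHFL(value, 0);
}

// Hypothetical host helper: returns true if every lane received lane 0's value.
bool broadcastMatchesLaneZero()
{
    float hostIn[32];
    float hostOut[32];
    for (int i = 0; i < 32; ++i) {
        hostIn[i] = 10.0f + i;
        hostOut[i] = -1.0f;
    }

    float* devIn = nullptr;
    float* devOut = nullptr;
    if (cudaMalloc(&devIn, sizeof(hostIn)) != cudaSuccess) {
        return false;
    }
    if (cudaMalloc(&devOut, sizeof(hostOut)) != cudaSuccess) {
        cudaFree(devIn);
        return false;
    }

    cudaMemcpy(devIn, hostIn, sizeof(hostIn), cudaMemcpyHostToDevice);
    broadcastLaneZero<<<1, 32>>>(devIn, devOut);

    bool ok = cudaMemcpy(hostOut, devOut, sizeof(hostOut),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int i = 0; ok && i < 32; ++i) {
        ok = (hostOut[i] == hostIn[0]);
    }

    cudaFree(devIn);
    cudaFree(devOut);
    return ok;
}
```

Routing every shuffle through one macro keeps the version check in a single place, so a call site only has to switch from the bare intrinsic to the wrapper.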