amazon-archives / amazon-dsstne

Deep Scalable Sparse Tensor Network Engine (DSSTNE) is an Amazon-developed library for building Deep Learning (DL) machine learning (ML) models.
Apache License 2.0

Properly explicitly-instantiate templates inside CUDA files #171

Closed: ldionne closed this 6 years ago

ldionne commented 6 years ago

Simply calling the functions in another function will not necessarily make the symbols visible from outside the translation unit, since the compiler could, for example, perform inlining and never emit external symbols for those template instantiations.

Explicit instantiation of templates solves exactly that problem.

As a fly-by fix, this commit also removes the declaration of some function templates that were never defined.

I ran the Cifar-10 and the MovieLens examples and there does not seem to be a performance regression.

Before my changes

Training Movielens
real    0m6.850s
user    0m4.468s
sys 0m2.300s

Predicting Movielens
real    0m58.903s
user    0m54.644s
sys 0m4.064s

Training Cifar-10
real    1m14.164s
user    1m9.888s
sys 0m4.240s

After my changes

Training Movielens
real    0m6.890s
user    0m4.348s
sys 0m2.484s

Predicting Movielens
real    0m59.091s
user    0m54.996s
sys 0m3.884s

Training Cifar-10
real    1m14.239s
user    1m9.740s
sys 0m4.468s

The exact script I used to run those benchmarks follows, for reference:

#!/usr/bin/env bash

cd amazon-dsstne

echo "Building Dsstne..."
export PATH="/usr/local/openmpi/bin:/usr/local/cuda/bin:${PATH}"
INSTALL_DIR="${PWD}/src/amazon/dsstne"
make -C src/amazon/dsstne clean
make -C src/amazon/dsstne
g++ samples/cifar-10/dparse.cpp -o samples/cifar-10/dparse -lnetcdf -lnetcdf_c++4 --std=c++0x
export PATH="${INSTALL_DIR}/bin:${PATH}"

# Run Movielens
pushd samples/movielens
if [[ ! -e gl_input.nc ]]; then
  echo "Downloading MovieLens dataset..."
  wget --quiet http://files.grouplens.org/datasets/movielens/ml-20m.zip
  echo "Extracting ml-20m/ratings.csv from ml-20m.zip to ml-20m_ratings.csv"
  unzip -p ml-20m.zip ml-20m/ratings.csv > ml-20m_ratings.csv
  echo "Converting ml-20m_ratings.csv to DSSTNE format"
  awk -f convert_ratings.awk ml-20m_ratings.csv > ml-20m_ratings
  generateNetCDF -d gl_input  -i ml-20m_ratings -o gl_input.nc  -f features_input  -s samples_input -c >/dev/null
  generateNetCDF -d gl_output -i ml-20m_ratings -o gl_output.nc -f features_output -s samples_input -c >/dev/null
else
  echo "Using existing gl_input.nc and gl_output.nc files"
fi

echo "Training Movielens"
time train -c config.json -i gl_input.nc -o gl_output.nc -n gl.nc -b 256 -e 1 >/dev/null

echo "Predicting Movielens"
time predict -b 256 -d gl -i features_input -o features_output -k 10 -n gl.nc -f ml-20m_ratings -s recs -r ml-20m_ratings >/dev/null
rm gl.nc recs
popd

# Run cifar-10
pushd samples/cifar-10
if [[ ! -e training.bin ]]; then
  echo "Downloading Cifar-10 dataset"
  wget --quiet https://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz
  echo "Extracting Cifar-10 dataset"
  tar -xzf cifar-10-binary.tar.gz
  mv cifar-10-batches-bin/test_batch.bin test.bin
  cat cifar-10-batches-bin/data_batch_*.bin > training.bin
else
  echo "Using existing training.bin file"
fi
./dparse

echo "Training Cifar-10"
time train -c config.json -i cifar10_training.nc -o cifar10_test.nc -n result.nc -b 256 -e 1 >/dev/null
rm result.nc
popd