NVIDIA / cccl

CUDA Core Compute Libraries

thrust::host_vector allocation is slow compared to std::vector #775

Open mirzadeh opened 3 years ago

mirzadeh commented 3 years ago

I was trying to debug a performance issue, and it led me to test the memory allocation of thrust::host_vector. Here's a simple benchmark I was running:

#include <chrono>
#include <iostream>
#include <vector>
#include <thrust/host_vector.h>

const int length = 64'000'000;

template<typename HostVector>
void benchmark(const int num_repeats) {
    std::vector<HostVector> buffers;

    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < num_repeats; i++) {
        buffers.emplace_back(length);
    }
    auto finish = std::chrono::high_resolution_clock::now();

    std::cout << "took " << std::chrono::duration_cast<std::chrono::milliseconds>(finish - start).count() / (float)num_repeats << "ms\n";
}

int main () {
    std::cout << "std::vector: ";
    benchmark<std::vector<char>>(100);

    std::cout << "thrust::host_vector: ";
    benchmark<thrust::host_vector<char>>(100);
}

On my machine I get:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Nov_30_19:08:53_PST_2020
Cuda compilation tools, release 11.2, V11.2.67
Build cuda_11.2.r11.2/compiler.29373293_0

$ nvcc -O2 -std=c++17  mem.cu && ./a.out
std::vector: took 15.27ms
thrust::host_vector: took 61ms

I was wondering if I am missing something trivial or perhaps this is a known issue? Thanks!

alliepiper commented 3 years ago


Thanks for the report. I hadn't noticed this, and nobody had reported it before. I don't know of any workaround other than calling reserve to reduce the number of dynamic allocations.
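Applied to the benchmark from the issue, the reserve suggestion might look like the minimal sketch below. It assumes the target is the outer std::vector<HostVector>, which otherwise grows as buffers are added and has to relocate the buffers it already holds. The names benchmark_reserved and BenchResult are hypothetical, and the switch to steady_clock with a fractional-millisecond duration is a substitution, not part of the original code. The sketch leaves the allocation each individual buffer performs unchanged, so it may not close the gap reported above. Only standard headers are included so it compiles on its own; timing thrust::host_vector<char> additionally needs <thrust/host_vector.h> and nvcc, as in the original.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

// Result of one run of the hypothetical benchmark_reserved() below.
struct BenchResult {
    double ms_per_buffer;    // average wall-clock time per buffer construction
    bool outer_reallocated;  // true if the outer vector's storage moved
};

// Hypothetical variant of the issue's benchmark(): the functional change is
// buffers.reserve(), so emplace_back never has to grow the outer vector and
// relocate (move or copy) the buffers that were already constructed.
template <typename HostVector>
BenchResult benchmark_reserved(int num_repeats, std::size_t length) {
    std::vector<HostVector> buffers;
    buffers.reserve(static_cast<std::size_t>(num_repeats));
    const HostVector* storage_before = buffers.data();

    // steady_clock is monotonic, the usual choice for interval timing.
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < num_repeats; i++) {
        // Each iteration still constructs a buffer of `length` elements,
        // so the per-buffer allocation itself is unchanged.
        buffers.emplace_back(length);
    }
    auto finish = std::chrono::steady_clock::now();
    assert(buffers.size() == static_cast<std::size_t>(num_repeats));

    // Keep fractional milliseconds instead of truncating before dividing.
    std::chrono::duration<double, std::milli> elapsed = finish - start;
    BenchResult result{elapsed.count() / num_repeats,
                       buffers.data() != storage_before};
    std::cout << "took " << result.ms_per_buffer << "ms\n";
    return result;
}
```

From the issue's main, the calls would be benchmark_reserved<std::vector<char>>(100, length) and benchmark_reserved<thrust::host_vector<char>>(100, length). Like the original, keeping 100 buffers of 64,000,000 chars alive at once needs roughly 6.4 GB of host memory.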