The challenge is simple, but implementing it in CUDA was not. Here I will share my solution that runs in 16.8 seconds on a V100. It’s certainly not the fastest solution, but it is the first one of its kind (no cudf, hand-written kernels only). I challenge other CUDA enthusiasts to make it faster.
Dear ROCm team,
I just came across this article about "The One Billion Row Challenge":
https://tspeterkim.github.io/posts/cuda-1brc
I hope AMD will accept that challenge and offer a HIP-based solution to this problem :)
https://1brc.dev/
Best regards,
Samuel