-
We should develop some helper functions for parallelization and then use them in several places.
There are several comments in the code indicating where we could parallelize things:
```
$ grep -r p…
-
This is a follow-on for #5513
It's not exactly a bug, but we need to evaluate the return type consistency of the parallel algorithms, especially when outputting to types smaller than the input typ…
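
To make the concern concrete, here is a minimal sketch using only the sequential standard library (not the parallel algorithms discussed in this issue). It shows that the result of a reduction depends on which type holds the running value. The helper name `sum_as` is hypothetical and exists only for illustration.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper (not part of any library): sums `values` while keeping
// the running total in `Acc`. std::accumulate takes its accumulator type from
// the initial value, so choosing `Acc` chooses where narrowing happens.
template <typename Acc, typename In>
Acc sum_as(const std::vector<In>& values) {
    return std::accumulate(values.begin(), values.end(), Acc{});
}

// Narrow at every step: each partial sum is truncated to int before the
// next element is added.
inline int sum_narrow_each_step(const std::vector<double>& values) {
    return sum_as<int>(values);
}

// Narrow once at the end: accumulate in double, convert the final value.
inline int sum_narrow_at_end(const std::vector<double>& values) {
    return static_cast<int>(sum_as<double>(values));
}
```

For the input `{0.5, 0.5, 0.5, 0.5}`, `sum_narrow_each_step` returns 0 while `sum_narrow_at_end` returns 2. A parallel implementation that combines per-chunk partial results has to pick one of these conventions, which is presumably where consistency needs checking.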
-
Several routines in Elemental (e.g., sorting eigenpairs and singular triplets, and computing medians) require sorting distributed data structures. It would be worthwhile for Elemental to incorporate …
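
For readers unfamiliar with what "sorting eigenpairs" involves, here is a single-process sketch in plain C++. It does not use Elemental's API or any distributed data structure; the names `sorting_permutation` and `apply_permutation` are hypothetical. The idea is to compute the permutation that orders the eigenvalues and then apply the same permutation to the associated eigenvectors.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: returns the index permutation that puts `keys`
// (e.g., eigenvalues) in ascending order. stable_sort keeps equal keys in
// their original relative order.
inline std::vector<std::size_t> sorting_permutation(const std::vector<double>& keys) {
    std::vector<std::size_t> perm(keys.size());
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    std::stable_sort(perm.begin(), perm.end(),
                     [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
    return perm;
}

// Hypothetical helper: reorders `items` (e.g., eigenvector columns) so that
// output[i] == items[perm[i]].
template <typename T>
std::vector<T> apply_permutation(const std::vector<T>& items,
                                 const std::vector<std::size_t>& perm) {
    std::vector<T> out;
    out.reserve(perm.size());
    for (std::size_t index : perm) {
        out.push_back(items[index]);
    }
    return out;
}
```

In a distributed setting the keys and the items live on different processes, so both the sort and the permutation step need communication, which is the part this sketch leaves out.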
-
Here is the list of all parallel algorithms. We plan to enable most (if not all) to support `datapar`, `datapar(task)` and variations.
- [x] `adjacent_difference` (#5580)
- [x] `inner_product` (see …
-
**Motivation**
It's increasingly hard to reach SOL on newer GPU architectures, starting with A100 and H100, especially for simple kernels, like:
`thrust::transform(..., thrust::plus{})`, which ba…
-
Thanks for your presentation at MEMPANG24!
Here are some alternative sorting algorithms which might be interesting to try.
In-place Parallel Super Scalar Samplesort (IPS⁴o): https://github.com/S…
-
[P0350R2](https://wg21.link/p0350r2) Integrating simd with parallel algorithms (Matthias Kretz)
-
As I understand it, most (?) of the benchmarks are multi-threaded.
However, I think it would be very useful to see single-threaded performance too. There are several reasons for that:
- most people p…
-
https://dl.acm.org/doi/pdf/10.1145/2806416.2806545
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.300.2643&rep=rep1&type=pdf
-
# Concepts for Efficient Coding
## Course Description
This course aims to provide an intuitive and deeper understanding of what happens when you run code. By focusing on the underlying computer ar…