Open Nugine opened 1 year ago
Parallel threshold
$$ \frac{n}{pv}+c<\frac{n}{v} $$
$$ n > \frac{p}{p-1}cv $$
$n$: input size (B) $p$: num threads $v$: throughput (B/ns) $c$: threading overhead (ns)
Parallel time rate (smaller is better)
$$ \frac{n/(pv)+c}{n/v} < 1 $$
$$ \frac{1}{p}+\frac{cv}{n} < 1 $$
When handling large input, the throughput may be limited by cache miss and page faults. In other words, the throughput hits memory bound. So the theoretical parallel time rate is smaller than the practial one.
just bench static-experimental --bench base64 --plotting-backend disabled -- 'base64-encode/base64-simd'
16 | 32 | 64 | 256 | 1024 | 4096 | 65536 | 262144 | 524288 | 1048576 | |
---|---|---|---|---|---|---|---|---|---|---|
base64-simd/auto | 1.827 | 1.977 | 3.503 | 8.008 | 11.826 | 12.980 | 13.494 | 12.852 | 12.751 | 12.723 |
base64-simd/parallel | 1.174 | 0.004 | 0.008 | 0.029 | 0.114 | 0.433 | 5.623 | 16.196 | 24.001 | 31.267 |
Would it be useful to have option to parse multiple items, which the cache and instruction level parallelism can probably help here.
Use case will be something like being able to use it in polars to read a column of uuid.
Would it be useful to have option to parse multiple items, which the cache and instruction level parallelism can probably help here.
Use case will be something like being able to use it in polars to read a column of uuid.
Could you explain this use case in more detail? How can we accelerate it by multithreading?
Improve throughput by using multi-threading (rayon).
In most cases, the simd functions are fast enough that we may not benefit from multi-threading. And some functions are not divisible.
Crates with parallel api: