Cache-optimized variants in PrimeLua/solution_3

Mooshua commented 2 years ago

Description

Added several variants to the traditional linear sieve which iterate over the is_prime array in a way which reduces cache evictions. There are 8 new benchmarks, each with different parameters (which should cover most cpu-to-cpu divergence). So far I'm seeing at least a 2x improvement on most benchmarks!

From the README:

Cache optimized versions (ending in _c--k) use a non-linear sieve algorithm, which is broken into two steps:

A naive, linear sieve goes over the first sqrt(size) numbers, putting the ones that are prime into a list,

A non-linear sieve goes over every "block" of memory, and within each block sieves out the primes that stage 1 collected.

This reduces cache evictions as each block of memory is processed once, instead of sqrt(size) times like the standard linear sieves.

I've used the verification tools included in the solution, so to the best of my knowledge this new sieve type is fully correct. And yes, I've tested the dockerfile this time ;)

Results on an Intel i9-9900:

--   Cache-optimized: 20206
mooshua_luajit_16_c48k;20206;5.001;1;algorithm=base,faithful=no,bits=8
--   Linear: 8721
mooshua_luajit_16;8721;5.001;1;algorithm=base,faithful=no,bits=8

Contributing requirements

[x] I read the contribution guidelines in CONTRIBUTING.md.
[x] I placed my solution in the correct solution folder.
[x] I added a README.md with the right badge(s).
[x] I added a Dockerfile that builds and runs my solution.
[x] I selected drag-race as the target branch.
[x] All code herein is licensed compatible with BSD-3.

Mooshua commented 2 years ago

I should note, I'm not sure that this solution still qualifies as unfaithful.

The only disqualifying aspect (that I remember) is that it does not re-create a class structure for every iteration, and simply reuses the same configuration variables in a loop. (see for yourself)

Given this is the only concern, can I bump it up to faithful or do you folks think it's better to keep it as-is?

rbergen commented 1 year ago

The full build-up (and tear-down) of the memory structure ("class") used for the sieve/sieving is one of the most fundamental requirements for a faithful implementation, and has been that from the very beginning of the Primes project. In this case, the specific implementation aspect that goes against this rule most still is the reuse of the main sieve buffer (ARENA) across sieve runs.

Phrased differently, since the last time we discussed this the requirements for a faithful implementation haven't changed, and the relevant aspects of this solution's implementations haven't either. Therefore, the conclusion about this solution's unfaithfulness has to remain the same.

Mooshua commented 1 year ago

Yep, that was my reasoning when I first created the solution. Thanks for the clarification :)

PlummersSoftwareLLC / Primes

Cache-optimized variants in PrimeLua/solution_3 #865

Description

Contributing requirements