fplll / g6k

The General Sieve Kernel
GNU General Public License v2.0

Parallel #39

Closed · cr-marcstevens closed this 4 years ago

cr-marcstevens commented 4 years ago

I have added kernel/parallel_algorithms.hpp from https://github.com/cr-marcstevens/snippets, and this is used in several places to simplify and improve multithreaded operations. Notably:

Other improvement:

This reduces 'overhead walltime' by almost a factor 1.8 for larger dims, e.g. python ./full_sieve.py 110 --saturation-ratio 0 --threads 80:

before: cputime 6631.2597s, walltime: 325.0279s
after:  cputime 6764.1946s, walltime: 182.2982s
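
For context, here is a minimal sketch of the kind of chunked parallel loop such a helper header can provide. The name parallel_for and its signature below are hypothetical illustrations, not the actual parallel_algorithms.hpp API:

// Illustrative sketch only: a chunked parallel for-loop in the spirit of a
// reusable threading helper; parallel_for is a hypothetical name, not the
// actual parallel_algorithms.hpp interface.
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

template <typename Fn>
void parallel_for(std::size_t n, std::size_t nr_threads, Fn&& body)
{
    if (nr_threads == 0)
        nr_threads = 1;
    std::vector<std::thread> workers;
    const std::size_t chunk = (n + nr_threads - 1) / nr_threads;
    for (std::size_t t = 0; t < nr_threads; ++t)
    {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        if (begin >= end)
            break;
        // each worker handles the contiguous range [begin, end);
        // body must be safe to invoke concurrently
        workers.emplace_back([begin, end, &body]() {
            for (std::size_t i = begin; i < end; ++i)
                body(i);
        });
    }
    for (auto& w : workers)
        w.join();
}

// usage, e.g.: parallel_for(db.size(), threads, [&](std::size_t i) { /* process db[i] */ });

Work that previously ran single-threaded between sieve iterations can then be spread over all threads, which is presumably where the 'overhead walltime' reduction above comes from.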

cr-marcstevens commented 4 years ago

BTW this also uses bgj1_cdb_copy, so this will increase memory. If wanted, we could first sort an (idx,len) copy of cdb and then do in-place swaps to save memory at the cost of more work.
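
A minimal sketch of that alternative, assuming a placeholder entry type (not the actual g6k structures): sort a small (len, idx) copy of cdb, then apply the resulting permutation to cdb in place by following cycles.

// Sketch of the suggested memory-saving variant: sort a small (len, idx) copy
// instead of a full cdb copy (bgj1_cdb_copy), then permute cdb in place cycle
// by cycle. CompressedEntry here is a placeholder, not the real g6k type.
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct CompressedEntry { double len; /* ... other fields ... */ };

void sort_cdb_by_len_inplace(std::vector<CompressedEntry>& cdb)
{
    const std::size_t n = cdb.size();

    // (len, original index) pairs: much smaller than a full copy of cdb
    std::vector<std::pair<double, std::size_t>> order(n);
    for (std::size_t i = 0; i < n; ++i)
        order[i] = { cdb[i].len, i };
    std::sort(order.begin(), order.end());
    // order[j].second is now the old position of the entry that belongs at j

    // apply the permutation in place, one cycle at a time
    std::vector<bool> done(n, false);
    for (std::size_t i = 0; i < n; ++i)
    {
        if (done[i])
            continue;
        CompressedEntry displaced = cdb[i];                 // value pushed out of slot i
        std::size_t cur = i;
        while (true)
        {
            const std::size_t src = order[cur].second;      // entry that belongs at cur
            done[cur] = true;
            if (src == i) { cdb[cur] = displaced; break; }  // cycle closed
            cdb[cur] = cdb[src];
            cur = src;
        }
    }
}

This avoids the cdb-sized copy at the cost of an extra pass and more bookkeeping, matching the 'more work' caveat above.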

cr-marcstevens commented 4 years ago

For sieve dim 126: about a 4.2% (8.8 GiB) increase in memory usage, and 'overhead' walltime improved from 64 min to 34 min.

Before:

$ /usr/bin/time -v python ./full_sieve.py 126 --saturation-ratio 0 --threads 80 --verbose
  51: ↑ 51 ↓  1  T:   0.41623s, TT:   0.41629s, q:  21.32605 r0/gh:  15.51283
  66: ↑ 66 ↓  1  T:   0.93737s, TT:   1.35368s, q:  11.54843 r0/gh:  15.51283
  81: ↑ 81 ↓  1  T:   3.99299s, TT:   5.34668s, q:   6.30592 r0/gh:  15.51283
  96: ↑ 96 ↓  1  T:  33.88113s, TT:  39.22783s, q:   3.44411 r0/gh:  15.51283
 111: ↑111 ↓  1  T: 343.71957s, TT: 382.94742s, q:   1.84386 r0/gh:  15.51283
 126: ↑126 ↓  1  T:3415.11327s, TT:3798.06071s, q:   1.00000 r0/gh:  15.51283
root: 'saturation_ratio': 0, 'threads': 80,              :: n: 126, cputime 73164.6877s, walltime: 3798.0610s, |db|: 2^27.83
User time (seconds): 39469.68
System time (seconds): 33767.70
Percent of CPU this job got: 1893%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:04:28
Maximum resident set size (kbytes): 223379504

After:

$ /usr/bin/time -v python ./full_sieve.py 126 --saturation-ratio 0 --threads 80 --verbose
  51: ↑ 51 ↓  1   T:   0.55681s, TT:   0.55685s, q:  21.32605 r0/gh:  15.51283
  66: ↑ 66 ↓  1   T:   0.89208s, TT:   1.44895s, q:  11.54843 r0/gh:  15.51283
  81: ↑ 81 ↓  1   T:   3.31932s, TT:   4.76832s, q:   6.30592 r0/gh:  15.51283
  96: ↑ 96 ↓  1   T:  21.81795s, TT:  26.58629s, q:   3.44411 r0/gh:  15.51283
 111: ↑111 ↓  1   T: 196.64891s, TT: 223.23521s, q:   1.84386 r0/gh:  15.51283
 126: ↑126 ↓  1   T:1812.27795s, TT:2035.51319s, q:   1.00000 r0/gh:  15.51283
root: 'saturation_ratio': 0, 'threads': 80,              :: n: 126, cputime 75764.4930s, walltime: 2035.5135s, |db|: 2^27.83
User time (seconds): 40226.66
System time (seconds): 35560.10
Percent of CPU this job got: 3691%
Elapsed (wall clock) time (h:mm:ss or m:ss): 34:13.01
Maximum resident set size (kbytes): 232819984

lducas commented 4 years ago

Impressive. Any specific aspect that requires review?

cr-marcstevens commented 4 years ago

a) That I didn't screw up some non-default sieve or other use case.
b) Whether the 4.2% memory increase is worth it, or whether it's important to try to find a better trade-off.
c) Other stuff I missed that could be improved. (E.g. I looked at the triple_sieve_mt.cpp resort function, which could probably benefit as well. But it's rather complicated, so I'd prefer that someone who knows what's going on there helps rewrite it.)

lducas commented 4 years ago

I rebuilt without "-f". I get issues when pump-down is on, even when single-threaded, and it seems independent of the kind of default sieve used.

sieving.cpp:97: void Siever::gauss_sieve(std::size_t): Assertion 'cv2_vec_len > fast_cdb[j].len' failed.
Traceback (most recent call last):
  File "./svp_challenge.py", line 114, in <module>
    asvp()
  File "./svp_challenge.py", line 92, in asvp
    stats = run_all(asvp_kernel, list(all_params.values()),
  File "/home/ducas/code/g6k/g6k/utils/cli.py", line 113, in run_all
    res = f(copy.deepcopy(job))
  File "./svp_challenge.py", line 63, in asvp_kernel
    flast = workout(g6k, tracer, 0, n, goal_r0=goal_r0,
  File "/home/ducas/code/g6k/g6k/algorithms/workout.py", line 57, in workout
    pump(g6k, tracer, kappa, blocksize, f, goal_r0=goal_r0, **pump_params)
  File "/home/ducas/code/g6k/g6k/algorithms/pump.py", line 157, in pump
    elif not wrapped_sieve(pump):
  File "/home/ducas/code/g6k/g6k/algorithms/pump.py", line 34, in wrapped_sieve
    pump.g6k(alg=alg, tracer=pump.tracer)
  File "g6k/siever.pyx", line 1267, in g6k.siever.Siever.__call__
    with tracer.context("gauss"):
  File "g6k/siever.pyx", line 1268, in g6k.siever.Siever.__call__
    self.gauss_sieve(reset_stats=reset_stats)
  File "g6k/siever.pyx", line 1241, in g6k.siever.Siever.gauss_sieve
    sig_on()

This was while running ./svp_challenge.py 70 --pump/down-sieve 1 --verbose --threads 4. The issue seems to always occur for a very small sieve context, when we fall back to the regular gauss_sieve, which may explain why the issue is independent of the choice of sieve.

A somewhat systematic test would be:

./full_sieve.py --sieve gauss nv bgj1 gauss_triple_mt --threads 1 4 --challenge-seed 0~4 --verbose

and

./svp_challenge.py 80 --sieve gauss nv bgj1 gauss_triple_mt --pump/down-sieve 0 1 --threads 1 4 --challenge-seed 0~4 --verbose

cr-marcstevens commented 4 years ago

Hi Leo, I'm running your tests now. Note that I had similar issues (hangs), which I traced back to g6k not being properly rebuilt, i.e., it was mixing old and new object files. This was caused on halfpeta by the following:

  1. Cython does not create g6k/siever.so but g6k/siever.cpython<ver>.so;
  2. python setup.py clean does not do what we want: unlike make clean, which removes all build files, it only removes temporary build files.

I have put the fix for this in PR #38.

To make sure this is not the cause, can you also change rebuild.sh as follows and try again?

rm g6k/*.so `find g6k -name "*.pyc"` -r build

lducas commented 4 years ago

I made a fresh install. Still hanging on ./svp_challenge.py 80 --sieve gauss --pump/down-sieve 1 --verbose --threads 1. Sorry :/

cr-marcstevens commented 4 years ago

Found a serious bug introduced in my rewrite of shrink_db, but this should fix it.
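
For context: shrink_db reduces the database to a target size. Below is a purely conceptual sketch, with hypothetical names (CompressedEntry, remove_uid), of a shrink-to-the-N-shortest-entries routine that keeps cdb sorted by length and in sync with auxiliary structures; it is not the g6k implementation, nor the specific bug fixed here.

// Conceptual sketch only: shrink a length-sorted database to its N shortest
// entries while keeping it consistent. Names are hypothetical; this only
// illustrates the invariants such a routine has to restore.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct CompressedEntry { double len; std::uint64_t uid; };

void shrink_to_n_shortest(std::vector<CompressedEntry>& cdb, std::size_t N,
                          void (*remove_uid)(std::uint64_t))
{
    if (cdb.size() <= N)
        return;

    // move the N shortest entries to the front (a full sort is not required here)
    auto by_len = [](const CompressedEntry& a, const CompressedEntry& b)
                  { return a.len < b.len; };
    std::nth_element(cdb.begin(), cdb.begin() + N, cdb.end(), by_len);

    // unregister the discarded entries from auxiliary structures
    // (e.g. a uid hash table) before dropping them
    for (std::size_t i = N; i < cdb.size(); ++i)
        remove_uid(cdb[i].uid);
    cdb.resize(N);

    // restore the sorted-by-length invariant on the kept prefix
    std::sort(cdb.begin(), cdb.end(), by_len);
}

Again, this only illustrates the kind of consistency a shrink step has to restore; the actual fix is in this PR's shrink_db rewrite.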

lducas commented 4 years ago

a) Seems fine now.
b) Meh. Everything is a trade-off. I think it's nicer for the average user to kill overhead in moderate dimensions than to worry about performance in super large dimensions. Users of the latter should know enough to adjust the code for their massive experiments.
c) I can't help either. This can be the object of a future PR, for example if Elena has too much free time.

So I say all right. You can click the green button.

cr-marcstevens commented 4 years ago

BTW why was this bug not caught by TravisCI?