Closed stonebig closed 6 months ago
on windows10, feeding the hardest.txt with 500 lines of "8..........36......7..9.2...5...7.......457.....1...3...1....68..85...1..9....4..":
made a no-install version for the believers, auto-testing on 1, 2, 4, 8, 16 threads
hard2 = '8..........36......7..9.2...5...7.......457.....1...3...1....68..85...1..9....4..'
if __name__ == '__main__':
test()
nbsudoku = 20
thread_list = ( 1, 2, 4, 8, 16)
for nbthreads in thread_list:
startall = time.time()
solve_all([hard2]*nbsudoku, "hard2", None, nbthreads=nbthreads)
print(f'solved {nbsudoku} tests with {nbthreads} threads in {time.time()-startall:.2f} seconds' + '\n')
On a personal test on a old intel cpu 2cpu/4threads Windows 10, with nogil python of 2022-12-22:
So on Windows, with nogil python of 2022-12-22:
but the current mono-thread penalty seems about 50% versus standard Python, on Windows.
MD5 | SHA-1 | SHA-256 | Binary | Size | SHA3-256 |
---|---|---|---|---|---|
a8a7c8a81625528aa74e4aaa068daf7f | a922945e129e5579c8bdb42636cce021fdbfb5a1 | 2d281ff5a21834814cfc70a254244c726e082e72d2ca6b523349fe20ff8f9a01 | Winpython64-3.9.10.1dotnogil_a1.exe | 26 287 370 Bytes | 7c82cf1405f8e901fadeacf7ea622319b03ba7596db6333ecf477b2041643631 |
I see Peter Norvig did a Jupyterlab version, with more modern Python coding and Typing
I'll look if it changes things https://colab.research.google.com/github/norvig/pytudes/blob/main/ipynb/Sudoku.ipynb#scrollTo=W8iYmI_560L4
On a windows 11 4/4 cpu, nogil looks 1.8x faster than standard
did modified to get more easy to read results:
nogil
All tests pass.
On a 8 cpu 3.9.10 (heads/nogil:258cab1, Dec 22 2022, 04:07:29)
[nogil, [MSC v.1929 64 bit (AMD64)]]:
Solved 330 of 330 hardest puzzles (avg 0.01 secs (174 Hz), max 0.02 secs).
solved 330 tests with 1 threads in 1.97 seconds, 1.00 speed-up
Solved 330 of 330 hardest puzzles (avg 0.01 secs (146 Hz), max 0.02 secs).
solved 330 tests with 2 threads in 1.16 seconds, 1.70 speed-up
Solved 330 of 330 hardest puzzles (avg 0.01 secs (113 Hz), max 0.03 secs).
solved 330 tests with 4 threads in 0.75 seconds, 2.62 speed-up
Solved 330 of 330 hardest puzzles (avg 0.02 secs (65 Hz), max 0.05 secs).
solved 330 tests with 8 threads in 0.66 seconds, 2.99 speed-up
Solved 330 of 330 hardest puzzles (avg 0.03 secs (34 Hz), max 0.09 secs).
solved 330 tests with 16 threads in 0.66 seconds, 2.99 speed-up
normal python-3.9
All tests pass.
On a 8 cpu 3.9.8 (tags/v3.9.8:bb3fdcf, Nov 5 2021, 20:48:33) [MSC v.1929 64 bit (AMD64)]:
Solved 330 of 330 hardest puzzles (avg 0.00 secs (314 Hz), max 0.02 secs).
solved 330 tests with 1 threads in 1.16 seconds, 1.00 speed-up
Solved 330 of 330 hardest puzzles (avg 0.01 secs (147 Hz), max 0.09 secs).
solved 330 tests with 2 threads in 1.16 seconds, 1.00 speed-up
Solved 330 of 330 hardest puzzles (avg 0.01 secs (77 Hz), max 0.27 secs).
solved 330 tests with 4 threads in 1.16 seconds, 1.00 speed-up
Solved 330 of 330 hardest puzzles (avg 0.02 secs (47 Hz), max 0.55 secs).
solved 330 tests with 8 threads in 1.19 seconds, 0.97 speed-up
Solved 330 of 330 hardest puzzles (avg 0.02 secs (54 Hz), max 0.57 secs).
solved 330 tests with 16 threads in 1.18 seconds, 0.99 speed-up
python-3.12a03 (notice the small speed-up vs 3.9 on this apparently worst-case example)
All tests pass.
On a 8 cpu 3.12.0a3 (tags/v3.12.0a3:b6bd7ff, Dec 6 2022, 20:19:00) [MSC v.1934 64 bit (AMD64)]:
Solved 330 of 330 hardest puzzles (avg 0.00 secs (305 Hz), max 0.02 secs).
solved 330 tests with 1 threads in 1.13 seconds, 1.00 speed-up
Solved 330 of 330 hardest puzzles (avg 0.01 secs (156 Hz), max 0.08 secs).
solved 330 tests with 2 threads in 1.11 seconds, 1.01 speed-up
Solved 330 of 330 hardest puzzles (avg 0.01 secs (84 Hz), max 0.24 secs).
solved 330 tests with 4 threads in 1.13 seconds, 1.00 speed-up
Solved 330 of 330 hardest puzzles (avg 0.02 secs (49 Hz), max 0.36 secs).
solved 330 tests with 8 threads in 1.11 seconds, 1.01 speed-up
Solved 330 of 330 hardest puzzles (avg 0.02 secs (62 Hz), max 0.36 secs).
solved 330 tests with 16 threads in 1.12 seconds, 1.01 speed-up
testing on Python-3.13.0b1 'free-threaded' vs 'normal' Python. (replacing 20 per 80 to have a better timing)
testing python-3.13.0b1 'free-threaded' sudoku on a Windows 4C/8T cpu:
C:\WinP\bdDocs\docs.nogil>python C:\WinP\bdDocs\docs.nogil\sudoku_thread_perf_comparison_typed.py
Solved 80 of 80 hard2 puzzles (avg 0.02 secs (47 Hz), max 0.03 secs).
solved 80 tests with 1 threads in 1.68 seconds
Solved 80 of 80 hard2 puzzles (avg 0.04 secs (24 Hz), max 0.10 secs).
solved 80 tests with 2 threads in 1.67 seconds
Solved 80 of 80 hard2 puzzles (avg 0.08 secs (12 Hz), max 0.24 secs).
solved 80 tests with 4 threads in 1.67 seconds
Solved 80 of 80 hard2 puzzles (avg 0.14 secs (7 Hz), max 0.52 secs).
solved 80 tests with 8 threads in 1.70 seconds
Solved 80 of 80 hard2 puzzles (avg 0.06 secs (15 Hz), max 0.28 secs).
solved 80 tests with 16 threads in 1.68 seconds
C:\WinP\bdDocs\docs.nogil>python3.13t.exe C:\WinP\bdDocs\docs.nogil\sudoku_thread_perf_comparison_typed.py
Solved 80 of 80 hard2 puzzles (avg 0.03 secs (29 Hz), max 0.04 secs).
solved 80 tests with 1 threads in 2.71 seconds
Solved 80 of 80 hard2 puzzles (avg 0.04 secs (25 Hz), max 0.05 secs).
solved 80 tests with 2 threads in 1.59 seconds
Solved 80 of 80 hard2 puzzles (avg 0.05 secs (18 Hz), max 0.07 secs).
solved 80 tests with 4 threads in 1.09 seconds
Solved 80 of 80 hard2 puzzles (avg 0.09 secs (10 Hz), max 0.11 secs).
solved 80 tests with 8 threads in 0.94 seconds
Solved 80 of 80 hard2 puzzles (avg 0.18 secs (5 Hz), max 0.25 secs).
solved 80 tests with 16 threads in 0.95 seconds
Hi,
Parallelising N sudoku is an obvious option for nogil.
But would it be possible to make one sudoku faster with an also simple change ?
I tried this, but my threading skill/muscle seems not to get it the right: I blew up my swap space in no time, needing to reboot.
trying with