Closed marioroy closed 11 months ago
Branch: bugfixes-2023-08
I tried static as well.
@par(schedule='static', chunk_size=1, ordered=True)
$ codon build -release test.py
$ OMP_NUM_THREADS=1 ./test 100000000 <- orderly :)
1, 15315300
15315301, 30630600
30630601, 45945900
45945901, 61261200
61261201, 76576500
76576501, 91891800
91891801, 100000000
$ OMP_NUM_THREADS=9 ./test 100000000 <- stalled
$ OMP_NUM_THREADS=9 ./test 100000000 <- one line output, stalled
91891801, 100000000
^C
$ OMP_NUM_THREADS=9 ./test 100000000 <- oh, not again...
1, 15315300
45945901, 61261200
76576501, 91891800
^C
$ OMP_NUM_THREADS=7 ./test 100000000 <- not orderly
91891801, 100000000
15315301, 30630600
30630601, 45945900
61261201, 76576500
1, 15315300
45945901, 61261200
76576501, 91891800
Hi @marioroy, this is happening because Codon does not see range(1, limit + 1, _SIEVE_SIZE) as an "imperative" for-loop that it can apply loop parallelism to, since the step size is a variable. It therefore falls back to task-based parallelism, which is incompatible with "ordered". We should probably emit a warning here, but you can fix it by making _SIEVE_SIZE a compile-time constant: _SIEVE_SIZE: Static[int] = 510510 * 30. Let me know if that works!
Thank you, @arshajii. Your suggestion works 100%. I was at a loss here :)
Using my new demo, validating Codon output against C passes 100% no matter the number of threads specified.
# C
$ OMP_NUM_THREADS=8 ./practicalsieve_mp1 1000000000 | md5sum
50847534 primes found. <- the count is sent to STDERR
92c178cc5bb85e06366551c0ae7e18f6 -
# Codon
$ OMP_NUM_THREADS=8 ./practicalsieve_mp2 1000000000 | md5sum
50847534 primes found.
92c178cc5bb85e06366551c0ae7e18f6 -
I also tried a limit of 10 billion, generating 4.719 GB of data through the pipe.
# C
$ OMP_NUM_THREADS=8 ./practicalsieve_mp1 10000000000 | md5sum
455052511 primes found.
95ded65c9ddca18e1df966ba2a2b373a -
# Codon
$ OMP_NUM_THREADS=8 ./practicalsieve_mp2 10000000000 | md5sum
455052511 primes found.
95ded65c9ddca18e1df966ba2a2b373a -
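The validation idea above can be sketched in plain Python (stream_digest is a hypothetical helper, not part of the demos): hash each program's output stream and compare digests, rather than diffing gigabytes of text.

```python
import hashlib

def stream_digest(lines):
    # Hash an output stream line by line, like piping stdout to md5sum.
    h = hashlib.md5()
    for line in lines:
        h.update(line.encode())
    return h.hexdigest()

# Two implementations producing identical output yield identical digests.
c_out = (f"{p}\n" for p in (2, 3, 5, 7, 11))
codon_out = (f"{p}\n" for p in (2, 3, 5, 7, 11))
print(stream_digest(c_out) == stream_digest(codon_out))  # True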
I wish for the ability to compute the step size for range. Doing so changes the scheduling to task-based, losing the OpenMP ordered capability. For better performance, big numbers benefit from a bigger step size for range. It is not fun that Codon switches to task scheduling behind the scenes, especially when the step size is computed only once in the code.
So, I had to revert the logic that improved performance for big numbers.
Is there a way to set the step size for range dynamically without losing the OpenMP ordered capability? IMHO, this aspect of Codon is limiting. Well, it has been great fun experimenting with Codon, until reaching this roadblock and finding that range does not support 64-bit unsigned integers.
Hi @marioroy -- often you can achieve the same thing with constant step-size by just transforming the index. Example:
for i in range(a, b, c):
    print(i)
to
r = range(a, b, c)
for i in range(len(r)):
    print(r[i])
It is similar for unsigned integers: the loop can be over int, but you can convert the values to u64 as needed in the body. Let me know if this works!
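The index transformation can be checked in plain Python (a, b, c are arbitrary example values):

```python
# Variable-step loop rewritten as a constant-step loop over indices.
# range(len(r)) has a step of 1, which Codon can apply loop parallelism
# to (keeping `ordered`); r[i] recovers the original variable-step value.
a, b = 1, 100
c = 7  # step size known only at run time

r = range(a, b, c)
transformed = [r[i] for i in range(len(r))]

assert transformed == list(range(a, b, c))  # same values, same order
print(transformed)
```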
Thank you, @arshajii -- that is most helpful. Computing the step size is beneficial for the parallel prime-sieve application, and performance is restored in the Codon demonstration.
Regarding unsigned integers, I cannot use range. That means having to create a generator that supports unsigned integers.
r = range(9223372036854775800, 9223372036854775800 + 8, 1)
print(len(r))
Python: 8
Codon: 0
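The empty range is presumably 64-bit signed overflow: the stop value start + 8 wraps past 2**63 - 1 to a negative number, so the computed length clamps to 0. A plain-Python sketch of that behavior (wrap64 is a hypothetical helper modeling two's-complement wraparound, not a Codon API):

```python
def wrap64(x):
    # Simulate 64-bit two's-complement wraparound.
    x &= (1 << 64) - 1
    return x - (1 << 64) if x >= (1 << 63) else x

start = 9223372036854775800
stop = wrap64(start + 8)       # wraps to -9223372036854775808
length = max(0, stop - start)  # step is 1, so length = stop - start
print(len(range(start, start + 8)))  # Python big ints: 8
print(length)                        # 64-bit semantics: 0
```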
Please consider my vote (+1) for a urange function in Codon.
r = range(a, b, c) # int
r = urange(a, b, c) # u64
r = range128(a, b, c) # Int[128]
r = urange128(a, b, c) # UInt[128]
Thank you, @elisbyberi. Your suggestion works.
for i in range(17446744073000000000, 17446744073000000010):
print(UInt[64](i))
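What makes the workaround tick can be sketched in plain Python (to_i64 and to_u64 are hypothetical helpers modeling 64-bit reinterpretation, not Codon APIs): a value above 2**63 - 1 is carried through the loop as a negative signed int, and UInt[64](i) reinterprets the same bits as unsigned.

```python
MASK = (1 << 64) - 1

def to_i64(x):
    # How a value above 2**63 - 1 lands in a signed 64-bit int.
    x &= MASK
    return x - (1 << 64) if x >= (1 << 63) else x

def to_u64(x):
    # Models Codon's UInt[64](i): reinterpret the bits as unsigned.
    return x & MASK

start = 17446744073000000000
printed = [to_u64(to_i64(start + k)) for k in range(10)]
print(printed[0], printed[-1])
```

The round trip recovers the intended unsigned values 17446744073000000000 through 17446744073000000009.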
I completed C and Codon parallel segmented practical-sieve demonstrations, but have not yet published them as a Gist because of Codon failing. The following is a mini-example for reproducing the issue.
Testing: try OMP_NUM_THREADS=4 if 3 is working. The stall issue is random.