arhourigan / graveler


Python, but simpler/better #5

Open 31b41a59l26u53 opened 1 month ago

31b41a59l26u53 commented 1 month ago

This is a hopefully more elegant and somewhat faster version using NumPy.

| n | Original | NumPy |
| --- | --- | --- |
| 1,000,000 | 3m 17s | 9s |

Btw, I recommend tqdm progress bars; they make life so much easier!

import numpy as np
from tqdm import trange

best = 0
for rolls in trange(1000000):
    # floor(rand() + 0.25) is 1 exactly when rand() >= 0.75, i.e. with probability 0.25,
    # so this sum counts the simulated 1-in-4 paralysis procs in one session of 231 rolls
    curr = np.sum(np.floor(np.random.rand(231) + 0.25))
    if curr > best:
        best = curr
    if best > 177:
        break

print(f"Highest Roll: {best}")
print(f"Number of Roll Sessions: {rolls+1}")

One-liner version:

import numpy as np

best = np.max(np.sum(np.floor(np.random.rand(1000000,231)+0.25), 1))

print(f"Highest Roll: {best}")
31b41a59l26u53 commented 1 month ago

Multithreading is cheating.

Because the better the hardware you have access to, the better the results you can get. I can try this on an A100 80GB GPU, so yeah... Pay to win...

Let's cheat!

| n | Austin | Original code | NumPy | PyTorch A100 80GB | Result |
| --- | --- | --- | --- | --- | --- |
| 1,000 | | 233ms | 19ms | 1000ms | 77 |
| 10,000 | | 2s | 127ms | 705ms | 85 |
| 100,000 | | 19s | 1s | 890ms | 88 |
| 1,000,000 | ~5m | 3m 17s | 9s | 737ms | 91 |
| 10,000,000 | | 32m 27s | 1m 36s | 964ms | 96 |
| 100,000,000 | | ~5h 18m | 15m 59s | 894ms | 98 |
| 1,000,000,000 | 8d 13h | ~2d 7h | ~2h 42m | 3.9s | 99 |
| 10,000,000,000 | | ~23d 3h | ~27h | 35s | 104 |
| 100,000,000,000 | | ~223d | ~11d 16h | 5m 32s | 104 |
| 1,000,000,000,000 | | ~6y 122d | ~110d | 55m 2s | 107 |
| 10,000,000,000,000 | | ~62y 278d | ~3y 28d | 9h 9m 21s | 110 |

import torch
from tqdm import trange

torch.manual_seed(42)

batches = 25
batchsize = 40000000  # 40e6 sessions x 231 float32 values is roughly 37 GB, which fits in the A100's 80 GB
print(f"Number of batches: {batches}")
print(f"Experiments per batch: {batchsize}")
print(f"Overall planned roll sessions: {batches*batchsize}")

best = 0
for _ in trange(batches):
    # sum the simulated 1-in-4 procs per session on the GPU, then keep the best session in the batch
    curr = torch.max(
        torch.sum(
            torch.floor(
                torch.rand(batchsize, 231, dtype=torch.float32, device=torch.device("cuda")) + 0.25
            ), 1
        )
    )
    if curr > best:
        best = curr
    if best > 177:
        break

print(f"Highest roll: {int(best)}")
kgaughan commented 1 month ago

Hah! I'd been expecting someone to bring NumPy into the mix! Still, I'm rather surprised it's not more dramatically faster than the pure Python version I did, which weighed in roughly 3x-4x slower than your NumPy version. I'm not sure what version of Python you're using, but I might be benefiting from running 3.12; if you're on 3.11, it can be significantly slower.
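
For reference, a pure-Python version of the simulation looks roughly like this (a sketch of the general approach, not necessarily the exact code kgaughan benchmarked):

import random

best = 0
for _ in range(1_000_000):
    # count how many of the 231 rolls land on 1 (each roll has a 1-in-4 chance)
    ones = sum(1 for _ in range(231) if random.randint(1, 4) == 1)
    if ones > best:
        best = ones
    if best > 177:
        break

print(f"Highest Roll: {best}")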

31b41a59l26u53 commented 3 weeks ago

> Hah! I'd been expecting someone to bring NumPy into the mix! Still, I'm rather surprised it's not more dramatically faster than the pure Python version I did, which weighed in roughly 3x-4x slower than your NumPy version.

I am not a NumPy magician; I just implemented something simple quickly. Someone who knows more about NumPy could probably make it much faster (and maybe uglier). Moreover, the speed comparison only makes sense on the same hardware. I, for example, use a server CPU whose single-thread performance is probably better than an old laptop's, but much weaker than a good desktop CPU's.

> I'm not sure what version of Python you're using, but I might be benefiting from running 3.12; if you're on 3.11, it can be significantly slower.

Hi, I was using Python 3.10 and NumPy 1.26.4. I tried Python 3.12 and NumPy 2.1.0 and got the same results (within 5%).

Goofierknot5537 commented 3 weeks ago

The NumPy code can definitely be better, at least somewhat. I'm not the best at this, but I did spend a while going through the documentation looking for what I wanted. Instead of generating 231 random numbers every time in the loop, we can generate every number up front and then go through them in groups of 231.

import numpy as np
from tqdm import tqdm
from datetime import datetime
start = datetime.now()

best = 0

rolls = 1_000_000

# generate all 231,000,000 rolls up front (values 1-4), then split into 1,000,000 sessions of 231
sessions = np.array_split(np.random.randint(1, 5, size=(231 * rolls), dtype=np.int8), rolls)

with tqdm(total=rolls) as pbar:
    for session in sessions:
        curr = np.count_nonzero(session == 1)  # how many of the 231 rolls were a 1
        if curr > best:
            best = curr
        pbar.update()
        if best > 177:
            break

print(f"Highest Roll: {best}")
print(f"Number of Roll Sessions: {rolls}")
print(f"Took {datetime.now() - start} to finish.")

This does cause more memory usage, with all 231,000,000 numbers being in a single array, but reduces the time from 4.3s (on my end) to about 2.15s. Practically a 2x speed increase.
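
Taking that idea one step further, the per-session Python loop can be dropped entirely by reshaping into one row per session and counting along an axis (a sketch; the early exit at 177 is omitted, since in practice it essentially never triggers):

import numpy as np

rolls = 1_000_000

# one row per session of 231 rolls; count the 1s per row, then keep the best session
rolls_array = np.random.randint(1, 5, size=(rolls, 231), dtype=np.int8)
best = int(np.max(np.count_nonzero(rolls_array == 1, axis=1)))

print(f"Highest Roll: {best}")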

ChildishSky2 commented 1 week ago

There are probably further improvements to make, but this is what I got in a few minutes:

import numpy as np
from tqdm import tqdm
from datetime import datetime
start = datetime.now()

best = 231

rolls = 1_000_000
length_of_run = 231

# values 0-3; a 0 stands for the 1-in-4 paralysis roll, so the non-zeros are the misses
array = np.random.randint(0, 4, size=(rolls, length_of_run), dtype=np.int8)

with tqdm(total=rolls) as pbar:
    for session in array:
        curr = np.count_nonzero(session)  # misses in this session; fewer misses means more 1s
        if curr < best:
            best = curr
        pbar.update()
        if best < 54:  # fewer than 54 misses means more than 177 successes
            break

best = 231 - best

print(f"Highest Roll: {best}")
print(f"Number of Roll Sessions: {rolls}")
print(f"Took {datetime.now() - start} to finish.")

This reduces the time from ~3.1 seconds to ~1.1 seconds for a million roll sessions on my computer.