arhourigan / graveler


Python, but simpler/better #5

Open 31b41a59l26u53 opened 1 month ago

31b41a59l26u53 commented 1 month ago

This is a hopefully more elegant and somewhat faster version using NumPy.

| n | Original | NumPy |
| --- | --- | --- |
| 1,000,000 | 3m 17s | 9s |

Btw, I recommend tqdm progress bars; they make life so much easier!

import numpy as np
from tqdm import trange

best = 0
for rolls in trange(1000000):
    # floor(rand() + 0.25) is 1 exactly when rand() >= 0.75, i.e. with probability 0.25,
    # so this sum counts the simulated 1-in-4 paralysis procs in one session of 231 rolls
    curr = np.sum(np.floor(np.random.rand(231) + 0.25))
    if curr > best:
        best = curr
    if best > 177:
        break

print(f"Highest Roll: {best}")
print(f"Number of Roll Sessions: {rolls+1}")

One-liner version:

import numpy as np

best = np.max(np.sum(np.floor(np.random.rand(1000000,231)+0.25), 1))

print(f"Highest Roll: {best}")
31b41a59l26u53 commented 1 month ago

Multithreading is cheating.

Because the better the hardware you have access to, the better the results you can get. I can try this on an A100 80GB GPU, so yeah... Pay to win...

Let's cheat!

| n | Austin | Original code | NumPy | PyTorch A100 80GB | Result |
| --- | --- | --- | --- | --- | --- |
| 1,000 | | 233ms | 19ms | 1000ms | 77 |
| 10,000 | | 2s | 127ms | 705ms | 85 |
| 100,000 | | 19s | 1s | 890ms | 88 |
| 1,000,000 | ~5m | 3m 17s | 9s | 737ms | 91 |
| 10,000,000 | | 32m 27s | 1m 36s | 964ms | 96 |
| 100,000,000 | | ~5h 18m | 15m 59s | 894ms | 98 |
| 1,000,000,000 | 8d 13h | ~2d 7h | ~2h 42m | 3.9s | 99 |
| 10,000,000,000 | | ~23d 3h | ~27h | 35s | 104 |
| 100,000,000,000 | | ~223d | ~11d 16h | 5m 32s | 104 |
| 1,000,000,000,000 | | ~6y 122d | ~110d | 55m 2s | 107 |
| 10,000,000,000,000 | | ~62y 278d | ~3y 28d | 9h 9m 21s | 110 |

import torch
from tqdm import trange

torch.manual_seed(42)

batches = 25
batchsize = 40000000  # 40e6 sessions x 231 float32 values is roughly 37 GB, which fits in the A100's 80 GB
print(f"Number of batches: {batches}")
print(f"Experiments per batch: {batchsize}")
print(f"Overall planned roll sessions: {batches*batchsize}")

best = 0
for _ in trange(batches):
    # sum the simulated 1-in-4 procs per session on the GPU, then keep the best session in the batch
    curr = torch.max(
        torch.sum(
            torch.floor(
                torch.rand(batchsize, 231, dtype=torch.float32, device=torch.device("cuda")) + 0.25
            ), 1
        )
    )
    if curr > best:
        best = curr
    if best > 177:
        break

print(f"Highest roll: {int(best)}")
kgaughan commented 1 month ago

Hah! I'd been expecting someone to bring NumPy into the mix! Still, I'm rather surprised it's not more dramatically faster than the pure Python version I did, which weighed in roughly 3x-4x slower than your NumPy version. I'm not sure what version of Python you're using, but I might be benefiting from running 3.12; if you're on 3.11, it can be significantly slower.
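
For reference, a pure-Python version of the simulation looks roughly like this (a sketch of the general approach, not necessarily the exact code kgaughan benchmarked):

import random

best = 0
for _ in range(1_000_000):
    # count how many of the 231 rolls land on 1 (each roll has a 1-in-4 chance)
    ones = sum(1 for _ in range(231) if random.randint(1, 4) == 1)
    if ones > best:
        best = ones
    if best > 177:
        break

print(f"Highest Roll: {best}")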

31b41a59l26u53 commented 3 weeks ago

> Hah! I'd been expecting someone to bring NumPy into the mix! Still, I'm rather surprised it's not more dramatically faster than the pure Python version I did, which weighed in roughly 3x-4x slower than your NumPy version.

I am not a NumPy magician; I just implemented something simple quickly. Someone who knows more about NumPy could probably make it much faster (and maybe uglier). Moreover, the speed comparison only makes sense on the same hardware. I, for example, use a server CPU whose single-thread performance is probably better than an old laptop's, but much weaker than a good desktop CPU's.

> I'm not sure what version of Python you're using, but I might be benefiting from running 3.12; if you're on 3.11, it can be significantly slower.

Hi, I was using Python 3.10 and NumPy 1.26.4. I tried Python 3.12 and NumPy 2.1.0 and got the same results (within 5%).

Goofierknot5537 commented 3 weeks ago

The NumPy code can definitely be better, at least somewhat. I'm not the best at this, but I did spend a while going through the documentation looking for what I wanted. Instead of generating 231 random numbers every time in the loop, we can generate every number up front and then go through them in groups of 231.

import numpy as np
from tqdm import tqdm
from datetime import datetime
start = datetime.now()

best = 0

rolls = 1_000_000

# generate all 231,000,000 rolls up front (values 1-4), then split into 1,000,000 sessions of 231
sessions = np.array_split(np.random.randint(1, 5, size=(231 * rolls), dtype=np.int8), rolls)

with tqdm(total=rolls) as pbar:
    for session in sessions:
        curr = np.count_nonzero(session == 1)  # how many of the 231 rolls were a 1
        if curr > best:
            best = curr
        pbar.update()
        if best > 177:
            break

print(f"Highest Roll: {best}")
print(f"Number of Roll Sessions: {rolls}")
print(f"Took {datetime.now() - start} to finish.")

This does cause more memory usage, with all 231,000,000 numbers being in a single array, but reduces the time from 4.3s (on my end) to about 2.15s. Practically a 2x speed increase.
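
Taking that idea one step further, the per-session Python loop can be dropped entirely by reshaping into one row per session and counting along an axis (a sketch; the early exit at 177 is omitted, since in practice it essentially never triggers):

import numpy as np

rolls = 1_000_000

# one row per session of 231 rolls; count the 1s per row, then keep the best session
rolls_array = np.random.randint(1, 5, size=(rolls, 231), dtype=np.int8)
best = int(np.max(np.count_nonzero(rolls_array == 1, axis=1)))

print(f"Highest Roll: {best}")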

ChildishSky2 commented 1 week ago

There are probably further improvements to make, but this is what I got in a few minutes:

import numpy as np
from tqdm import tqdm
from datetime import datetime
start = datetime.now()

best = 231

rolls = 1_000_000
length_of_run = 231

# values 0-3; a 0 stands for the 1-in-4 paralysis roll, so the non-zeros are the misses
array = np.random.randint(0, 4, size=(rolls, length_of_run), dtype=np.int8)

with tqdm(total=rolls) as pbar:
    for session in array:
        curr = np.count_nonzero(session)  # misses in this session; fewer misses means more 1s
        if curr < best:
            best = curr
        pbar.update()
        if best < 54:  # fewer than 54 misses means more than 177 successes
            break

best = 231 - best

print(f"Highest Roll: {best}")
print(f"Number of Roll Sessions: {rolls}")
print(f"Took {datetime.now() - start} to finish.")

This reduces the time from ~3.1 seconds to ~1.1 seconds for a million roll sessions on my computer.