Open 31b41a59l26u53 opened 3 months ago
Because the better hardware you got access to, the better results you can get. I can try this on an A100 80GB GPU, so yeah... Pay to win...
n | Austin | Original code | NumPy | PyTorch A100 80GB | Result |
---|---|---|---|---|---|
1,000 | 233ms | 19ms | 1000ms | 77 | |
10,000 | 2s | 127ms | 705ms | 85 | |
100,000 | 19s | 1s | 890ms | 88 | |
1000,000 | ~5m | 3m 17s | 9s | 737ms | 91 |
10,000,000 | 32m 27s | 1m 36s | 964ms | 96 | |
100,000,000 | ~5h 18m | 15m 59s | 894ms | 98 | |
1000,000,000 | 8d 13h | ~2d 7h | ~2h 42m | 3.9s | 99 |
10,000,000,000 | ~23d 3h | ~27h | 35s | 104 | |
100,000,000,000 | ~223d | ~11d 16h | 5m 32s | 104 | |
1000,000,000,000 | ~6y 122d | ~110d | 55m 2s | 107 | |
10,000,000,000,000 | ~62y 278d | ~3y 28d | 9h 9m 21s | 110 |
import torch
from tqdm import trange
torch.manual_seed(42)
batches = 25
batchsize = 40000000
print(f"Number of batches: {batches}")
print(f"Experiments per batch: {batchsize}")
print(f"Overall planned roll sessions: {batches*batchsize}")
best = 0
for _ in trange(batches):
curr = torch.max(
torch.sum(
torch.floor(
torch.rand(batchsize, 231, dtype=torch.float32, device=torch.device("cuda")) + 0.25
), 1
)
)
if curr > best:
best = curr
if best > 177:
break
print(f"Highest roll: {int(best)}")
Hah! I'd been expecting someone to bring NumPy into the mix! Still, I'm rather surprised it's not more dramatically faster than the pure Python version I did, which weighed in roughly 3x-4x slower than your NumPy version. I'm not sure what version of Python you're using, but I might be benefitting from running 3.12, and if you've 3.11, it can be significantly slower.
Hah! I'd been expecting someone to bring NumPy into the mix! Still, I'm rather surprised it's not more dramatically faster than the pure Python version I did, which weighed in roughly 3x-4x slower than your NumPy version.
I am not a NumPy magician, I just implemented something simple quickly. Someone Who knows more about NumPy could probably be much faster (and maybe uglier). Moreover, the speed comparison makes only sense, if using the same hardware. I for example use a server CPU, which on one thread is probably more powerful than an old laptop, but much weaker than a good desktop CPU.
I'm not sure what version of Python you're using, but I might be benefitting from running 3.12, and if you've 3.11, it can be significantly slower.
Hi, I was using Python 3.10 and NumPy 1.26.4 I tried out Python 3.12 and NumPy 2.1.0 and got the same results (within 5%).
The NumPy code definitely can be better, at least somewhat. I'm not the best, but I did spend awhile going through the documents looking for what I want. Instead of generating 231 random numbers every time in the loop, we can instead generate every number first, and then go through them in groups of 231.
import numpy as np
from tqdm import tqdm
from datetime import datetime
start = datetime.now()
best = 0
rolls = 1_000_000
array = np.array_split(np.random.randint(1,5, size=(231 * rolls), dtype=np.int8), rolls)
with tqdm(total=rolls) as pbar:
for array in array:
curr = np.count_nonzero(array == 1)
if curr > best:
best = curr
pbar.update()
if best > 177:
break
print(f"Highest Roll: {best}")
print(f"Number of Roll Sessions: {rolls}")
print(f"Took {datetime.now() - start} seconds to finish.")
This does cause more memory usage, with all 231,000,000 numbers being in a single array, but reduces the time from 4.3s (on my end) to about 2.15s. Practically a 2x speed increase.
Probably can make further improvements, but this is wha i got in a few minutes
from tqdm import tqdm
from datetime import datetime
start = datetime.now()
best = 231
rolls = 1_000_000
length_of_run = 231
array = np.random.randint(0, 4, size=(rolls, length_of_run), dtype=np.int8)
with tqdm(total=rolls) as pbar:
for array in array:
curr = np.count_nonzero(array)
if curr < best:
best = curr
pbar.update()
if best < 54:
break
best = 231 - best
print(f"Highest Roll: {best}")
print(f"Number of Roll Sessions: {rolls}")
print(f"Took {datetime.now() - start} seconds to finish.")
This reduces the time from ~3.1 seconds for 1 milllion to ~1.1 second for a million on my computer
This is a hopefully more elegant, and somewhat faster version with NumPy.
Btw, I recommend tqdm progressbars, makes life so much easier!
Oneliner version: