Closed Sirius902 closed 3 years ago
What kind of GPU are you running this on? This program requires dozens of GiB of memory.
You should also read this, which will make your code faster anyway.
Thank you for taking a look! I should probably clarify: I am running this on an NVIDIA RTX 2080 Ti, and I don't intend to actually return the length of the filtered list; I was only doing that for testing. I intend to serialize the list of potential seeds and return it to the caller. I have noticed that returning the head of the list instead of the length also results in the same 0xCDCDCDCD. Do you think I could be hitting the memory limit on my GPU?
An RTX 2080 Ti does not have enough memory to run this program, and I'm surprised you don't get an out-of-memory crash. I do on my RTX 2080 Ti.
Instead of computing all 2**32 values immediately, I suggest you do multiple passes of fewer elements, and concatenate the results:
let crack_xp_seed: u32 =
  let bookshelves = 15
  let levels = [9, 20, 30]
  let log_num_chunks = 3
  let num_chunks = 1<<log_num_chunks
  let chunk_size = 0x100000000 >> log_num_chunks
  let res =
    loop acc = [] for i < num_chunks do
      acc ++
      (iota chunk_size
       |> map (+(chunk_size*i))
       |> map u32.i64
       |> filter (enchant_levels bookshelves >-> (==levels)))
  in u32.i64 (length res)
You'll still need to have enough memory to fit all the results on the GPU, of course, but that looks like an array with less than five million elements.
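As a rough check on why chunking helps, here is a back-of-the-envelope upper bound on the per-pass array sizes. It assumes that each pass materializes one i64 index array (8 bytes per element) and one u32 array (4 bytes per element); the compiler may fuse some of these away, so treat the figures as a worst case rather than a measurement. Here $k$ stands for log_num_chunks.

\[
\text{elements per pass} = \frac{2^{32}}{2^{k}} = 2^{32-k}
\]

\[
\begin{aligned}
k = 0 &: \quad 2^{32} \cdot 8\,\text{B} = 32\,\text{GiB (i64)}, \quad 2^{32} \cdot 4\,\text{B} = 16\,\text{GiB (u32)} \\
k = 3 &: \quad 2^{29} \cdot 8\,\text{B} = 4\,\text{GiB (i64)}, \quad 2^{29} \cdot 4\,\text{B} = 2\,\text{GiB (u32)} \\
k = 4 &: \quad 2^{28} \cdot 8\,\text{B} = 2\,\text{GiB (i64)}, \quad 2^{28} \cdot 4\,\text{B} = 1\,\text{GiB (u32)}
\end{aligned}
\]

For comparison, an RTX 2080 Ti has 11 GB of device memory, so the unchunked case cannot fit under these assumptions, while the chunked passes leave room for the filter's own temporaries.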
Thanks! I tried out your suggestion and managed to get it to work with log_num_chunks = 4.
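For anyone who wants to tune the number of passes without editing the body each time, the chunked approach suggested above could be pulled out into a small helper. This is only a sketch: chunked_filter and its parameters log_total, log_num_chunks and p are hypothetical names that do not appear in the thread, and the predicate in main is a stand-in for the enchant_levels check, which is not reproduced here.

```futhark
-- Hypothetical helper (not from the thread): filter the integers
-- 0 .. 2**log_total - 1 with predicate p, in 2**log_num_chunks passes,
-- so that only one chunk of candidates is live at a time.
let chunked_filter (log_total: i64) (log_num_chunks: i64) (p: i64 -> bool) : []i64 =
  let num_chunks = 1 << log_num_chunks
  let chunk_size = (1 << log_total) >> log_num_chunks
  in loop acc = [] for i < num_chunks do
       -- Offset each chunk so the passes cover disjoint ranges,
       -- then append the survivors to the accumulated result.
       acc ++ filter p (map (+ (chunk_size * i)) (iota chunk_size))

-- Stand-in usage: count the multiples of 3 below 2**20 using 4 passes.
-- A real caller would pass the seed check instead of this predicate.
let main : i64 =
  length (chunked_filter 20 2 (\x -> x % 3 == 0))
```

Raising log_num_chunks trades more kernel launches for a smaller peak footprint, which matches the move from 3 to 4 reported above. The accumulated result still has to fit on the device in full.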
Hello, when trying to run this code I noticed some oddities with OpenCL. When compiling with futhark c, I get the expected result of 0x0047FF53, but when compiling with futhark opencl I get 0xCDCDCDCD instead.