autonomio / chances

Package that implements several pseudo, quasi, and true, including quantum random methods into a unified single-line command interface.
MIT License
6 stars 2 forks source link

uniform_mersenne is using memory proportional to the range of numbers it generates #2

Open JohanMollevik opened 5 years ago

JohanMollevik commented 5 years ago

uniform_mersenne is using memory proportional to the range of numbers it generates

If asked to generate 3 numbers between 0 and 1000000000 the memmory allocated will be for an array 1000000000 big.

the culprint is the line

out = list(range(self.len))

which happily allocates gigabytes of memmory even when only a few numbers are requested

mikkokotila commented 5 years ago

Good catch. That's a very poor way to do this. I'm replacing it with:

return np.random.randint(0, self.len, self.n).tolist()

This supports cases until 10^20 at which point int64 overflows. So I guess it's nice if we can say we support permutation spaces until 10^20.

JohanMollevik commented 5 years ago

It seems the implementation could be replaced by using pythons built in random generation rather than numpys as python has optimized its implementation for ranges

import random ... return random.sample(range(self.len), k=self.n)

Are there ay problems with making a change like that?

JohanMollevik commented 5 years ago

Does the randint allows duplicates or not?

mikkokotila commented 5 years ago

It indeed does. So let's use what you have proposed.

EDIT: correction to before >> here the constrain with overflow is at 10^18 so that's the supported limit.

JohanMollevik commented 5 years ago

That will have to do

JohanMollevik commented 5 years ago

If 10^18 proves limiting it would probably work well to add a fallback where if the requested number is bigger than that it selects random numbers in a loop until it has enough without duplicates.

The neat thing is that in this case with huge numbers the risk of selecting the same number several times in a row is vanishingly small as the range of the numbers are huge compared to the number of numbers requested.

This will make the fallback take linear time in the common case instead of running the risk of not terminating which that approach can do with large fill factors.

JohanMollevik commented 5 years ago

This limit ended up being a problem for me, pull request https://github.com/autonomio/chances/pull/3

JohanMollevik commented 5 years ago

The CI failed, it seems to be complaining about unit test coverage, I will try to resolve that.

JohanMollevik commented 5 years ago

CI still fails but the error seems unrelated to this and looks more like a dependency problem, can you look and see if you see a cause?