gleam-lang / stdlib

🎁 Gleam's standard library
https://hexdocs.pm/gleam_stdlib/
Apache License 2.0

Function to take a number of random values from a list #683

Open sobolevn opened 3 months ago

sobolevn commented 3 months ago

I started looking at the state of random in Gleam and I have several ideas.

  1. int.random and float.random do not take seeds. This might be a problem for libraries. For example, a fake-data library for tests needs to generate the same data from a given test seed value; otherwise, you won't have reproducible failures. https://docs.python.org/3/library/random.html#notes-on-reproducibility
  2. There's no int.random_range(x, y) function, which would generate an int n with x <= n < y. I think that this function is essential.
  3. There's no choice(List) function, which in my experience is the second most frequently used random function in Python.

Maybe we should add random module with the needed functions and a nice api for Seed(Int)?
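
To illustrate the three points above, here is a minimal Python sketch (not Gleam, and not a proposed stdlib API). The helper names `make_rng`, `random_range`, `choice`, and `fake_ages` are hypothetical and exist only for this example; the idea is that every function takes an explicit seeded generator, so a test can reproduce its data from the seed.

```python
import random


def make_rng(seed):
    """Return an independent generator seeded with `seed` (hypothetical helper)."""
    return random.Random(seed)


def random_range(rng, x, y):
    """Return an int n with x <= n < y."""
    return rng.randrange(x, y)


def choice(rng, items):
    """Return one element of `items`, or None if the list is empty."""
    if not items:
        return None
    return items[rng.randrange(len(items))]


def fake_ages(seed, count):
    """Generate `count` fake ages; the same seed always gives the same list."""
    rng = make_rng(seed)
    return [random_range(rng, 18, 90) for _ in range(count)]
```

Calling `fake_ages(42, 5)` twice returns the same list both times, which is the property a fake-data library needs to replay a failing test.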

giacomocavalieri commented 3 months ago

For that there's the prng package https://hex.pm/packages/prng

inoas commented 3 months ago

Maybe prng could have random_range and choice added?

lpil commented 3 months ago

No plans for 1 and 2 at present. 3 sounds good 👍

giacomocavalieri commented 3 months ago

It already has both

inoas commented 3 months ago

Hm, the gleam-@alias idea reappears for me: we could document aliases and help people find functions when they come from other standard libraries, such as Python's, PHP's, and JavaScript's.

Varpie commented 3 months ago

Isn't this basically list.shuffle |> list.take(n)?

I suppose this can be made more efficient than shuffling the full list, if we have a large list and only want to pick a few random elements, but it could be a good first version.

shuffle runs a fold that generates a random number for each element of the list, then sorts the list and iterates over all elements to remove the generated random numbers. take runs in linear time.
A possible improvement would be to iterate over only the first n elements after the sort, which would avoid one full iteration of the list.

A possible version without the sort would repeatedly generate a random int smaller than the current length of the list, pop the element at that position into an accumulator, and repeat until the accumulator has n elements (or the list is empty).
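
The two approaches described in this comment can be sketched in Python as follows. This is only an illustration under assumptions: the function names are hypothetical, `sample_by_shuffle` mimics the tag-sort-strip structure described above rather than the actual list.shuffle source, and it already includes the suggested improvement of stripping the tags from only the first n elements.

```python
import random


def sample_by_shuffle(items, n, rng):
    """Tag each element with a random float, sort by tag, keep the first n."""
    tagged = [(rng.random(), item) for item in items]
    tagged.sort(key=lambda pair: pair[0])
    # Only the first n pairs have their tags removed.
    return [item for _, item in tagged[:max(n, 0)]]


def sample_by_pop(items, n, rng):
    """Repeatedly pop a random position into an accumulator (no sort)."""
    remaining = list(items)  # copy so the caller's list is untouched
    picked = []
    while len(picked) < n and remaining:
        index = rng.randrange(len(remaining))
        picked.append(remaining.pop(index))
    return picked
```

Note that popping at an arbitrary position is linear in the list length, both for a Python list and for a linked list, so the second version costs roughly n times the list length in the worst case.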

lpil commented 3 months ago

There are algorithms for random sampling from a linked list. I haven't done the research to say how each approach compares.

apainintheneck commented 2 months ago

Reservoir sampling would be the obvious choice, but the implementations I'm familiar with index into arrays, which wouldn't work here since it would have to work with lists instead. I guess you could use a dictionary instead of an array as the reservoir, but there's probably a better approach out there.
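
As a rough sketch of the dictionary idea, here is the basic form of reservoir sampling (often called Algorithm R) in Python, with a dict keyed by slot number standing in for the array. The function name `reservoir_sample` is hypothetical, and a Gleam version would presumably use a persistent dictionary in place of the mutable dict shown here.

```python
import random


def reservoir_sample(items, k, rng):
    """Pick up to k elements in one pass, using a dict as the reservoir."""
    if k <= 0:
        return []
    reservoir = {}
    for i, item in enumerate(items):
        if i < k:
            # The first k elements fill the reservoir slots 0..k-1.
            reservoir[i] = item
        else:
            # Element i replaces a random slot with probability k / (i + 1).
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return list(reservoir.values())
```

The input is only ever traversed front to back, which suits a linked list; the random access happens in the reservoir, not in the input.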

lpil commented 2 months ago

Oops, misclick.

How about we copy whatever Elixir or Elm or some other similar language does?

apainintheneck commented 2 months ago

Good point. It's worth taking a look at what other languages do in this area.

The Elixir standard library has the Enum.take_random method which now uses a modified reservoir sampling algorithm for performance reasons (relevant commit). Internally it uses a tuple as the reservoir in place of the traditional fixed length array.

I looked at Elm and couldn't find any relevant functions in the standard library or packages.

A few Haskell libraries had implemented versions of it which used things like IntMap as the reservoir data structure.

None of this really helps us but it's still interesting.

ethanthoma commented 4 days ago

Are y'all wanting a multiple-element sample like Python's random.sample(List, Int), or just a single-element sample like random.choice(List)?

lpil commented 4 days ago

We want a function which takes a specified number of values from a list.

ethanthoma commented 3 days ago

I was looking at implementations, and Algorithm L seems optimal (I could be wrong, but it seems similar to what Elixir does). However, it requires taking the natural log, which I couldn't find an implementation of in the stdlib. What's the ideal solution? A slower implementation that doesn't need the natural log, or adding the natural log to gleam/float?
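
For reference, here is a Python sketch of what this comment appears to describe, assuming "algo L" means the reservoir-sampling variant usually called Algorithm L. It shows where the natural log comes in: the algorithm computes how many elements to skip before the next replacement instead of drawing a random number for every element. The names `algorithm_l` and `_open_unit` are hypothetical, and the dict reservoir is an assumption borrowed from the dictionary suggestion earlier in the thread.

```python
import math
import random


def _open_unit(rng):
    """Uniform float strictly between 0 and 1, so log() is always defined."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def algorithm_l(items, k, rng):
    """Pick up to k elements from a list, skipping ahead between replacements."""
    if k <= 0:
        return []
    reservoir = dict(enumerate(items[:k]))
    if len(reservoir) < k:
        return list(reservoir.values())
    w = math.exp(math.log(_open_unit(rng)) / k)
    i = k - 1  # index of the last element examined
    while w > 0.0:
        # Number of elements to skip; w rounding up to 1.0 means skip nothing.
        if w >= 1.0:
            skip = 0
        else:
            skip = math.floor(math.log(_open_unit(rng)) / math.log1p(-w))
        i += skip + 1
        if i >= len(items):
            break
        # Replace a uniformly chosen reservoir slot with element i.
        reservoir[rng.randrange(k)] = items[i]
        w *= math.exp(math.log(_open_unit(rng)) / k)
    return list(reservoir.values())
```

The sketch indexes a Python list for brevity; on a linked list the `i += skip + 1` step would become dropping `skip` elements and taking the next one, so the input is still only walked once.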

lpil commented 3 days ago

Sounds like a good reason to add that function to me! Unless anyone else has any other suggestions.

It looks like that algorithm wants array mutation at a random index. How would you do that in Gleam, given we don't have constant-time indexing or array mutation?