gleam-lang / stdlib

🎁 Gleam's standard library
https://hexdocs.pm/gleam_stdlib/
Apache License 2.0
454 stars, 163 forks

Function to take a number of random values from a list #683

Open sobolevn opened 3 weeks ago

sobolevn commented 3 weeks ago

I started looking at the state of random in Gleam and I have several ideas.

  1. int.random and float.random do not accept a seed. This might be a problem for libraries. For example, a fake-data library for tests needs to produce the same data from a given test seed value. Otherwise, you won't have reproducible failures. https://docs.python.org/3/library/random.html#notes-on-reproducibility
  2. There's no int.random_range(x, y) function, which would generate an int in the range >= x, < y. I think this function is essential.
  3. There's no choice(List) function, which in my experience is the second most frequently used random function in Python.

Maybe we should add a random module with the needed functions and a nice API for Seed(Int)?
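To make the three points concrete, here is a minimal sketch in Python (chosen only because its random module is the reference point above). The names random_range and choice mirror the functions proposed in the list; make_rng is a hypothetical stand-in for a Seed(Int) style API and is not an existing Gleam function.

```python
import random


def make_rng(seed: int) -> random.Random:
    # Hypothetical helper: an isolated generator built from an explicit seed,
    # so a library never depends on hidden global state.
    return random.Random(seed)


def random_range(rng: random.Random, x: int, y: int) -> int:
    # Returns an int in the half-open range >= x, < y.
    return rng.randrange(x, y)


def choice(rng: random.Random, items: list):
    # Returns one uniformly chosen element; raises IndexError on an empty list.
    return rng.choice(items)
```

Two generators created from the same seed yield the same sequence, which is the property a fake-data library would rely on for reproducible test failures.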

giacomocavalieri commented 3 weeks ago

For that there's the prng package https://hex.pm/packages/prng

inoas commented 3 weeks ago

Maybe prng could have random_range and choice added?

lpil commented 3 weeks ago

No plans for 1 and 2 at present. 3 sounds good 👍

giacomocavalieri commented 3 weeks ago

It already has both

inoas commented 3 weeks ago

Hm, this brings the gleam-@alias idea back to mind: we could document aliases and help people find functions when they come from other stdlibs such as Python's, PHP's and JavaScript's.

Varpie commented 3 weeks ago

Isn't this basically list.shuffle |> list.take(n)?

I suppose this can be made more efficient than shuffling the full list, if we have a large list and only want to pick a few random elements, but it could be a good first version.

shuffle runs a fold that generates a random number for each element of the list, then sorts the list and iterates over all elements to remove the generated random numbers. take runs in linear time.
A possible improvement would be to iterate over only the first n elements after the sort, which would avoid one full iteration of the list.
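As a rough illustration of that improvement, here is a Python sketch of the decorate, sort, and take approach. It is an assumption about how such a function could look, not the actual list.shuffle implementation, and take_random is a hypothetical name.

```python
import random


def take_random(items: list, n: int, rng: random.Random) -> list:
    # Pair every element with a random key (the fold step of the shuffle).
    keyed = [(rng.random(), item) for item in items]
    # Sort by the key only, so the elements never need to be comparable.
    keyed.sort(key=lambda pair: pair[0])
    # Strip the keys from the first n pairs only, not from the whole list.
    return [item for _, item in keyed[:max(n, 0)]]
```

The sort still touches every element, so this only saves the final pass over the full list.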

A possible version without the sort would repeatedly generate a random int smaller than the current length of the list, pop the element at that position into an accumulator, and repeat until it has n elements (or the list is empty).
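A short Python sketch of this sort-free variant, with the hypothetical name take_random_by_popping:

```python
import random


def take_random_by_popping(items: list, n: int, rng: random.Random) -> list:
    pool = list(items)  # work on a copy so the caller's list is untouched
    picked = []
    while pool and len(picked) < n:
        # Random index that is valid for the current (shrinking) pool.
        index = rng.randrange(len(pool))
        picked.append(pool.pop(index))
    return picked
```

Reaching a position in a linked list takes time proportional to that position, so on Gleam lists this would cost roughly n times the list length in the worst case.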

lpil commented 3 weeks ago

There are algorithms for random sampling from a linked list. I haven't done the research to say how each approach compares.

apainintheneck commented 1 week ago

Reservoir sampling would be the obvious choice, but the implementations I'm familiar with index into arrays, which wouldn't work here since it would have to work with lists instead. I guess you could use a dictionary instead of an array as the reservoir, but there's probably a better approach out there.
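For reference, a sketch of the basic reservoir sampling scheme (often called Algorithm R) with a dictionary keyed by slot number as the reservoir, written in Python. The function name is hypothetical, and this is the textbook variant rather than anything taken from an existing library.

```python
import random
from typing import Iterable


def reservoir_sample(items: Iterable, n: int, rng: random.Random) -> list:
    reservoir = {}  # slot number -> element, standing in for a fixed-length array
    for seen, item in enumerate(items):
        if seen < n:
            # Fill the reservoir with the first n elements.
            reservoir[seen] = item
        else:
            # Pick a slot in 0..seen inclusive; keep the item only if the
            # slot falls inside the reservoir.
            slot = rng.randrange(seen + 1)
            if slot < n:
                reservoir[slot] = item
    return list(reservoir.values())
```

This makes a single pass over the input and never needs its length up front, which suits a linked list; the cost moves into the dictionary updates.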

lpil commented 4 days ago

Oops, misclick.

How about we copy whatever Elixir or Elm or some other similar language does?

apainintheneck commented 14 hours ago

Good point. It's worth taking a look at what other languages do in this area.

The Elixir standard library has the Enum.take_random function, which now uses a modified reservoir sampling algorithm for performance reasons (relevant commit). Internally it uses a tuple as the reservoir in place of the traditional fixed-length array.

I looked at Elm and couldn't find any relevant functions in the standard library or packages.

A few Haskell libraries have implemented versions of it using things like IntMap as the reservoir data structure.

None of this really helps us but it's still interesting.