Add a `Math.Enum.mode` to return the most common element(s) from a collection

Qqwy commented 4 years ago

This issue is based on a request + PR that was made previously: #10 . Since that PR had a few conceptual flaws, I'll open an issue to track this feature instead.

We should add a Math.Enum.mode(collection, equality_function \\ &==/2) function to the library, which returns the most common element.

Important considerations:

It is possible to implement this in linear time complexity (O(n)) by using Enum.reduce and Enum.max_by (or another Enum.reduce if max_by does not suit all our needs).

2, I'd like to support:

Custom comparison functions. (As many functions in the normal Enum module do).
Returning multiple 'most common' elements when there are multiple elements equally common.

If someone wants to pick this up, that's wonderful. If not, I'll work on it myself shortly :slightly_smiling_face: .

bfcarpio commented 4 years ago

Not sure how dedicated I can be to doing this, but I can certainly write some draft code.

How do you envision alternate comparison functions working? For example, let's use >/2. Would that count how many numbers occur that are larger than the current one I'm on?

So with an array of [1, 2, 3]:

1 would have 2 numbers greater than it
2 would have 1 number greater than it
3 would have no numbers greater than it

Since 1 has the most numbers greater than it in the list it is returned.

Qqwy commented 4 years ago

How do you envision alternate comparison functions working?

Ah! The better term to use would be an 'equality checking function'. The most obvious choice would be to choose &===/2 instead of &==/2 to disambugate between 1 (the integer) and 1.0 (the float). It becomes even more important when other types of inputs are used (which might have wholly different structures but should still be considered 'equal' in certain cases).

As for return value: I think it makes the most sense if Math.Enum.mode always returns a list, which is empty when the input collection is empty, and contains all values that were the most common (which can obviously be more than one).

So some hypothetical doctest-like examples:

iex> [1, 2, 3, 4, 1] |> Math.Enum.mode()
[1]
iex> [1, 2, 3, 2, 3] |> Math.Enum.mode()
[2, 3]
iex> [] |> Math.Enum.mode()
[]
iex> [1, 1.0, 2, 3] |> Math.Enum.mode() # NOTE: In this case which format will be picked is not guaranteed
[1]
iex> [1, 1.0, 2, 3] |> Math.Enum.mode(&===/1)
[1, 1.0]

Does that make sense? :slightly_smiling_face:

bfcarpio commented 4 years ago

Makes sense!

I've got a quick and dirty implementation that's O(n) on my fork, but I'm working on your idea of different comparison operators.

Ran into an issue though from the official docs:

[In Maps] keys are compared using the exact-equality operator (===/2).

Unless you want to roll your own collectable we might not be able to cleanly implement the 'equality checking function'...

EDIT: My working branch #15

Qqwy commented 4 years ago

Unless you want to roll your own collectable we might not be able to cleanly implement the 'equality checking function'...

Unless I'm misunderstanding something, the fact that a difference exists between e.g. == and === means that it would be nice to give users the choice between which equality function to use.

EDIT: Ah, I see the problem you are referring to: The fact that you are using a map inside the Enum.reduce and (have to) rely on map key equality there. Yes, this indeed is a bit of a problem. :thinking:

bfcarpio commented 4 years ago

I agree, however the Map collectable that comes in the Elixir standard library by default uses ===/2.

There's no way for me to specify that it use ==/2. The only way I can think that we get the functionality we need is a custom collectable (hard), or converting decimals to integers (potentially bad).

Feel free to make comments on the draft merge request if that would help.

On Mon, Jan 20, 2020, 11:03 AM Qqwy notifications@github.com wrote:

Unless you want to roll your own collectable we might not be able to cleanly implement the 'equality checking function'...

Unless I'm misunderstanding something, the fact that a difference exists between e.g. == and === means that it would be nice to give users the choice between which equality function to use.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/folz/math/issues/13?email_source=notifications&email_token=ALUI4XZUS4SX7436INVE5Y3Q6XDORA5CNFSM4KI3PHT2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJNDKTI#issuecomment-576337229, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALUI4X5BC2LLODNMYBS33MDQ6XDORANCNFSM4KI3PHTQ .

Qqwy commented 4 years ago

After thinking some more, let's go ahead with the current PR as-is. We can always expand it later to expand its functionality.

It might very well be the case that non-structural comparisons turn the algorithm back into a linearithmic (O(n * log(n))) algorithm because replacing the map by e.g. a red-black tree would reduce O(1)-inserts to O(log(n)) again.

folz / math

Add a `Math.Enum.mode` to return the most common element(s) from a collection #13