Adding a generator for partitions of a set

Hipparchus-Math / hipparchus

An efficient, general-purpose mathematics components library in the Java programming language

Apache License 2.0

142 stars 41 forks source link

Adding a generator for partitions of a set #197

Closed maisonobe closed 2 years ago

maisonobe commented 2 years ago

As part of implementing #191, I came across a need to generate some partitions of a set of derivatives, like for example (∂²pᵢ/∂qₗ∂qₘ, ∂pj/∂qₙ), (∂²pᵢ/∂qₘ∂qₙ, ∂pj/∂qₗ), (∂²pᵢ/∂qₘ∂qₗ, ∂pj/∂qₘ) which correspond to all ways to partition the set of three variables qₗ, qₘ, qₙ into two sets.

A generic way to generate arbitrary partitions from a list of objects would be nice.

axkr commented 2 years ago

I think using the rosen iterator as a basis should be faster:

https://github.com/axkr/symja_android_library/blob/master/symja_android_library/matheclipse-core/src/main/java/org/matheclipse/core/combinatoric/RosenNumberPartitionIterator.java

Often we don't need the complete partition as a full list for example in "pattern matching". A "step visitor pattern" can decide, if we stop at a "matching iteration":

https://github.com/axkr/symja_android_library/blob/master/symja_android_library/matheclipse-core/src/main/java/org/matheclipse/core/combinatoric/RosenNumberPartitionIterator.java

maisonobe commented 2 years ago

You are right, this is exactly what I needed!

I was going to refactor the algorithm to use a Stream API which would have allowed to filter the partitions upon generation.

Would you allow me to copy this code and put it in Hipparchus by changing its license from GPLV3 to Apache 2?

axkr commented 2 years ago

Would you allow me to copy this code and put it in Hipparchus by changing its license from GPLV3 to Apache 2?

Yes, no problem

maisonobe commented 2 years ago

Upon testing, it doesn't seem to be equivalent to my needs, so I will keep both implementations. The Rosen algorithm gives the count, so for example RosenNumberPartitionIterator(4, 2) generates the three arrays:

[1, 3]
[2, 2]
[3, 1]

However, there are several ways to distribute the elements within these arrays. What I needed was an algorithm that would produce the 7 partitions, not just count their number of elements:

{  1,  2,  3 }, {  4 }
{  1,  2,  4 }, {  3 }
{  1,  2 }, {  3,  4 }
{  1,  3,  4 }, {  2 }
{  1,  3 }, {  2,  4 }
{  1,  4 }, {  2,  3 }
{  1 }, {  2,  3,  4 }

maisonobe commented 2 years ago

I have set up the two implementations as they provide two different and complementary results.

The implementation based on B. Djokić algorithm has been changed to a Stream API. It allows filtering the partitions during generation, typically to select only the ones with a specific number of parts, which is the main use case. This change also avoids huge memory consumption as the number of partitions grows up fast (there are more than 10 billions of partitions of a 16 elements set). On my computer, I was not able to go past 12 elements with the version that stored everything, but with the Stream version I could generate the 10 billions partitions for 16 elements (it took about 20 minutes on a desktop computer with an AMD Ryzen 5 3600 processor and 16GB RAM running Linux).

I have also added the computation of Bell numbers, as I used them in the validation tests.

axkr commented 2 years ago

If I'm using "multiple equal objects", i.e.

Arrays.asList(1, 2, 2, 4, 4)).collect(Collectors.toList()

I get "duplicate" entries:

{1,2,2,4,4}
{1,2,2,4}{4}
{1,2,2,4}{4}
{1,2,2}{4,4}
...

Is it possible to restrict the partitions iterator to the same number of partitions?

maisonobe commented 2 years ago

I don't think it is possible without storing all partitions and filtering duplicates afterwards. The reason is twofold:

the partitions are generated one after the other (in small batches) and not related at all to each other
the algorithms does not look at the elements themselves, it just put them in lists depending on their index This means that the partitioning has no idea of "equals" between elements when it creates the partitions.

One way to do it is post-processing the entries and filtering out.

axkr commented 2 years ago

Just tested with MMA.

The algorithm also seems to take the same combinations again for pattern-matching, if the elements are equal.

SetAttributes[h, {Flat,Orderless}]

ReplaceList[h[a,b,b,d],h[x_, y_,z_ ] :> {x, y ,z}]

But for symbols like Plus and Times (which are the typical examples for Flat, Orderless functions) this is no problem, because argument evaluation of b+b gives 2*b and b*b gives b^2 as "new element" in the evaluated list.

So I think it's OK to try the same combination again in the iterator for this case.