Macaulay2 / M2

The primary source code repository for Macaulay2, a system for computing in commutative algebra, algebraic geometry and related fields.
https://macaulay2.com
345 stars 230 forks source link

Sets don't sort consistently #650

Open kroner opened 7 years ago

kroner commented 7 years ago
i1 : L = {set{1,3},set{0},set{2}}
o1 = {set {1, 3}, set {0}, set {2}}
o1 : List
i2 : L = sort L
o2 = {set {2}, set {0}, set {1, 3}}
o2 : List
i3 : L = sort L
o3 = {set {1, 3}, set {0}, set {2}}
o3 : List
i4 : L = sort L
o4 = {set {2}, set {0}, set {1, 3}}
o4 : List

Not really sure what causes this, but repeatedly sorting a list of Sets appears to alternate between two orders.

aarondall commented 6 years ago

The sort method uses <= to compare sets by inclusion, so it's technically doing the right thing. As the sets in the example list above are incomparable with respect to inclusion, it simply lists them in the reverse order of how they are given.

Encoding the sets as monomials in a polynomial ring allows for proper sorting since polynomial rings in M2 have term orders:

i1 : R = kk[x_0..x_3]; -- uses GRevLex by default
i2 : L = {R_1*R_3, R_0, R_2};
i3 : toString L
o3 = {x_1*x_3, x_0, x_2}
i4 : toString sort L 
o4 = {x_2, x_0, x_1*x_3} 
i5 : toString sort sort L
o5 = {x_2, x_0, x_1*x_3}
kroner commented 6 years ago

I think if sort is to be defined for a class, then the class should have a total order underlying the sort (even if "ties" are broken in some opaque way). The efficient way to determine if two lists contain the same multi-set of elements is to sort them, but this only works with a total order.

Given that sets are sortable, I assumed that sorting would be idempotent. The fact that it isn't broke my code in an unexpected way. Maybe this was an unreasonable assumption?

DanGrayson commented 6 years ago

I'm not sure what you mean by "sets are sortable".

In any case, what's true for sets is that the comparison operator X ? Y is implemented, but comparison is implemented as inclusion, which is not a total ordering, so one does not expect sort to be useful for sets, unless the sets one have form a chain.

I don't know what makes "sort" not idempotent -- that would be good to fix.

Maybe the real question is : is there a way to implement a total ordering on all the objects of the system? Notice that it has to be invariant under strict equality ( === ). If there were a way to do that, we could introduce a new primitive operator for doing it. Ideally it would be a speedy low level routine, part of the interpreter, with no possibility of the user adding methods for comparison.

aarondall commented 6 years ago

The efficient way to determine if two lists contain the same multi-set of elements is to sort them, but this only works with a total order.

Using "sort" to check that two lists, X and Y, are equal as multisets is not efficient when X and Y are large. It is far more efficient to convert the lists to tallies and compare them as such.

i1 : X = flatten apply (keys ((set {1,2,3})^**5), k -> toList k);
i2 : Y = reverse X;
i3 : time tally X === tally Y
     -- used 0.000033 seconds
o3 = true
i4 : time sort X === sort Y
     -- used 0.009925 seconds

That said, I'm interested in having a look at exactly what "sort" does but I don't find the source code for it in my distribution (1.9.2 MacOS). Any suggestions on where to look?

DanGrayson commented 6 years ago

You should check out our source code repository. Then one may do this:

Macaulay2, version 1.10.0.1
with packages: ConwayPolynomials, Elimination, IntegralClosure, InverseSystems, LLLBases, PrimaryDecomposition, ReesAlgebra, TangentCone

i1 : code methods sort

o1 = -- code for method: sort(List)
     m2/lists.m2:164:20-164:23: --source code:
     sort List :=  opts -> internalsort
     ---------------------------------
     -- code for method: sort(Matrix)
     m2/matrix2.m2:563:18-563:45: --source code:
     sort Matrix := o -> (f) -> f_(sortColumns(f,o))

i2 : debug Core

i3 : internalsort

o3 = internalsort

o3 : CompiledFunction

That shows that it's implemented in our D language, in which the interpreter of the Macaulay2 language is written. In the source tree, those files are in the subdirectory M2/Macaulay2/d. In actors3.d we find the line that enters the function into the Core symbol table:

setupfun("internalsort",sortfun);

Here's the rest:

whichway := GreaterS;
threadLocal sortlist := emptySequence;
subsort(l:int,r:int):Expr := (
     b := r+1-l;
     a := randomint() % b;
     if a < 0 then a = a+b;
     p := l + a;
     pivot := sortlist.p;
     sortlist.p = sortlist.l;
     sortlist.l = pivot;
     i := l+1;
     j := r;
     while i <= j do (
      -- spots 1 .. i-1 contain elements less or equal to the pivot
      -- spots j+1 .. r contain elements greater or equal to the pivot
      -- when i > j we've partitioned all the elements into two parts
      if (
           c := compare(sortlist.i,pivot);
           when c is Error do return c else nothing;
           !(c === whichway)) then i = i+1
      else if (
           c := compare(pivot, sortlist.j);
           when c is Error do return c else nothing;
           !(c === whichway)) then j = j-1
      else (
           tmp := sortlist.i;
           sortlist.i = sortlist.j;
           sortlist.j = tmp;
           i = i+1;
           j = j-1));
     if l+1 < j then subsort(l+1,j);
     if j+1 < r then subsort(j+1,r);
     for k from l+1 to j do sortlist.(k-1) = sortlist.k;
     sortlist.j = pivot;
     nullE);
basicsort(s:Sequence,ww:SymbolClosure):Expr := (
     if length(s) <= 1 then return Expr(s);
     savesortlist := sortlist;
     savewhichway := whichway;
     sortlist = new Sequence len length(s) do foreach x in s do provide x;
     whichway = ww;
     ret := subsort(0,length(s)-1);
     when ret is Error do nothing else ret = Expr(sortlist);
     whichway = savewhichway;
     sortlist = savesortlist;
     ret);
basicsort2(e:Expr,ww:SymbolClosure):Expr := (
     answer :=
     when e is s:Sequence do (
      if length(s) <= 1 then e else basicsort(s,ww))
     is t:List do (
       if ancestor(t.Class, listClass) then (
        if length(t.v) <= 1 then e else (
             r := basicsort(t.v,ww);
             when r is b:Sequence do list(t.Class,b) else r))
           else WrongArg("a list or sequence"))
     else WrongArg("a list or sequence");
     answer);
sortfun(e:Expr):Expr := basicsort2(e,GreaterS);
rsortfun(e:Expr):Expr := basicsort2(e,LessS);
setupfun("internalsort",sortfun);
setupfun("internalrsort",rsortfun);
pzinn commented 6 years ago

Interestingly, I ran into the same problem (lack of idempotency) today. A quick look at the code shows that this is a standard implementation of quicksort, which is known to be "unstable" (i.e., it doesn't deal well with equality cases in the comparison; of course it deals even less well with incomparable elements, but then no sorting algorithm will). I think it'd be good to have a stable variant of sort (say, merge sort) implemented, possibly adding an Option Stable=>true to use it?

DanGrayson commented 6 years ago

Can merge sort be made stable in the presence of incomparable pairs?

pzinn commented 4 years ago

... and reopening an old thread... here's a stable, idempotent sort for an arbitrary preorder. i.e.,

It is slow -- O(n^2) -- but I'm not sure one can do better in this generality. Incidentally, I find it really awkward that instead of using x==y I have to write (x ? y) === symbol ==. any reason why this is so? (even if x<y, say, works perfectly fine)

sortPair = new WrapperType of BasicList;
sortPair ? sortPair := (a,b) -> if (a#0 ? b#0) === symbol == then a#1 ? b#1 else hash(a#0) ? hash(b#0);
stableSort = S -> (
    S = apply(sort apply(#S, i -> sortPair{S#i,i}), s -> s#0);
    d := new MutableList from apply(#S, i -> # select (#S, j -> ( 
        b := S#j ? S#i; b === symbol < or (b === symbol == and j<i))));
    z := new MutableList from positions(d, i -> i===0);
    pos = #z - 1;
    while pos >= 0 list S#(z#pos) do (
    i := z#pos;
    pos = pos - 1;
    scan(#S, j -> (
        b := S#i ? S#j; if b === symbol < or (b === symbol == and i<j) then (
        d#j = d#j - 1;
        if d#j === 0 then ( pos = pos + 1; z#pos = j; );
        )));
    )
)
DanGrayson commented 4 years ago

About Set == Set : since ideal equality I==J is mathematical equality, it would be good if sets of ideals could also be mathematically compared, but instead:

i13 : set {ideal (0,0)} == set{ideal 0}

o13 = incomparable

That's what I've been holding out for. See also https://github.com/Macaulay2/M2/issues/1185

About awkwardness: it's unintentional. This code was probably introduced with Betti tables in mind:

     /Applications/Macaulay2-1.15/share/Macaulay2/Core/set.m2:38:38-43:11: --source code:
     VirtualTally ? VirtualTally := (x,y) -> (
          w := values ((new VirtualTally from x) - (new VirtualTally from y));
          if #w === 0 then symbol ==
          else if all(w,i -> i>0) then symbol >
          else if all(w,i -> i<0) then symbol <
          else incomparable)

I've just noticed we don't have a method for Ideal ? Ideal. We could. ( https://github.com/Macaulay2/M2/issues/1207 )

Does your analysis of your algorithm assume hash code collisions don't occur?

pzinn commented 4 years ago

concerning the awkwardness, I was referring to this comment in help symbol ==:

Caveat
Warning: whether this comparison operator returns true is not necessarily related to whether the comparison operator ? returns symbol ==.

I'm happy with the VirtualTally comparison code. and yes, my algorithm does rely on the absence of hash collisions...

DanGrayson commented 4 years ago

concerning the awkwardness, I was referring to this comment in help symbol ==:

Caveat
Warning: whether this comparison operator returns true is not necessarily related to whether the comparison operator ? returns symbol ==.

That's unavoidable, since X==X and X?X can be implemented in incompatible ways by the user, or by us. And that, in turn, is explained by the desire to define equality even in contexts where we can't define "bigger than".

I'm happy with the VirtualTally comparison code. and yes, my algorithm does rely on the absence of hash collisions...