Closed desb42 closed 4 years ago
Wow! Nice breakdown!
Let me pull down fr.wikipedia.org and try to reproduce. Will comment tomorrow. Thanks!
Yup, definitely looks a sorting issue. I no-op'd the table.sort
by commenting out the sort
lines in luaj_xowa. This gave the expected order.
Regarding sorting:
That said, some implementations of quick sort can be stable, and lua may be doing it in its auxsort
Am going to work on converting auxsort
to Java and seeing if that works
Kudos for the great intuition!
Based on various posts, it looks like Lua's table.sort
is not supposed to be stable. See links below
That said, I changed XOWA to use lua
at home/wiki/Special:XowaCfg?grp=xowa.addon.scribunto and the behavior matches MediaWiki.
So, it looks like auxsort
is the way to go
I too, took a look at auxsort and have produced an implementation of Quicksort and MergeSort
I can see now why auxsort looks like a 'stable sort' - it is, for array sizes of three or less!
Please see attached sorts.zip
Cool, thanks! Lua is definitely doing something odd with its sort
Unfortunately, I couldn't get the sorts.zip to pass the tests. It fails with org.luaj.vm2.LuaError: name:3 vm error: java.lang.ArrayIndexOutOfBoundsException: -1
Test method below
It does pass with a traditional QuickSort method (but it is not stable). Another excerpt lower below
@Test public void sort() {
testSortBasic("'d', 'c', 'b', 'a'", "1=a;2=b;3=c;4=d;");
testSortBasic("'e', 'd', 'c', 'b', 'a'", "1=a;2=b;3=c;4=d;5=e;");
testSortBasic("'a', 'd', 'c', 'b', 'e'", "1=a;2=b;3=c;4=d;5=e;");
testSortBasic("'a', 'b', 'c', 'd', 'e'", "1=a;2=b;3=c;4=d;5=e;");
}
private void testSortBasic(String tblStr, String expd) {
fxt.Clear();
fxt.Init__script
( "local tbl = {" + tblStr + "};"
, "table.sort(tbl);"
, "local rv = '';"
, "table.foreachi"
, "( tbl"
, ", function (k, v)"
, " rv = rv .. k .. '=' .. v .. ';'"
, " end"
, ");"
, "return rv;"
);
fxt.Test(expd);
}
class Quick {
private final LuaTable a;
private final LuaValue cmpfunc;
public Quick(LuaTable a, LuaValue cmpfunc) {
this.a = a;
this.cmpfunc = cmpfunc;
}
// REF.CODE:https://algs4.cs.princeton.edu/23quicksort/Quick.java.html
// quicksort the subarray from a[lo] to a[hi]
public void sort(int lo, int hi) {
if (hi <= lo) return;
int j = partition(lo, hi);
sort(lo, j-1);
sort(j+1, hi);
}
private int partition(int lo, int hi) {
int i = lo;
int j = hi + 1;
LuaValue v = a.array[lo];
while (true) {
// find item on lo to swap
// while (less(a.array[++i], v)) {
while (less(++i, lo)) {
if (i == hi) break;
}
// find item on hi to swap
// while (less(v, a.array[--j])) {
while (less(lo, --j)) {
if (j == lo) break; // redundant since a[lo] acts as sentinel
}
// check if pointers cross
if (i >= j) break;
exch(i, j);
}
// put partitioning item v at a[j]
exch(lo, j);
// now, a[lo .. j-1] <= a[j] <= a[j+1 .. hi]
return j;
}
// is v < w ?
private boolean less(int i, int j) {
if (i == j) return false; // optimization when reference equals
return a.compare(i, j, cmpfunc);
}
// exchange a[i] and a[j]
private void exch(int i, int j) {
LuaValue swap = a.array[i];
a.array[i] = a.array[j];
a.array[j] = swap;
}
}
My bad - I did not do a thorough test of any table size above three
Attached is a revised version. This was partly due to my translation of Lua code to Java and forgetting that the array offset is zero in Java and one in Lua
My test data is
{'a', 'a', 'a', 'a', 'a'}
{'a'}
{'b', 'a'}
{'c', 'b', 'a'}
{'d', 'c', 'b', 'a'}
{'e', 'd', 'c', 'b', 'a'}
{'a', 'b'}
{'a', 'b', 'c'}
{'a', 'b', 'c', 'd'}
{'a', 'b', 'c', 'd', 'e', 'b', 'c', 'd', 'e', 'b', 'c', 'd', 'e', 'b', 'c', 'd', 'e', 'b', 'c', 'd', 'e', 'd', 'e', 'b', 'c', 'd', 'e', 'd', 'e', 'b', 'c', 'd', 'e', 'd', 'e', 'b', 'c', 'd', 'e', 'd', 'e', 'b', 'c', 'd', 'e'}
Nice work! I was struggling with translating the C code and getting the tests to pass.
I was able to shoehorn your code to the original C form, but am still trying to figure out the difference in bounds (see the // desb42@
vs ltablib.c
sections).
Will spend another hour or so on this later tonight / morning, and if no progress, will take your version as is
Again, thanks! That was quite impressive!
public void sort(LuaValue comparator) {
if (m_metatable != null && m_metatable.useWeakValues()) {
dropWeakArrayValues();
}
int n = array.length;
while ( n > 0 && array[n-1] == null )
--n;
// if ( n > 1 )
// heapSort(n, comparator);
if ( n > 1 )
auxsort(this, 0, n - 1, comparator);
// if ( n > 1 ) {
// Quick quick = new Quick(this, comparator);
// quick.sort(0, n - 1);
// }
// if ( n > 1 ) {
// Quicksort ob = new Quicksort(m_metatable, array, n, comparator);
// ob.sort(n);
// }
}
private void set2 (LuaTable tbl, int i, int j) {
// lua_rawseti(L, 1, i);
// lua_rawseti(L, 1, j);
LuaValue iVal = tbl.array[i];
LuaValue jVal = tbl.array[j];
tbl.array[i] = jVal;
tbl.array[j] = iVal;
}
private boolean sort_comp(int a, int b, LuaValue cmpfunc) {
// if (!lua_isnil(L, 2)) { /* function? */
// int res;
// lua_pushvalue(L, 2);
// lua_pushvalue(L, a-1); /* -1 to compensate function */
// lua_pushvalue(L, b-2); /* -2 to compensate function and `a' */
// lua_call(L, 2, 1);
// res = lua_toboolean(L, -1);
// lua_pop(L, 1);
// return res;
// }
// else /* a < b? */
// return lua_lessthan(L, a, b);
return compare(a, b, cmpfunc);
}
private void auxsort(LuaTable tbl, int l, int u, LuaValue cmpfunc) {
while (l < u) { /* for tail recursion */
int i, j;
/* sort elements a[l], a[(l+u)/2] and a[u] */
// lua_rawgeti(L, 1, l);
// lua_rawgeti(L, 1, u);
//if (sort_comp(L, -1, -2)) /* a[u] < a[l]? */
if (sort_comp(u, l, cmpfunc)) /* a[u] < a[l]? */
set2(tbl, l, u); /* swap a[l] - a[u] */
// else
// lua_pop(L, 2);
if (u-l == 1) break; /* only 2 elements */
i = (l+u)/2;
// lua_rawgeti(L, 1, i);
// lua_rawgeti(L, 1, l);
// if (sort_comp(L, -2, -1)) /* a[i]<a[l]? */
if (sort_comp(i, l, cmpfunc)) { /* a[i]<a[l]? */
set2(tbl, i, l);
}
else {
// lua_pop(L, 1); /* remove a[l] */
// lua_rawgeti(L, 1, u);
// if (sort_comp(tbl, -1, -2)) /* a[u]<a[i]? */
if (sort_comp(u, i, cmpfunc)) /* a[u]<a[i]? */
set2(tbl, i, u);
// else
// lua_pop(L, 2);
}
if (u-l == 2) break; /* only 3 elements */
// lua_rawgeti(L, 1, i); /* Pivot */
// lua_pushvalue(L, -1);
// lua_rawgeti(L, 1, u-1);
// ltablib.c
// int P = i;
// set2(tbl, i, u-1); // ltablib.c is u - 1
// /* a[l] <= P == a[u-1] <= a[u], only need to sort from l+1 to u-2 */
// i = l; j = u-1; // i = l; j = u-1
// desb42@
set2(tbl, i, u); // ltablib.c is u - 1
/* a[l] <= P == a[u-1] <= a[u], only need to sort from l+1 to u-2 */
i = l - 1; j = u; // i = l; j = u-1
for (;;) { /* invariant: a[l..i] <= P <= a[j..u] */
/* repeat ++i until a[i] >= P */
// while (lua_rawgeti(L, 1, ++i), sort_comp(-1, -2)) {
/*
compare(0, upper); // condition 'b' as Pivot
while (compare_b(++i) && i <= upper) {
}
*/
while (++i < u && sort_comp(i, u, cmpfunc)) {
if (i>u)
error("invalid order function for sorting");
// lua_pop(L, 1); /* remove a[i] */
}
/* repeat --j until a[j] <= P */
// while (lua_rawgeti(L, 1, --j), sort_comp( -3, -1)) {
// while (--j > l && sort_comp(P, j, cmpfunc)) {
/*
compare(upper, 0); // condition 'a' as Pivot
while (compare_a(--j) && j > lower) {
}
*/
while (--j > l && sort_comp(u, j, cmpfunc)) {
if (j<l)
error("invalid order function for sorting");
// lua_pop(L, 1); /* remove a[j] */
}
if (j<i) {
// lua_pop(L, 3); /* pop pivot, a[i], a[j] */
break;
}
set2(tbl, i, j);
}
// lua_rawgeti(L, 1, u-1);
// lua_rawgeti(L, 1, i);
// ltablib.c
// set2(tbl, u, i-1); /* swap pivot (a[u-1]) with a[i] */
// desb42@
set2(tbl, u, i); /* swap pivot (a[u-1]) with a[i] */
/* a[l..i-1] <= a[i] == P <= a[i+1..u] */
/* adjust so that smaller half is in [j..i] and larger one in [l..u] */
if (i-l < u-i) {
j=l; i=i-1; l=i+2;
}
else {
j=i+1; i=u; u=j-2;
}
auxsort(tbl, j, i, cmpfunc); /* call recursively the smaller one */
} /* repeat the routine for the larger one */
}
looking at the C code, line 266 is, to me, a clue to this off-by-one thingy
This specifies the range as 1 to n, so the array access is arr[1] ... arr[n-1] whereas luaj uses the call 'auxsort(this, 0, n - 1, comparator);' which is low to high so effectively arr[0] ... arr[high]
Another thing that the C code does (and I attempted to replicate is that the pivot value is calculated once and then compared against, rather than recalculating the pivot value for every comparison)
This specifies the range as 1 to n, so the array access is arr[1] ... arr[n-1] whereas luaj uses the call 'auxsort(this, 0, n - 1, comparator);' which is low to high
Good catch! That was it! Added some extra -1
s and that worked. Thanks!
Another thing that the C code does (and I attempted to replicate is that the pivot value is calculated once and then compared against, rather than recalculating the pivot value for every comparison)
Yeah, the C code is very stack-based which allows them to pull the value once. Do you think this will be a big difference? The array[index]
should be fine, so it's just a matter of m_metatable.arrayget(index)
.
Am thinking of including both versions, and adding a low-level option to swap between the two....
Cleaned up the code, and added a few more tests. Decided to default to @desb42's
QuickSort version as it's much more readable and optimized. However, the 3-member test fails: testSortBasic("'c', 'b', 'a'", "'a', 'b', 'c'");
It's a little late here, so I'll look at more in the morning. In addition, will add more tests for the cmpfunc
to make sure arrayIndexes are passed correctly (base-1 vs base-0) as well as have a test for the "stable" sort (something like the original bug reported above)
So, 1-line fix for 3-member test failure. Debugging is much easier when it's not 3 AM in the morning :)
Anyway, added more tests for the comparator functionality. The LuaTableSortDesb42
looks good. :+1:
Just saying it again, even at the risk of being excessive, but thanks again for all the help! The initial investigation was incredibly thorough and detailed, and the actual port of the C code was great workmanship. I had stared at my version for a while, and couldn't figure out the Pivot part until I saw yours. Well done!
Looking at page
fr.wikipedia.org/wiki/Herbert_C._Brown
In the Infobox is a section 'Autres informations'The mediawiki version shows:
the xowa version shows:
within the red ellipse, the order of the items (in this case universities) are in a different order
This data originates from WIkidata Note the order of these universities - it matches the order of the mediawiki version of the page
Diving into this further, this ordering is determined by logic within
Module:Wikidata
, within the functionwd.chronoSort
For these universities there are no time values, so each 'claim.dateSortKey' (line 13) has a value of
0
Starting on line 16, there is a 'table.sort'
I think there might be some subtle luaj issue where the sorting of items where all of them have the same sortkey, the order changes.
btw, the 'Membre de' section also exhibits the same issue