Confirm results are reproducible

joewiz commented 3 years ago

On yesterday's Community Call we discussed evidence indicating users obtained different results despite using identical versions of the exist-xqts-runner and command line flags. I propose we use the latest directions derived from https://github.com/eXist-db/exist-xqts-runner/pull/17 and gather results from as many users as possible:

git clone https://github.com/exist-db/exist-xqts-runner.git (ensure a fresh clone of the master branch, no local modifications)
cd exist-xqts-runner
sbt assembly
target/scala-2.13/exist-xqts-runner-assembly-1.0.0.jar -x HEAD
open target/junit/html/index.html and copy and paste the "Summary" table into your reply to this issue. GitHub is smart enough to transform your HTML into GFM, no fiddling needed. For example, here are my results:

Tests	Failures	Errors	Skipped	Success rate	Time
31557	5167	551	1260	81.88%	439.974
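
As a sanity check on the "Success rate" column: the figures in this table are consistent with the rate being computed as (Tests - Failures - Errors) / Tests, with Skipped not subtracted. That formula is an assumption inferred from the numbers, not taken from the report stylesheet itself. A minimal sketch:

```python
def success_rate(tests: int, failures: int, errors: int) -> float:
    """Percentage of tests that neither failed nor errored.

    Assumption: skipped tests are not subtracted, which is what the
    numbers in the Summary table suggest.
    """
    return 100.0 * (tests - failures - errors) / tests


# Values from the Summary table above.
print(f"{success_rate(31557, 5167, 551):.2f}%")  # 81.88%
```

So a change of one or two failures moves the rate by well under 0.01 percentage points, which is why the Failures column is the one to compare between runs.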

adamretter commented 3 years ago

My laptop:

macOS 11.2.3
Java 1.8.0_292
sbt 1.3.3 / 1.5.0

Results:

Iteration #	Tests	Failures	Errors	Skipped	Success rate	Time
1	31557	5157	551	1260	81.91%	800.824
2	31557	5156	551	1260	81.92%	1556.271
3	31557	5157	551	1260	81.91%	791.146

EB Ubuntu Dev Env:

Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-1040-kvm x86_64)
Java 1.8.0_292
sbt 1.3.3 / 1.5.4

Results:

Iteration #	Tests	Failures	Errors	Skipped	Success rate	Time
1	31557	5171	551	1260	81.87%	626.455
2	31557	5170	551	1260	81.87%	810.823
3	31557	5172	551	1260	81.86%	616.676

NOTE: The difference between the test runs on my two machines and that of @joewiz appears to be only in the number of Failures; all other numbers are consistent. I think the next things to test are:

If we each run several times on the same machine, do we get the same results?
We need to compare individual failures
1. Where the test failures vary on the same machine, as the XQTS is highly concurrent I suspect a concurrency issue (most likely in eXist-db as opposed to XQTS).
2. Where the test failures vary between different machines, this could be the same concurrency issue, or if the results are dramatically different then I would suspect a machine specific environment issue...

duncdrum commented 3 years ago

Desktop

macOS: 11.4
java: 1.8.0_292
sbt: 1.3.3 / 1.5.4

Results

Iteration	Tests	Failures	Errors	Skipped	Success rate	Time
1	31557	5159	551	1260	81.91%	523.675
2	31557	5160	551	1260	81.90%	661.144
3	31557	5159	551	1260	81.91%	516.675

marmoure commented 3 years ago

Desktop

Windows: 10 Pro 19043.906
java: 1.8.0_281
sbt: 1.3.3 / 1.3.3

Iteration	Tests	Failures	Errors	Skipped	Success rate	Time
1	31557	5161	551	1260	81.91%	699.980
2	31557	5159	551	1260	81.91%	688.543
3	31557	5159	551	1260	81.91%	715.330

joewiz commented 3 years ago

My iMac

macOS: 11.4
java: 1.8.0_292 (liberica-jdk8-full)
sbt: 1.5.5

Results

Iteration	Tests	Failures	Errors	Skipped	Success rate	Time
1	31557	5166	551	1260	81.88%	454.876
2	31557	5168	551	1260	81.88%	470.561
3	31557	5166	551	1260	81.88%	495.145

This matches Adam's observation that only the Failures count varies: here it alternates between 5166 and 5168, while all other non-Time values remain constant.

On any one machine, the variation in Failures across all of the results reported in this issue is always 0, 1, or 2.
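
To double-check that claim, here is a small sketch that computes the spread (max minus min) of the Failures column per machine. The counts are copied from the tables posted in this issue; the machine labels are just my shorthand.

```python
# Failure counts copied from the tables in this issue (three iterations each).
failures = {
    "adamretter, macOS laptop": [5157, 5156, 5157],
    "adamretter, Ubuntu": [5171, 5170, 5172],
    "duncdrum, macOS desktop": [5159, 5160, 5159],
    "marmoure, Windows desktop": [5161, 5159, 5159],
    "joewiz, iMac": [5166, 5168, 5166],
}


def spread(counts):
    """Largest difference between any two runs on the same machine."""
    return max(counts) - min(counts)


spreads = {machine: spread(counts) for machine, counts in failures.items()}
for machine, value in spreads.items():
    print(f"{machine}: {value}")
```

The spread is 1 or 2 on every machine, even though the absolute counts differ by up to 16 between machines.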

joewiz commented 3 years ago

In https://github.com/eXist-db/exist/pull/3966#issuecomment-890448844 I performed a similar batch of 3 runs of exist-xqts-runner. The results varied between those runs too, similar to the +/- 0-3 differences we saw here.

In test 1 of the PR, there were 5,162 failures, but in test 2 a few minutes later, there were 3 fewer failures - 5,159 failures. The 3 differences all occurred within the tests for regular expressions in the fn:matches function:

re00062

Test:

   <test-case name="re00062">
      <description>Test regex syntax</description>
      <created by="Michael Kay" on="2011-07-04"/>
      <test>(every $s in tokenize('', ',') satisfies matches($s, '^(?:[^\p{IsBasicLatin}]*)$')) and (every $s in tokenize('a', ',') satisfies not(matches($s, '^(?:[^\p{IsBasicLatin}]*)$')))</test>
      <result>
         <assert-true/>
      </result>
   </test-case>

Failure in 1st test that passed in 2nd test:

Expected: 'AssertTrue', but query returned an error: QueryError(FORX0002,err:FORX0002 Conversion from XPath F&O 3.0 regular expression syntax to Java regular expression syntax failed: Error at character 22 in regular expression "^(?:[^\p{IsBasicLatin}]*)$": invalid block name (BasicLatin) [at line 1, column 135])

Stacktrace:

junit.framework.AssertionFailedError: Expected: &apos;AssertTrue&apos;, but query returned an error: QueryError(FORX0002,err:FORX0002 Conversion from XPath F&amp;O 3.0 regular expression syntax to Java regular expression syntax failed: Error at character 22 in regular expression &quot;^(?:[^\p{IsBasicLatin}]*)$&quot;: invalid block name (BasicLatin) [at line 1, column 135])
    at org.exist.xqts.runner.JUnitResultsSerializerActor.$anonfun$formatJunitTestSet$6(JUnitResultsSerializerActor.scala:142)
    at org.exist.xqts.runner.JUnitResultsSerializerActor.$anonfun$formatJunitTestSet$6$adapted(JUnitResultsSerializerActor.scala:129)
    at scala.collection.immutable.List.foreach(List.scala:333)
    at org.exist.xqts.runner.JUnitResultsSerializerActor.$anonfun$formatJunitTestSet$1(JUnitResultsSerializerActor.scala:129)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
    at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:104)
    at cats.effect.internals.IORunLoop$.restartCancelable(IORunLoop.scala:51)
    at cats.effect.internals.IOBracket$BracketStart.run(IOBracket.scala:100)
    at cats.effect.internals.Trampoline.cats$effect$internals$Trampoline$$immediateLoop(Trampoline.scala:67)
    at cats.effect.internals.Trampoline.startLoop(Trampoline.scala:35)
    at cats.effect.internals.TrampolineEC$JVMTrampoline.super$startLoop(TrampolineEC.scala:90)
    at cats.effect.internals.TrampolineEC$JVMTrampoline.$anonfun$startLoop$1(TrampolineEC.scala:90)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:94)
    at cats.effect.internals.TrampolineEC$JVMTrampoline.startLoop(TrampolineEC.scala:90)
    at cats.effect.internals.Trampoline.execute(Trampoline.scala:43)
    at cats.effect.internals.TrampolineEC.execute(TrampolineEC.scala:42)
    at cats.effect.internals.IOBracket$BracketStart.apply(IOBracket.scala:80)
    at cats.effect.internals.IOBracket$BracketStart.apply(IOBracket.scala:58)
    at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:183)
    at cats.effect.internals.IORunLoop$.restart(IORunLoop.scala:41)
    at cats.effect.internals.IOBracket$.$anonfun$apply$1(IOBracket.scala:48)
    at cats.effect.internals.IOBracket$.$anonfun$apply$1$adapted(IOBracket.scala:34)
    at cats.effect.internals.IOAsync$.$anonfun$apply$1(IOAsync.scala:37)
    at cats.effect.internals.IOAsync$.$anonfun$apply$1$adapted(IOAsync.scala:37)
    at cats.effect.internals.IORunLoop$RestartCallback.start(IORunLoop.scala:447)
    at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:156)
    at cats.effect.internals.IORunLoop$.$anonfun$suspendAsync$1(IORunLoop.scala:321)
    at cats.effect.internals.IORunLoop$.$anonfun$suspendAsync$1$adapted(IORunLoop.scala:320)
    at cats.effect.internals.IORunLoop$RestartCallback.start(IORunLoop.scala:447)
    at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:156)
    at cats.effect.internals.IORunLoop$.start(IORunLoop.scala:38)
    at cats.effect.IO.unsafeRunAsync(IO.scala:274)
    at cats.effect.internals.IOPlatform$.unsafeResync(IOPlatform.scala:39)
    at cats.effect.IO.unsafeRunTimed(IO.scala:342)
    at cats.effect.IO.unsafeRunSync(IO.scala:256)
    at org.exist.xqts.runner.JUnitResultsSerializerActor$$anonfun$receive$1.applyOrElse(JUnitResultsSerializerActor.scala:55)
    at akka.actor.Actor.aroundReceive(Actor.scala:537)
    at akka.actor.Actor.aroundReceive$(Actor.scala:535)
    at org.exist.xqts.runner.JUnitResultsSerializerActor.aroundReceive(JUnitResultsSerializerActor.scala:42)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
    at akka.actor.ActorCell.invoke(ActorCell.scala:548)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
    at akka.dispatch.Mailbox.run(Mailbox.scala:231)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)

re00225

Test:

   <test-case name="re00225">
      <description>Test regex syntax</description>
      <created by="Michael Kay" on="2011-07-04"/>
      <test>(every $s in tokenize('&#1536;&#1791;,&#1536;&#1537;&#1538;&#1539;&#1540;&#1541;&#1542;&#1543;&#1544;&#1545;&#1546;&#1547;&#1548;&#1549;&#1550;&#1551;&#1552;&#1553;&#1554;&#1555;&#1556;&#1557;&#1558;&#1559;&#1560;&#1561;&#1562;&#1563;&#1564;&#1565;&#1566;&#1567;&#1568;&#1569;&#1570;&#1571;&#1572;&#1573;&#1574;&#1575;&#1576;&#1577;&#1578;&#1579;&#1580;&#1581;&#1582;&#1583;&#1584;&#1585;&#1586;&#1587;&#1588;&#1589;&#1590;&#1591;&#1592;&#1593;&#1594;&#1595;&#1596;&#1597;&#1598;&#1599;&#1600;&#1601;&#1602;&#1603;&#1604;&#1605;&#1606;&#1607;&#1608;&#1609;&#1610;&#1611;&#1612;&#1613;&#1614;&#1615;&#1616;&#1617;&#1618;&#1619;&#1620;&#1621;&#1622;&#1623;&#1624;&#1625;&#1626;&#1627;&#1628;&#1629;&#1630;&#1631;&#1632;&#1633;&#1634;&#1635;&#1636;&#1637;&#1638;&#1639;&#1640;&#1641;&#1642;&#1643;&#1644;&#1645;&#1646;&#1647;&#1648;&#1649;&#1650;&#1651;&#1652;&#1653;&#1654;&#1655;&#1656;&#1657;&#1658;&#1659;&#1660;&#1661;&#1662;&#1663;&#1664;&#1665;&#1666;&#1667;&#1668;&#1669;&#1670;&#1671;&#1672;&#1673;&#1674;&#1675;&#1676;&#1677;&#1678;&#1679;&#1680;&#1681;&#1682;&#1683;&#1684;&#1685;&#1686;&#1687;&#1688;&#1689;&#1690;&#1691;&#1692;&#1693;&#1694;&#1695;&#1696;&#1697;&#1698;&#1699;&#1700;&#1701;&#1702;&#1703;&#1704;&#1705;&#1706;&#1707;&#1708;&#1709;&#1710;&#1711;&#1712;&#1713;&#1714;&#1715;&#1716;&#1717;&#1718;&#1719;&#1720;&#1721;&#1722;&#1723;&#1724;&#1725;&#1726;&#1727;&#1728;&#1729;&#1730;&#1731;&#1732;&#1733;&#1734;&#1735;&#1736;&#1737;&#1738;&#1739;&#1740;&#1741;&#1742;&#1743;&#1744;&#1745;&#1746;&#1747;&#1748;&#1749;&#1750;&#1751;&#1752;&#1753;&#1754;&#1755;&#1756;&#1757;&#1758;&#1759;&#1760;&#1761;&#1762;&#1763;&#1764;&#1765;&#1766;&#1767;&#1768;&#1769;&#1770;&#1771;&#1772;&#1773;&#1774;&#1775;&#1776;&#1777;&#1778;&#1779;&#1780;&#1781;&#1782;&#1783;&#1784;&#1785;&#1786;&#1787;&#1788;&#1789;&#1790;&#1791;', ',') satisfies matches($s, '^(?:\p{IsArabic}+)$')) and (every $s in tokenize('', ',') satisfies not(matches($s, '^(?:\p{IsArabic}+)$')))</test>
      <result>
         <assert-true/>
      </result>
   </test-case>

Failure in 1st test that passed in 2nd test:

Expected: 'AssertTrue', but query returned an error: QueryError(FORX0002,err:FORX0002 Conversion from XPath F&O 3.0 regular expression syntax to Java regular expression syntax failed: Error at character 16 in regular expression "^(?:\p{IsArabic}+)$": invalid block name (Arabic) [at line 1, column 301])

Stacktrace:

junit.framework.AssertionFailedError: Expected: &apos;AssertTrue&apos;, but query returned an error: QueryError(FORX0002,err:FORX0002 Conversion from XPath F&amp;O 3.0 regular expression syntax to Java regular expression syntax failed: Error at character 16 in regular expression &quot;^(?:\p{IsArabic}+)$&quot;: invalid block name (Arabic) [at line 1, column 301])
    at org.exist.xqts.runner.JUnitResultsSerializerActor.$anonfun$formatJunitTestSet$6(JUnitResultsSerializerActor.scala:142)
    at org.exist.xqts.runner.JUnitResultsSerializerActor.$anonfun$formatJunitTestSet$6$adapted(JUnitResultsSerializerActor.scala:129)
    at scala.collection.immutable.List.foreach(List.scala:333)
    at org.exist.xqts.runner.JUnitResultsSerializerActor.$anonfun$formatJunitTestSet$1(JUnitResultsSerializerActor.scala:129)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
    at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:104)
    at cats.effect.internals.IORunLoop$.restartCancelable(IORunLoop.scala:51)
    at cats.effect.internals.IOBracket$BracketStart.run(IOBracket.scala:100)
    at cats.effect.internals.Trampoline.cats$effect$internals$Trampoline$$immediateLoop(Trampoline.scala:67)
    at cats.effect.internals.Trampoline.startLoop(Trampoline.scala:35)
    at cats.effect.internals.TrampolineEC$JVMTrampoline.super$startLoop(TrampolineEC.scala:90)
    at cats.effect.internals.TrampolineEC$JVMTrampoline.$anonfun$startLoop$1(TrampolineEC.scala:90)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:94)
    at cats.effect.internals.TrampolineEC$JVMTrampoline.startLoop(TrampolineEC.scala:90)
    at cats.effect.internals.Trampoline.execute(Trampoline.scala:43)
    at cats.effect.internals.TrampolineEC.execute(TrampolineEC.scala:42)
    at cats.effect.internals.IOBracket$BracketStart.apply(IOBracket.scala:80)
    at cats.effect.internals.IOBracket$BracketStart.apply(IOBracket.scala:58)
    at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:183)
    at cats.effect.internals.IORunLoop$.restart(IORunLoop.scala:41)
    at cats.effect.internals.IOBracket$.$anonfun$apply$1(IOBracket.scala:48)
    at cats.effect.internals.IOBracket$.$anonfun$apply$1$adapted(IOBracket.scala:34)
    at cats.effect.internals.IOAsync$.$anonfun$apply$1(IOAsync.scala:37)
    at cats.effect.internals.IOAsync$.$anonfun$apply$1$adapted(IOAsync.scala:37)
    at cats.effect.internals.IORunLoop$RestartCallback.start(IORunLoop.scala:447)
    at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:156)
    at cats.effect.internals.IORunLoop$.$anonfun$suspendAsync$1(IORunLoop.scala:321)
    at cats.effect.internals.IORunLoop$.$anonfun$suspendAsync$1$adapted(IORunLoop.scala:320)
    at cats.effect.internals.IORunLoop$RestartCallback.start(IORunLoop.scala:447)
    at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:156)
    at cats.effect.internals.IORunLoop$.start(IORunLoop.scala:38)
    at cats.effect.IO.unsafeRunAsync(IO.scala:274)
    at cats.effect.internals.IOPlatform$.unsafeResync(IOPlatform.scala:39)
    at cats.effect.IO.unsafeRunTimed(IO.scala:342)
    at cats.effect.IO.unsafeRunSync(IO.scala:256)
    at org.exist.xqts.runner.JUnitResultsSerializerActor$$anonfun$receive$1.applyOrElse(JUnitResultsSerializerActor.scala:55)
    at akka.actor.Actor.aroundReceive(Actor.scala:537)
    at akka.actor.Actor.aroundReceive$(Actor.scala:535)
    at org.exist.xqts.runner.JUnitResultsSerializerActor.aroundReceive(JUnitResultsSerializerActor.scala:42)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
    at akka.actor.ActorCell.invoke(ActorCell.scala:548)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
    at akka.dispatch.Mailbox.run(Mailbox.scala:231)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)

re00061

Test:

   <test-case name="re00061">
      <description>Test regex syntax</description>
      <created by="Michael Kay" on="2011-07-04"/>
      <test>(every $s in tokenize('&#256;', ',') satisfies matches($s, '^(?:[^\p{IsBasicLatin}]+)$')) and (every $s in tokenize('', ',') satisfies not(matches($s, '^(?:[^\p{IsBasicLatin}]+)$')))</test>
      <result>
         <assert-true/>
      </result>
   </test-case>

Failure in 1st test that passed in 2nd test:

Expected: 'AssertTrue', but query returned an error: QueryError(FORX0002,err:FORX0002 Conversion from XPath F&O 3.0 regular expression syntax to Java regular expression syntax failed: Error at character 22 in regular expression "^(?:[^\p{IsBasicLatin}]+)$": invalid block name (BasicLatin) [at line 1, column 43])

Stacktrace:

junit.framework.AssertionFailedError: Expected: &apos;AssertTrue&apos;, but query returned an error: QueryError(FORX0002,err:FORX0002 Conversion from XPath F&amp;O 3.0 regular expression syntax to Java regular expression syntax failed: Error at character 22 in regular expression &quot;^(?:[^\p{IsBasicLatin}]+)$&quot;: invalid block name (BasicLatin) [at line 1, column 43])
    at org.exist.xqts.runner.JUnitResultsSerializerActor.$anonfun$formatJunitTestSet$6(JUnitResultsSerializerActor.scala:142)
    at org.exist.xqts.runner.JUnitResultsSerializerActor.$anonfun$formatJunitTestSet$6$adapted(JUnitResultsSerializerActor.scala:129)
    at scala.collection.immutable.List.foreach(List.scala:333)
    at org.exist.xqts.runner.JUnitResultsSerializerActor.$anonfun$formatJunitTestSet$1(JUnitResultsSerializerActor.scala:129)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
    at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:104)
    at cats.effect.internals.IORunLoop$.restartCancelable(IORunLoop.scala:51)
    at cats.effect.internals.IOBracket$BracketStart.run(IOBracket.scala:100)
    at cats.effect.internals.Trampoline.cats$effect$internals$Trampoline$$immediateLoop(Trampoline.scala:67)
    at cats.effect.internals.Trampoline.startLoop(Trampoline.scala:35)
    at cats.effect.internals.TrampolineEC$JVMTrampoline.super$startLoop(TrampolineEC.scala:90)
    at cats.effect.internals.TrampolineEC$JVMTrampoline.$anonfun$startLoop$1(TrampolineEC.scala:90)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:94)
    at cats.effect.internals.TrampolineEC$JVMTrampoline.startLoop(TrampolineEC.scala:90)
    at cats.effect.internals.Trampoline.execute(Trampoline.scala:43)
    at cats.effect.internals.TrampolineEC.execute(TrampolineEC.scala:42)
    at cats.effect.internals.IOBracket$BracketStart.apply(IOBracket.scala:80)
    at cats.effect.internals.IOBracket$BracketStart.apply(IOBracket.scala:58)
    at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:183)
    at cats.effect.internals.IORunLoop$.restart(IORunLoop.scala:41)
    at cats.effect.internals.IOBracket$.$anonfun$apply$1(IOBracket.scala:48)
    at cats.effect.internals.IOBracket$.$anonfun$apply$1$adapted(IOBracket.scala:34)
    at cats.effect.internals.IOAsync$.$anonfun$apply$1(IOAsync.scala:37)
    at cats.effect.internals.IOAsync$.$anonfun$apply$1$adapted(IOAsync.scala:37)
    at cats.effect.internals.IORunLoop$RestartCallback.start(IORunLoop.scala:447)
    at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:156)
    at cats.effect.internals.IORunLoop$.$anonfun$suspendAsync$1(IORunLoop.scala:321)
    at cats.effect.internals.IORunLoop$.$anonfun$suspendAsync$1$adapted(IORunLoop.scala:320)
    at cats.effect.internals.IORunLoop$RestartCallback.start(IORunLoop.scala:447)
    at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:156)
    at cats.effect.internals.IORunLoop$.start(IORunLoop.scala:38)
    at cats.effect.IO.unsafeRunAsync(IO.scala:274)
    at cats.effect.internals.IOPlatform$.unsafeResync(IOPlatform.scala:39)
    at cats.effect.IO.unsafeRunTimed(IO.scala:342)
    at cats.effect.IO.unsafeRunSync(IO.scala:256)
    at org.exist.xqts.runner.JUnitResultsSerializerActor$$anonfun$receive$1.applyOrElse(JUnitResultsSerializerActor.scala:55)
    at akka.actor.Actor.aroundReceive(Actor.scala:537)
    at akka.actor.Actor.aroundReceive$(Actor.scala:535)
    at org.exist.xqts.runner.JUnitResultsSerializerActor.aroundReceive(JUnitResultsSerializerActor.scala:42)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
    at akka.actor.ActorCell.invoke(ActorCell.scala:548)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
    at akka.dispatch.Mailbox.run(Mailbox.scala:231)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)

I can't speculate why two runs of exist-xqts-runner run a few minutes apart would produce invalid block name errors in one run but not the next. But these variations do not appear to be related to the PR I was investigating.

Tests 2 vs. 3 differed only by 1 test, and this one was in a different location:

group-015

Test:

   <test-case name="group-015">
      <description>No value comparisons are available to compare the grouping keys.</description>
      <created by="Josh Spiegel" on="2012-10-02"/>
      <modified by="Michael Kay" on="2017-03-17" change="avoid assert-xml for non-XML results"/>
      <test>
          for $x in (true(), "true", xs:QName("true"))
          group by $x
          return $x
      </test>
      <result>
        <assert-permutation>true(), "true", xs:QName("true")</assert-permutation>
      </result>
   </test-case>

The failure in test 3 that passed in test 2:

assert-permutation: expected='true(), "true", xs:QName("true")', actual='Q{}true, "true", true()'

Stacktrace:

junit.framework.AssertionFailedError: assert-permutation: expected=&apos;true(), &quot;true&quot;, xs:QName(&quot;true&quot;)&apos;, actual=&apos;Q{}true, &quot;true&quot;, true()&apos;
    at org.exist.xqts.runner.JUnitResultsSerializerActor.$anonfun$formatJunitTestSet$6(JUnitResultsSerializerActor.scala:142)
    at org.exist.xqts.runner.JUnitResultsSerializerActor.$anonfun$formatJunitTestSet$6$adapted(JUnitResultsSerializerActor.scala:129)
    at scala.collection.immutable.List.foreach(List.scala:333)
    at org.exist.xqts.runner.JUnitResultsSerializerActor.$anonfun$formatJunitTestSet$1(JUnitResultsSerializerActor.scala:129)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
    at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:104)
    at cats.effect.internals.IORunLoop$.restartCancelable(IORunLoop.scala:51)
    at cats.effect.internals.IOBracket$BracketStart.run(IOBracket.scala:100)
    at cats.effect.internals.Trampoline.cats$effect$internals$Trampoline$$immediateLoop(Trampoline.scala:67)
    at cats.effect.internals.Trampoline.startLoop(Trampoline.scala:35)
    at cats.effect.internals.TrampolineEC$JVMTrampoline.super$startLoop(TrampolineEC.scala:90)
    at cats.effect.internals.TrampolineEC$JVMTrampoline.$anonfun$startLoop$1(TrampolineEC.scala:90)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:94)
    at cats.effect.internals.TrampolineEC$JVMTrampoline.startLoop(TrampolineEC.scala:90)
    at cats.effect.internals.Trampoline.execute(Trampoline.scala:43)
    at cats.effect.internals.TrampolineEC.execute(TrampolineEC.scala:42)
    at cats.effect.internals.IOBracket$BracketStart.apply(IOBracket.scala:80)
    at cats.effect.internals.IOBracket$BracketStart.apply(IOBracket.scala:58)
    at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:183)
    at cats.effect.internals.IORunLoop$.restart(IORunLoop.scala:41)
    at cats.effect.internals.IOBracket$.$anonfun$apply$1(IOBracket.scala:48)
    at cats.effect.internals.IOBracket$.$anonfun$apply$1$adapted(IOBracket.scala:34)
    at cats.effect.internals.IOAsync$.$anonfun$apply$1(IOAsync.scala:37)
    at cats.effect.internals.IOAsync$.$anonfun$apply$1$adapted(IOAsync.scala:37)
    at cats.effect.internals.IORunLoop$RestartCallback.start(IORunLoop.scala:447)
    at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:156)
    at cats.effect.internals.IORunLoop$.$anonfun$suspendAsync$1(IORunLoop.scala:321)
    at cats.effect.internals.IORunLoop$.$anonfun$suspendAsync$1$adapted(IORunLoop.scala:320)
    at cats.effect.internals.IORunLoop$RestartCallback.start(IORunLoop.scala:447)
    at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:156)
    at cats.effect.internals.IORunLoop$.start(IORunLoop.scala:38)
    at cats.effect.IO.unsafeRunAsync(IO.scala:274)
    at cats.effect.internals.IOPlatform$.unsafeResync(IOPlatform.scala:39)
    at cats.effect.IO.unsafeRunTimed(IO.scala:342)
    at cats.effect.IO.unsafeRunSync(IO.scala:256)
    at org.exist.xqts.runner.JUnitResultsSerializerActor$$anonfun$receive$1.applyOrElse(JUnitResultsSerializerActor.scala:55)
    at akka.actor.Actor.aroundReceive(Actor.scala:537)
    at akka.actor.Actor.aroundReceive$(Actor.scala:535)
    at org.exist.xqts.runner.JUnitResultsSerializerActor.aroundReceive(JUnitResultsSerializerActor.scala:42)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
    at akka.actor.ActorCell.invoke(ActorCell.scala:548)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
    at akka.dispatch.Mailbox.run(Mailbox.scala:231)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)

Comparing test 1 to test 3, the differences were exactly the same 4 tests shown above.

This would explain the consistent range of variation of 0-3 in the results that we all reported:

0 for the case where both runs happened to return identical results
1 for the case where one run failed the 1 GroupByClause test
2 for the case where one run failed the 3 matches.re.xml tests and the other failed the 1 GroupByClause test
3 for the case where one run failed the 3 matches.re.xml tests

Note that we didn't see a variation of 4—for the case where 1 test failed both the 1 GroupByClause test and the 3 matches.re.xml tests. Perhaps we'd see this if we performed more test runs, or perhaps the two groups of failing tests don't occur in the same run, i.e., they're inter-related?

This is just a running theory. Perhaps there are other tests that fail besides these, and only additional runs and comparisons would reveal them.
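
To make the arithmetic behind this theory explicit, here is a sketch that enumerates every combination. It assumes (and this is only an assumption) that the 3 fn:matches tests fail or pass together as one group, that the 1 GroupByClause test is a second group, and that nothing else varies.

```python
from itertools import product

RE_TESTS = 3        # re00061, re00062, re00225 (fn:matches)
GROUP_BY_TESTS = 1  # group-015 (prod-GroupByClause)


def extra_failures(re_fails: bool, group_by_fails: bool) -> int:
    """Failures a single run adds on top of the stable baseline."""
    return RE_TESTS * re_fails + GROUP_BY_TESTS * group_by_fails


# Each run is one of four states: (re group fails?, group-by test fails?).
states = list(product([False, True], repeat=2))

differences = sorted(
    {abs(extra_failures(*a) - extra_failures(*b)) for a in states for b in states}
)
print(differences)  # [0, 1, 2, 3, 4]
```

Under these assumptions a difference of 4 is possible, but only when one run fails both groups and the other fails neither.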

To check which testsuites were responsible for the difference between 2 test runs, save the target/junit/data/TESTS-TestSuites.xml files from each run, and then run this query, setting $tss1 and $tss2 to the paths of the two files:

xquery version "3.1";

let $tss1 := doc("/db/apps/exist-xqts-results/data/5.4.0-SNAPSHOT-with-Juri-PR/test01/junit/data/TESTS-TestSuites.xml")/testsuites/testsuite
let $tss2 := doc("/db/apps/exist-xqts-results/data/5.4.0-SNAPSHOT-with-Juri-PR/test03/junit/data/TESTS-TestSuites.xml")/testsuites/testsuite
return
    array {
        for $ts1 in $tss1
        let $ts1-failures := $ts1/@failures
        let $ts2 := $tss2[@package eq $ts1/@package and @name eq $ts1/@name]
        let $ts2-failures := $ts2/@failures
        return
            if ($ts1-failures ne $ts2-failures) then
                map {
                    "package": $ts1/@package/string(),
                    "name": $ts1/@name/string(),
                    "ts1-failures": $ts1/@failures cast as xs:integer,
                    "ts2-failures": $ts2/@failures cast as xs:integer
                }
            else
                ()
    }

This returns a result like:

[
    {
        "package": "XQTS_HEAD.fn-matches",
        "name": "re",
        "ts1-failures": 7,
        "ts2-failures": 4
    },
    {
        "package": "XQTS_HEAD",
        "name": "prod-GroupByClause",
        "ts1-failures": 15,
        "ts2-failures": 16
    }
]
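
For anyone who would rather not load the files into eXist, the same comparison can be sketched in Python with the standard library. This assumes the file has a `testsuites` root with `testsuite` children carrying `package`, `name` and `failures` attributes, as the query above does. The inline XML is a trimmed, hypothetical stand-in for the real files, and the `example-unchanged` suite is made up.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-ins for two TESTS-TestSuites.xml files (hypothetical data).
RUN1 = """<testsuites>
  <testsuite package="XQTS_HEAD.fn-matches" name="re" failures="7"/>
  <testsuite package="XQTS_HEAD" name="prod-GroupByClause" failures="15"/>
  <testsuite package="XQTS_HEAD" name="example-unchanged" failures="2"/>
</testsuites>"""
RUN2 = """<testsuites>
  <testsuite package="XQTS_HEAD.fn-matches" name="re" failures="4"/>
  <testsuite package="XQTS_HEAD" name="prod-GroupByClause" failures="16"/>
  <testsuite package="XQTS_HEAD" name="example-unchanged" failures="2"/>
</testsuites>"""


def failures_by_suite(xml_text):
    """Map (package, name) of each testsuite to its failure count."""
    root = ET.fromstring(xml_text)
    return {
        (ts.get("package"), ts.get("name")): int(ts.get("failures"))
        for ts in root.findall("testsuite")
    }


def changed_suites(run1_xml, run2_xml):
    """List the suites whose failure count differs between the two runs."""
    run1 = failures_by_suite(run1_xml)
    run2 = failures_by_suite(run2_xml)
    return [
        {"package": pkg, "name": name,
         "ts1-failures": count, "ts2-failures": run2[(pkg, name)]}
        for (pkg, name), count in run1.items()
        if (pkg, name) in run2 and run2[(pkg, name)] != count
    ]


for row in changed_suites(RUN1, RUN2):
    print(row)
```

For real files, read each one with `open(path, encoding="utf-8").read()` and pass the text in.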

To derive a table like the one I posted in the PR comment linked above, which listed the tests that returned different results in 2 test runs, I uploaded the entire junit directories to eXist and ran the following query:

xquery version "3.1";

declare namespace output="http://www.w3.org/2010/xslt-xquery-serialization";

declare option output:method "html5";
declare option output:media-type "text/html";

declare function local:compare-testcase($testcase-1, $testcase-2) {
    element tr {
        element td { $testcase-1/../@package/string() },
        element td { $testcase-1/../@name/string() },
        element td { $testcase-1/@name/string() },
        element td { ($testcase-1/*/name(), "pass")[. ne ""][1] },
        element td { ($testcase-2/*/name(), "pass")[. ne ""][1] }
    }
};

declare function local:compare-testcases($testcases-1, $testcases-2) {
    for $tc1 in $testcases-1
    let $name := $tc1/@name
    let $tc2 := $testcases-2[@name eq $name]
    order by $name
    return
        if (
                (empty($tc1/node()) and empty($tc2/node()))
                or 
                ($tc1/error and $tc2/error)
                or 
                ($tc1/failure and $tc2/failure)
                or 
                ($tc1/skipped and $tc2/skipped)
            ) then
            ()
        else
            local:compare-testcase($tc1, $tc2)
};

declare function local:compare-testsuites($testsuites-1, $testsuites-2) {
    element table {
        element thead {
            element tr {
                element th { "testsuite package" },
                element th { "testsuite name" },
                element th { "testcase name" },
                element th { "test 1" },
                element th { "test 2" }
            }
        },
        element tbody {
            for $ts1 in $testsuites-1
            let $package := $ts1/@package
            let $name := $ts1/@name
            let $ts2 := $testsuites-2[@package eq $package and @name eq $name]
            order by $package, $name
            return 
                if ($ts1/@errors eq "0" and $ts2/@errors eq "0") then
                    ()
                else
                    local:compare-testcases($ts1/testcase, $ts2/testcase)
        }
    }
};

let $data-collection := "/db/apps/exist-xqts-results/data"
let $testsuites-1 := 
    doc($data-collection || "/5.4.0-SNAPSHOT-before-Juri-PR/test01/junit/data/TESTS-TestSuites.xml")/testsuites/testsuite
let $testsuites-2 := 
    doc($data-collection || "/5.4.0-SNAPSHOT-with-Juri-PR/test01/junit/data/TESTS-TestSuites.xml")/testsuites/testsuite
return
    local:compare-testsuites($testsuites-1, $testsuites-2)

... returns a table like this:

testsuite package	testsuite name	testcase name	test 1	test 2
XQTS_HEAD	prod-UnaryLookup	UnaryLookup-011	failure	error
XQTS_HEAD	prod-UnaryLookup	UnaryLookup-015	failure	pass
XQTS_HEAD	prod-UnaryLookup	UnaryLookup-016	failure	error
XQTS_HEAD	prod-UnaryLookup	UnaryLookup-017	failure	error
XQTS_HEAD	prod-UnaryLookup	UnaryLookup-023	failure	pass
XQTS_HEAD	prod-UnaryLookup	UnaryLookup-025	failure	pass
XQTS_HEAD	prod-UnaryLookup	UnaryLookup-046	failure	pass
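
The per-testcase comparison can also be sketched outside eXist. This Python version assumes each `testcase` element has at most one `failure`, `error` or `skipped` child and counts as a pass otherwise. The inline XML is a trimmed, hypothetical stand-in built from two rows of the table above plus one made-up unchanged test.

```python
import xml.etree.ElementTree as ET

RUN1 = """<testsuites>
  <testsuite package="XQTS_HEAD" name="prod-UnaryLookup">
    <testcase name="UnaryLookup-011"><failure/></testcase>
    <testcase name="UnaryLookup-015"><failure/></testcase>
    <testcase name="example-unchanged"/>
  </testsuite>
</testsuites>"""
RUN2 = """<testsuites>
  <testsuite package="XQTS_HEAD" name="prod-UnaryLookup">
    <testcase name="UnaryLookup-011"><error/></testcase>
    <testcase name="UnaryLookup-015"/>
    <testcase name="example-unchanged"/>
  </testsuite>
</testsuites>"""

STATUSES = ("failure", "error", "skipped")


def outcomes(xml_text):
    """Map (package, suite, testcase) to 'pass', 'failure', 'error' or 'skipped'."""
    result = {}
    for ts in ET.fromstring(xml_text).findall("testsuite"):
        for tc in ts.findall("testcase"):
            status = next((c.tag for c in tc if c.tag in STATUSES), "pass")
            result[(ts.get("package"), ts.get("name"), tc.get("name"))] = status
    return result


def changed_testcases(run1_xml, run2_xml):
    """Rows of (package, suite, testcase, run 1 status, run 2 status)."""
    run1, run2 = outcomes(run1_xml), outcomes(run2_xml)
    return sorted(
        (*key, run1[key], run2[key])
        for key in run1.keys() & run2.keys()
        if run1[key] != run2[key]
    )


for row in changed_testcases(RUN1, RUN2):
    print("\t".join(row))
```

One difference from the query above: the query only descends into testsuites where at least one run reports errors, whereas this sketch compares every testcase, so it would also pick up suites that differ only in failures.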

I hope these results and queries help us nail down the sources of unexpected variation in the results of exist-xqts-runner.

dizzzz commented 3 years ago

Impressive research and analysis!

dizzzz commented 3 years ago

Order variations in results sound like... usage of a HashMap somewhere.
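
If ordering is the culprit, a permutation check should not care about it. Here is a minimal sketch of an order-insensitive comparison, using Python tuples as made-up stand-ins for the three items in group-015. This only illustrates the idea; it is not how exist-xqts-runner implements assert-permutation.

```python
from collections import Counter

# Hypothetical stand-ins for the xs:boolean, xs:string and xs:QName items.
expected = [("boolean", True), ("string", "true"), ("QName", "", "true")]
actual = [("QName", "", "true"), ("string", "true"), ("boolean", True)]


def is_permutation(a, b):
    """True when both sequences hold the same items, in any order."""
    return Counter(a) == Counter(b)


print(expected == actual)                # False: the order differs
print(is_permutation(expected, actual))  # True: same items
```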
