Multiple failing tests in 2.18.1 on arm64 and armhf

rbalint commented 4 years ago

In Ubuntu CI there are multiple failures observed on ARM architectures:

https://autopkgtest.ubuntu.com/packages/r/r-cran-openmx/groovy/armhf https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-groovy/groovy/armhf/r/r-cran-openmx/20200906_151809_74358@/log.gz

...
── 9. Error: probit+poisson ML+WLS (@test-discrete.R#290)  ─────────────────────
Don't know how to interpret factor column 'z4' as numeric.
You may want to specify thresholds for your model like this: mxThreshold(vars='z4', nThresh=3)
Backtrace:
 1. OpenMx::mxRun(build())
 2. OpenMx:::runHelper(...)

terminate called after throwing an instance of 'std::runtime_error'
  what():  Problem in dVnames mapping
Aborted (core dumped)
...

https://autopkgtest.ubuntu.com/packages/r/r-cran-openmx/groovy/arm64 https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-groovy/groovy/arm64/r/r-cran-openmx/20200906_151750_74358@/log.gz

...
══ testthat results  ═══════════════════════════════════════════════════════════
[ OK: 598 | SKIPPED: 4 | WARNINGS: 3 | FAILED: 4 ]
1. Error: RAM (@test-discrete.R#138) 
2. Error: mediation (@test-discrete.R#201) 
3. Error: LISREL (@test-discrete.R#247) 
4. Error: probit+poisson ML+WLS (@test-discrete.R#290) 

Error: testthat unit tests failed
...

The arm64 regression can be observed on Debian, too: https://ci.debian.net/packages/r/r-cran-openmx/testing/arm64/
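
For triaging logs like the ones quoted above, here is a minimal sketch of a parser for the testthat summary block. It assumes the summary line and the numbered failure entries look exactly like the arm64 excerpt. The function name `parse_testthat_log`, the regular expressions, and the `SAMPLE_LOG` string are illustrative assumptions, not part of OpenMx or testthat.

```python
import re

# Matches a summary line such as "[ OK: 598 | SKIPPED: 4 | WARNINGS: 3 | FAILED: 4 ]".
SUMMARY_RE = re.compile(
    r"\[\s*OK:\s*(\d+)\s*\|\s*SKIPPED:\s*(\d+)\s*\|"
    r"\s*WARNINGS:\s*(\d+)\s*\|\s*FAILED:\s*(\d+)\s*\]"
)
# Matches a numbered entry such as "1. Error: RAM (@test-discrete.R#138)".
FAILURE_RE = re.compile(
    r"^\s*\d+\.\s+(Error|Failure):\s+(.*?)\s+\(@([\w.-]+)#(\d+)\)\s*$"
)

# Sample text copied from the arm64 excerpt in this thread.
SAMPLE_LOG = """\
[ OK: 598 | SKIPPED: 4 | WARNINGS: 3 | FAILED: 4 ]
1. Error: RAM (@test-discrete.R#138)
2. Error: mediation (@test-discrete.R#201)
3. Error: LISREL (@test-discrete.R#247)
4. Error: probit+poisson ML+WLS (@test-discrete.R#290)

Error: testthat unit tests failed
"""


def parse_testthat_log(text):
    """Return (summary, failures) extracted from a testthat log excerpt.

    summary is a dict of counts, or None if no summary line is found.
    failures is a list of dicts with the kind, test name, file, and line.
    """
    summary = None
    failures = []
    for line in text.splitlines():
        match = SUMMARY_RE.search(line)
        if match:
            keys = ("ok", "skipped", "warnings", "failed")
            summary = dict(zip(keys, (int(g) for g in match.groups())))
            continue
        match = FAILURE_RE.match(line)
        if match:
            failures.append({
                "kind": match.group(1),
                "test": match.group(2),
                "file": match.group(3),
                "line": int(match.group(4)),
            })
    return summary, failures


summary, failures = parse_testthat_log(SAMPLE_LOG)
print(summary)
for failure in failures:
    print(failure["file"], failure["line"], failure["test"])
```

Grouping the parsed entries by `file` makes it easy to see that all four arm64 failures come from test-discrete.R.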

RMKirkpatrick commented 4 years ago

Thanks for your report. BTW, the armhf regression is observed under Debian as well: https://ci.debian.net/packages/r/r-cran-openmx/testing/armhf/ .

A few remarks for the other developers:

The fatal backend runtime error observed on armhf has this message, "Problem in dVnames mapping", meaning that the offending test involves an MxFitFunctionGREML. So, probably 'test-GREML_Error_Detection'.
The failures common to both architectures occur with 'test-discrete'. Might they be due to an issue that @jpritikin has patched since the 2.18.1 release, perhaps by aa1a80b ?
Something weird, and probably architecture-specific, seems to be happening on armhf.

jpritikin commented 4 years ago

Any chance we can get an SSH session on an armhf box for debugging?

rbalint commented 4 years ago

Unfortunately, the machines I use can't be shared. :-( Some cloud providers offer free credit or an initially free tier that includes arm64 machines, which could be used, or a cheap ARM SoC could serve as a testbed.

tbates commented 4 years ago

Grab a Developer Transition Kit: https://en.wikipedia.org/wiki/Developer_Transition_Kit_(2020)

More easily, AWS has ARM instances, if you're already set up for their cloud. Their ARM boxes are called "Graviton2".

RMKirkpatrick commented 4 years ago

@rbalint, you are able to build OpenMx from the head of the master branch on at least one of the two architectures, correct? Would you mind doing so and running the test suite? It would be good to know whether any of the test failures have been resolved since the v2.18.1 release.

jpritikin commented 4 years ago

Has anybody tried https://wiki.debian.org/Arm64Qemu ? Would that get the job done?

RMKirkpatrick commented 4 years ago

I was actually about to suggest virtualization. I daresay it would get the job done if we can't test on real hardware. I've never run a Linux/GNU guest in Qemu, but I have used it to run a TempleOS guest.

But, we might be better off trying to emulate armhf, since it's the architecture on which more tests are failing.

tbates commented 4 years ago

FYI, AWS has 750 hrs/month free until December to encourage use of their ARM (they call it graviton) processors

https://aws.amazon.com/ec2/graviton/

On 26 Sep 2020, at 9:11 pm, Joshua Pritikin notifications@github.com wrote:

Has anybody tried https://wiki.debian.org/Arm64Qemu ?


jpritikin commented 4 years ago

FYI, the instructions at https://wiki.debian.org/Arm64Qemu do seem to work.

jpritikin commented 4 years ago

I managed to reproduce this failure using the qemu-system-aarch64 emulator running on amd64. Any other failures?

ginggs commented 4 years ago

arm64 passes for me now. There are still the armhf (32-bit ARM) failures.

RMKirkpatrick commented 4 years ago

I was looking at Qemu's documentation concerning ARM emulation over the weekend. Would testing on a virtual armhf system merely entail configuring Qemu to emulate some arbitrary 32-bit ARM board that includes the FPU coprocessor?

ginggs commented 4 years ago

This page recommends vexpress-a9 or vexpress-a15 for armhf: https://wiki.ubuntu.com/Kernel/Dev/QemuARMVexpress

rbalint commented 4 years ago

For the record, the Ubuntu armhf runners run armhf LXD containers in arm64 VMs, so the aarch64 QEMU VM is enough to triage the armhf failure as well.
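
When reproducing the armhf failure inside a container on an arm64 VM, it can help to confirm that the test process really runs as 32-bit userspace. The sketch below is one way to check this from Python. The helper name `describe_userspace` and its labels are illustrative assumptions. What `platform.machine()` reports inside a container depends on how the container is configured, so the pointer width of the running interpreter is treated as the deciding signal.

```python
import platform
import struct


def describe_userspace(machine, pointer_bits):
    """Classify the running userspace from the uname machine string and
    the pointer width (in bits) of the current process.

    A 32-bit process on an ARM kernel is reported as 32-bit ARM; this
    sketch does not try to tell armhf apart from other 32-bit ARM ABIs.
    """
    machine = machine.lower()
    if not machine.startswith(("arm", "aarch64")):
        return "non-ARM"
    if pointer_bits == 32:
        return "32-bit ARM userspace"
    return "64-bit ARM userspace (arm64)"


# Pointer width of this interpreter: 32 in a 32-bit userspace, 64 otherwise.
bits = struct.calcsize("P") * 8
print(platform.machine(), bits, describe_userspace(platform.machine(), bits))
```

A 64-bit kernel combined with a 32-bit pointer width would match the setup described above, where armhf containers run inside arm64 VMs.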

OpenMx / OpenMx

Multiple failing tests in 2.18.1 on arm64 and armhf #296