Handling inaccurate test reports

devsnek commented 4 years ago

There have been some ongoing issues with test262-harness and/or the configuration used to run test262-harness, which result in failures that aren't actually issues in the engines being tested. I'd like to propose that two things happen: First, failed test output be made available somewhere, hopefully right on the site with the relevant test. Secondly, that the scripts used to run the harness are available in an open way so we can fix issues that come up (just whatever scripts run the harness, we don't need all of the CI config). The main goal here is implementers being able to understand and get actionable information from tests that test262.report says are failing, and I'm open to other solutions in this area as well.

devsnek commented 4 years ago

example of test that is listed as a failure on test262.report but passes when i test it locally: https://test262.report/browse/built-ins/RegExp/lookBehind/alternations.js

rwaldron commented 4 years ago

Looks like it doesn't pass for me:

$ test262-harness --t=8 --hostType=engine262 --hostPath=`which engine262` --hostArgs='--features=all' --reporter-keys=relative,scenario,result,rawResult,attrs.features test/built-ins/RegExp/lookBehind/alternations.js
FAIL test/built-ins/RegExp/lookBehind/alternations.js (default)
  Expected no error, got TypeError: Assert failed: match.StartIndex >= 0 && match.StartIndex < S.stringValue().length

FAIL test/built-ins/RegExp/lookBehind/alternations.js (strict mode)
  Expected no error, got TypeError: Assert failed: match.StartIndex >= 0 && match.StartIndex < S.stringValue().length

Ran 2 tests
0 passed
2 failed

I ran esvu before running this, and it installed:

$ esvu
esvu ❯ version 1.2.2
Chakra ❯ Checking version...
Chakra ✔ Up to date
engine262 ❯ Checking version...
engine262 ! There are external requirements that may need to be installed:
engine262 !   * Node.js - https://nodejs.org/
engine262 ❯ Installing version 0.0.1-597da6c1cb7fb0390a3ae8e216c71c884f9c1c76
engine262 ❯ Downloading https://api.engine262.js.org/download?version=0.0.1-597da6c1cb7fb0390a3ae8e216c71c884f9c1c76
engine262 ❯ Extracting /var/folders/xz/k2lxgs3s43180_tjwn1v_kwc0000gn/T/e435ddf31f50c4caf5937c74f19ead94
engine262 ❯ Installing /var/folders/xz/k2lxgs3s43180_tjwn1v_kwc0000gn/T/e435ddf31f50c4caf5937c74f19ead94-extracted
engine262 ❯ Testing...
engine262 ✔ Installed with bin entries: engine262
...

That test262-harness config is copied from the CI:

(I just removed --reporter=json for readibility)

devsnek commented 4 years ago

It appears that issue is triggered by adding the --features=all flag. I can think of a few ways to improve this situation:

test262.report can provide the command and output for failed tests
test262.report can open source the configs so we can reproduce the command and output for ourselves
I can add a json output mode to engine262's test runner and test262.report can use that (which also has the benefit of being much faster than test262-harness because it doesn't spin up a new process for each test)

Other suggestions are welcome too.

rwaldron commented 4 years ago

I think I added --hostArgs='--features=all' at your recommendation 0_o

devsnek commented 4 years ago

Yeah enabling all engine262 features for all tests isn't inherently wrong (it should work fine, engine262 clearly has a bug here), it just isn't how I run tests. The mismatch in configuration is why I want test262.report to be more open about how it tests engines.

rwaldron commented 4 years ago

test262.report can open source the configs so we can reproduce the command and output for ourselves

All the configs are the same. I gave engine262 special treatment by adding the --hostArgs. I removed it:

jobs:
  V8:
    executor: node
    environment:
      ENGINEPATH: binpath
      T262ENGINE: v8
      T262NAME: v8
    <<: [*execution_steps]
  SpiderMonkey:
    executor: node
    environment:
      ENGINEPATH: binpath
      T262ENGINE: spidermonkey
      T262NAME: spidermonkey
    <<: [*execution_steps]
  ChakraCore:
    executor: node
    environment:
      ENGINEPATH: binpath
      T262ENGINE: chakra
      T262NAME: chakra
    <<: [*execution_steps]
  JSC:
    executor: node
    environment:
      LANG: C
      ENGINEPATH: binpath
      T262ENGINE: javascriptcore
      T262NAME: javascriptcore
    <<: [*execution_steps]
  XS:
    executor: node
    environment:
      ENGINEPATH: binpath
      T262ENGINE: xs
    <<: [*execution_steps]
  Hermes:
    executor: node
    environment:
      ENGINEPATH: binpath
      T262ENGINE: hermes
    <<: [*execution_steps]
  QuickJS:
    executor: node
    environment:
      ENGINEPATH: binpath
      T262ENGINE: qjs
    <<: [*execution_steps]
  engine262:
    executor: node
    environment:
      ENGINEPATH: binpath
      T262ENGINE: engine262
      T262NAME: engine262
    <<: [*execution_steps]

execution_steps looks like this:

execution_steps: &execution_steps
  steps:
    - checkout
    - npm_install
    - fetch_via_esvu
    - fetch_test262
    - capture_and_parse

devsnek commented 4 years ago

Sorry I meant, adding --features=all is good, it should give you more accurate tests. The point was just that, given the FAIL message on test262.report, I didn't have the information to figure out why it was failing. Thanks for posting the commands and such, it is helpful.

bocoup / test262-report-issue-tracker

Handling inaccurate test reports #19