Tract Onnx Image Classification Benchmark's Output Doesn't Correspond with Wasmtime Output

d-sonuga commented 1 month ago

While running the benchmarks, the following error was produced:

Error: Actual output does not match the expected output!
* Actual output is located at `stdout-de1e78351a69609e-4163462-0.log`
* Expected output is located at `benchmarks/tract-onnx-image-classification/benchmark.stdout.expected`
Error: benchmark subprocess did not exit successfully

The contents of benchmarks/tract-onnx-image-classification/benchmark.stdout.expected:

[Classification { label: "tiger", score: 17.559244 }, Classification { label: "tiger cat", score: 14.740076 }, Classification { label: "zebra", score: 12.357242 }]

And the contents of stdout-de1e78351a69609e-4163462-0.log:

[Classification { label: "tiger", score: 17.559246 }, Classification { label: "tiger cat", score: 14.740076 }, Classification { label: "zebra", score: 12.357242 }]

jlb6740 commented 1 month ago

@d-sonuga, what platform did you run this on? These floating-point results can differ slightly across architectures. We can fix this by making the score we match on less precise, and/or by adding logic so that, instead of doing the match directly in sightglass, sightglass asks a script to do the match. The script could use more advanced logic (checking a range, for example) to decide whether the results are OK.
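
As a rough illustration of the "check a range" idea, here is a minimal Python sketch of such a comparison script. It is not the change made in sightglass; the function name `outputs_match`, the number pattern, and the default tolerance are all assumptions chosen for the output format quoted in this issue. It requires the non-numeric text to be identical and lets each floating-point value differ by a small relative amount.

```python
import math
import re

# Hypothetical pattern: matches decimal numbers such as 17.559244 or -3.0.
# It is an assumption that suits the Debug-style output shown in this issue.
NUMBER = re.compile(r"-?\d+\.\d+")


def outputs_match(expected: str, actual: str, rel_tol: float = 1e-5) -> bool:
    """Return True if the text around the numbers is identical and every
    floating-point value agrees within rel_tol (an assumed tolerance)."""
    # Splitting on the number pattern leaves only the surrounding text.
    # Equal pieces imply the same labels, punctuation, and number count.
    if NUMBER.split(expected) != NUMBER.split(actual):
        return False

    expected_nums = [float(n) for n in NUMBER.findall(expected)]
    actual_nums = [float(n) for n in NUMBER.findall(actual)]

    # Compare the numbers pairwise with a relative tolerance.
    return all(
        math.isclose(e, a, rel_tol=rel_tol)
        for e, a in zip(expected_nums, actual_nums)
    )


if __name__ == "__main__":
    expected = '[Classification { label: "tiger", score: 17.559244 }]'
    actual = '[Classification { label: "tiger", score: 17.559246 }]'
    print(outputs_match(expected, actual))
```

With the two outputs quoted in this issue, the scores differ only in the last printed digit, so a relative tolerance of this size treats them as equal, while a changed label or a noticeably different score is still reported as a mismatch.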

d-sonuga commented 1 month ago

@jlb6740, I ran it on aarch64.

jlb6740 commented 1 month ago

@d-sonuga Cool, thanks. Hopefully #277 will address the issue.

jlb6740 commented 1 month ago

Hi @d-sonuga, can you confirm this is fixed for you and close when you get a chance?

d-sonuga commented 3 weeks ago

@jlb6740, it's fixed. Thanks!

bytecodealliance / sightglass

Issue #276