bytecodealliance / sightglass

A benchmark suite and tool to compare different implementations of the same primitives.
Apache License 2.0

Tract Onnx Image Classification Benchmark's Output Doesn't Correspond with Wasmtime Output #276

Closed by d-sonuga 3 weeks ago

d-sonuga commented 1 month ago

While running the benchmarks, the following error was produced:

Error: Actual output does not match the expected output!
* Actual output is located at `stdout-de1e78351a69609e-4163462-0.log`
* Expected output is located at `benchmarks/tract-onnx-image-classification/benchmark.stdout.expected`
Error: benchmark subprocess did not exit successfully

The contents of benchmarks/tract-onnx-image-classification/benchmark.stdout.expected:

[Classification { label: "tiger", score: 17.559244 }, Classification { label: "tiger cat", score: 14.740076 }, Classification { label: "zebra", score: 12.357242 }]

And the contents of stdout-de1e78351a69609e-4163462-0.log:

[Classification { label: "tiger", score: 17.559246 }, Classification { label: "tiger cat", score: 14.740076 }, Classification { label: "zebra", score: 12.357242 }]

jlb6740 commented 1 month ago

@d-sonuga, what platform did you run this on? Floating-point results like these can differ slightly across architectures. We can fix this by making the score we match on less precise, and/or by adding logic so that, instead of doing the match directly, sightglass asks a script to do it; the script can use more advanced logic (checking a range, for example) to decide whether the results are OK.
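
As a rough illustration of the script-based option, a tolerant comparison could look something like the sketch below. It is not sightglass code and not necessarily what #277 does: the function names (`parse`, `outputs_match`), the regular expression, and the `rel_tol=1e-5` threshold are all assumptions made for this example. Labels are compared exactly and scores within a relative tolerance.

```python
import math
import re

# Hypothetical pattern for entries such as:
#   Classification { label: "tiger", score: 17.559244 }
ENTRY = re.compile(r'Classification \{ label: "([^"]*)", score: ([-+0-9.eE]+) \}')


def parse(output):
    """Extract (label, score) pairs from the benchmark's printed output."""
    return [(label, float(score)) for label, score in ENTRY.findall(output)]


def outputs_match(actual, expected, rel_tol=1e-5):
    """Compare labels exactly and scores within a relative tolerance.

    rel_tol is an assumed threshold chosen for illustration only.
    """
    a, e = parse(actual), parse(expected)
    if not e:
        # Nothing recognisable to compare: fall back to an exact match.
        return actual.strip() == expected.strip()
    if len(a) != len(e):
        return False
    return all(
        la == le and math.isclose(sa, se, rel_tol=rel_tol)
        for (la, sa), (le, se) in zip(a, e)
    )


# The two outputs reported in this issue.
EXPECTED = (
    '[Classification { label: "tiger", score: 17.559244 }, '
    'Classification { label: "tiger cat", score: 14.740076 }, '
    'Classification { label: "zebra", score: 12.357242 }]'
)
ACTUAL = (
    '[Classification { label: "tiger", score: 17.559246 }, '
    'Classification { label: "tiger cat", score: 14.740076 }, '
    'Classification { label: "zebra", score: 12.357242 }]'
)

print(outputs_match(ACTUAL, EXPECTED))
```

A range check like this avoids one weakness of simply printing fewer digits: two values that differ only in the last place can still round to different strings when they sit on either side of a rounding boundary.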

d-sonuga commented 1 month ago

@jlb6740, I ran it on aarch64.

jlb6740 commented 1 month ago

@d-sonuga Cool, thanks. Hopefully #277 will address the issue.

jlb6740 commented 1 month ago

Hi @d-sonuga, can you confirm this is fixed for you and close when you get a chance?

d-sonuga commented 3 weeks ago

@jlb6740, it's fixed. Thanks!