borglab / SwiftFusion

Apache License 2.0
115 stars, 13 forks

TF test fails on Xavier platform #207

Open · dellaert opened this issue 3 years ago

dellaert commented 3 years ago

One test on the feature/box_scaling branch failed for me on the Jetson Xavier NX:

Test Case 'TensorFlowMatrixTests.testConcat' passed (4.669 seconds)
Test Case 'TensorFlowMatrixTests.test_log' started at 2020-11-07 19:45:31.242
/home/dellaert/git/SwiftFusion/Tests/SwiftFusionTests/Core/TensorFlowMatrixTests.swift:38: error: TensorFlowMatrixTests.test_log : XCTAssertTrue failed - value mismatch:
[              -inf,                0.0, 0.6931471805599453, 1.0986122886681098,
 1.3862943611198906, 1.6094379124341003]
is not equal to
[              -inf,                0.0, 0.6931471805599453, 1.0986122886681098,
 1.3862943611198906, 1.6094379124341003]
with accuracy 1e-08
Test Case 'TensorFlowMatrixTests.test_log' failed (0.054 seconds)

Might this be related to the -inf entry?
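One plausible mechanism, offered as a hypothesis because the assertion at TensorFlowMatrixTests.swift:38 is not shown in this thread: tolerance checks are commonly written as |a - b| <= accuracy, and when both values are -inf the difference is NaN, so the check fails even though the two arrays print identically. The sketch below uses plain Swift `Double`s and a hypothetical helper `naiveAlmostEqual`, not the actual SwiftFusion/TensorFlow assertion, and by itself it does not explain why the same test passes on Mac.

```swift
import Foundation

// Hypothetical tolerance check in the common |a - b| <= accuracy form
// (an assumption; not the assertion used in TensorFlowMatrixTests).
func naiveAlmostEqual(_ a: Double, _ b: Double, accuracy: Double) -> Bool {
    return abs(a - b) <= accuracy
}

let a = log(0.0)          // -inf, the first element of the test data
let b = -Double.infinity  // the expected value

print(a == b)                                  // true: equal-signed infinities compare equal
print((a - b).isNaN)                           // true: -inf - (-inf) is NaN
print(naiveAlmostEqual(a, b, accuracy: 1e-8))  // false: NaN <= 1e-8 is false
```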

dellaert commented 3 years ago

PS: it passes fine on Mac.

ProfFan commented 3 years ago

Test Case 'TensorFlowMatrixTests.test_log' started at 2020-11-08 14:43:21.650
/workspaces/SwiftFusion/Tests/SwiftFusionTests/Core/TensorFlowMatrixTests.swift:38: error: TensorFlowMatrixTests.test_log : XCTAssertTrue failed - value mismatch:
[              -inf,                0.0, 0.6931471805599453, 1.0986122886681098,
 1.3862943611198906, 1.6094379124341003]
is not equal to
[              -inf,                0.0, 0.6931471805599453, 1.0986122886681098,
 1.3862943611198906, 1.6094379124341003]

It fails for me on Linux as well; I think it is very likely related to CUDA.

CC @marcrasi @BradLarson: is this precision-related?

BradLarson commented 3 years ago

That test is trying to take log(0), which seems like a bad thing to be testing. I wouldn't be surprised to see that break in different ways on different platforms. Maybe starting the range at 1 would be a safer lower bound?
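A minimal sketch of that suggestion, assuming the inputs are the integers 0 through 5 (inferred from the printed expected values): dropping 0 removes the -inf entry, so every remaining expected value is finite and a plain |a - b| <= tolerance check is well defined. It uses plain `Double`s and Foundation's `log` rather than the TensorFlow-backed matrix type from the real test; `inputs`, `expected`, and `passes` are illustrative names.

```swift
import Foundation

// Inputs start at 1 instead of 0, so log never returns -inf.
let inputs: [Double] = [1, 2, 3, 4, 5]
// Expected values copied from the original test output, minus the -inf entry.
let expected: [Double] = [
    0.0, 0.6931471805599453, 1.0986122886681098,
    1.3862943611198906, 1.6094379124341003,
]

let actual = inputs.map { log($0) }
let tolerance = 1e-8

// Every pair is finite, so abs(x - y) is never NaN here.
let passes = actual.count == expected.count
    && zip(actual, expected).allSatisfy { abs($0 - $1) <= tolerance }
print(passes)
```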

ProfFan commented 3 years ago

I think it is still perfectly valid to assume -inf == -inf, since that is guaranteed by the IEEE 754 standard. I could be wrong, so please correct me if so :)
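If the -inf case is kept, one way to honor IEEE 754 equality is to check exact equality before falling back to a tolerance, so matching infinities pass without ever computing -inf - (-inf). This is a sketch only: `isClose` and `allClose` are hypothetical helpers, not part of SwiftFusion or XCTest, and the inputs 0 through 5 are inferred from the printed values.

```swift
import Foundation

// Hypothetical helper: exact equality first, so -inf == -inf (and +inf == +inf)
// pass as IEEE 754 guarantees; only finite pairs reach the tolerance test.
func isClose(_ a: Double, _ b: Double, accuracy: Double) -> Bool {
    if a == b { return true }                              // matching infinities
    guard a.isFinite && b.isFinite else { return false }   // inf vs finite, or any NaN
    return abs(a - b) <= accuracy
}

// Element-wise comparison of two arrays of equal length.
func allClose(_ xs: [Double], _ ys: [Double], accuracy: Double) -> Bool {
    return xs.count == ys.count
        && zip(xs, ys).allSatisfy { isClose($0, $1, accuracy: accuracy) }
}

let actual = (0...5).map { log(Double($0)) }  // first element is log(0) = -inf
let expected: [Double] = [
    -Double.infinity, 0.0, 0.6931471805599453,
    1.0986122886681098, 1.3862943611198906, 1.6094379124341003,
]
print(allClose(actual, expected, accuracy: 1e-8))
```

The tradeoff of this design is that NaN never matches anything, including another NaN, which keeps genuinely broken results from slipping through.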

dellaert commented 3 years ago

I agree with @BradLarson