ProteoWizard / pwiz

The ProteoWizard Library is a set of software libraries and tools for rapid development of mass spectrometry and proteomic data analysis software.
http://proteowizard.sourceforge.net/
Apache License 2.0
210 stars 97 forks source link

Skyline: Improve failure reporting in AssertEx.FileEquals #3021

Open brendanx67 opened 1 month ago

brendanx67 commented 1 month ago

This greatly improves failure reporting for AssertEx.FileEquals() where the old code used to just report the two lines where the comparison failed, the new code gives a lot more detail on the internal reasons when paths and column tolerances are involved. This code also adds a class ColumnTolerances to formalize the tolerance evaluation and introduces a tolerance that applies to the mantissa of the number in scientific notation to better capture limitations in precision and potential rounding error.

Before (mistakenly interpreted as a path difference):

Microsoft.VisualStudio.TestTools.UnitTesting.AssertFailedException: Assert.Fail failed. Diff found at line 225 position 0: expected Orbi3_SA_IP_pHis301.mzML 17836 17876 8 3 3495.7076 1166.5775 1.050319E+08 2.961274E+08 110.6513 110.8777 110.7155 0.9989 17848 H254C156N43O46S1[-2.150306] mass3495.7076_RT110.7155 actual z:\download\Perftests\Label-free\converted\Orbi3_SA_IP_pHis301.mzML 17836 17876 8 3 3495.7076 1166.5775 1.050319E+08 2.961275E+08 110.6513 110.8777 110.7155 0.9989 17848 H254C156N43O46S1[-2.150306] mass3495.7076_RT110.7155

After (skipping thousands of lines that match by the specified precision and more clearly showing the problem):

Microsoft.VisualStudio.TestTools.UnitTesting.AssertFailedException: Assert.Fail failed. Diff found at line 7723 position 5: expected Orbi3_SA_IP_pHis301.mzML 2350 2371 5 1 863.4852 864.4928 1732674 5650320 15.8053 15.9211 15.8631 0.9988 2361 H86C38N10O11[+4.837413] mass863.4852_RT15.8631 actual z:\download\Perftests\Label-free\converted\Orbi3_SA_IP_pHis301.mzML 8456 8477 6 2 1228.6413 615.3279 1732675 8681932 51.0093 51.1453 51.1453 0.9942 8477 H102C55N15O16[-0.121619] mass1228.6413RT51.1453 expected with paths removed path 2350 2371 5 1 863.4852 864.4928 1732674 5650320 15.8053 15.9211 15.8631 0.9988 2361 H86C38N10O11[+4.837413] mass863.4852RT15.8631 actual with paths removed path 8456 8477 6 2 1228.6413 615.3279 1732675 8681932 51.0093 51.1453 51.1453 0.9942 8477 H102C55N15O16[-0.121619] mass1228.6413_RT51.1453 matching prefix: 'path ' matching suffix: '' decimal matching: Expected decimal mantissa: 2.35 does not match actual 8.456 to within 0.00015015

bspratt commented 1 month ago

Looks good, other than I really do not think we should be using the scientific notation logic on integers. It took me a little while to understand that it was complaining about "2350" vs "8456". When I think about it, I'd argue for using the E logic only if one or the other text representations does in fact use scientific notation.

And it seems like we could be reporting the column index when comparing TSVs, but that's a nice-to-have.

brendanx67 commented 1 month ago

I think the integer issue can still be a precision thing. I did realize that I could be reporting the original text and the delta multiplied by the exponent which would be clearer in the integer case. Maybe I will do that before merging. Just wanted to get this code in a PR, because I felt it helped immensely to understand what was going on, and in the future avoid thinking about paths when they were not the issue.

On Wed, Jun 19, 2024 at 2:13 PM Brian Pratt @.***> wrote:

Looks good, other than I really do not think we should be using the scientific notation logic on integers. It took me a little while to understand that it was complaining about "2350" vs "8456". When I think about it, I'd argue for using the E logic only if one or the other text representations does in fact use scientific notation.

And it seems like we could be reporting the column index when comparing TSVs, but that's a nice-to-have.

— Reply to this email directly, view it on GitHub https://github.com/ProteoWizard/pwiz/pull/3021#issuecomment-2179462757, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACYBWUCHBOJEJKUMU6LRZEDZIHYARAVCNFSM6AAAAABJSVHKGCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNZZGQ3DENZVG4 . You are receiving this because you authored the thread.Message ID: @.***>