Closed crantila closed 9 years ago
@alexandermorgan do you remember whether these tests were counted manually? Or at least verified by hand/eye?
Looks like I caused this for myself by "fixing" issue #298. What a dunce.
Though I've fixed the original problem, I'm still having problems getting all the integration tests to pass, so this issue lives on.
Having heard nothing, and being quite pressed for time, I've decided to scrap the existing "interval n-grams" integration tests 9a, 9b, and 9c.
In commit ffec69047e718550a9b105003c6e404099ba4e37, I've removed the old tests. Test 9a didn't even seem to be a real regression test for issue #305, in spite of what it claimed, because that bug required setting an 'offset interval'
but the test didn't do so.
As stated in the preceding commits, I'm also working on a replacement set of integration tests, based on the same piece and voice pair, but counted by hand.
This thread is a little tough to follow because I don't know how you found out that the tests were wrong. The tests you were asking me about were difficult to check by hand because count frequency was on, so there's no way to go through the piece and check n-gram by n-gram. So I'm pretty sure the expected values were just copied and pasted results which is probably not a good idea.
So I guess it's a decent use of time, then, to count by hand. I found the problem initially when I was revising the tests for VIS 2 output. The rabbit hole got deeper when I realized the comments said the tests were for "bwv77" but they were actually loading the "Kyrie" file.
FWIW, I couldn't really tell---even after looking through the git history---exactly how or why things ended up like that.
The n-gram integration tests 7, 8, and 9 are all wrong, and that's a problem, because I know for sure that it means we're getting incorrect results.