ELVIS-Project / vis-framework

Thoroughly modern symbolic musical data analysis suite.
http://elvisproject.ca/
31 stars 6 forks source link

Ngram tests test for impossible cases. #359

Closed alexandermorgan closed 8 years ago

alexandermorgan commented 9 years ago

This is closely related to #352 and I believe also #261. Some of the ngram tests, such as test_ngram_13 and 14, feed impossible Series to the Ngram indexer. For example, test_ngram_13's vertical Series must have four elements if the there are three valid horizontal elements, one of which is given the offset 3, meaning that (given that these tests were written with horiz_attach_later = True in mind) it leads to some event after C which has an offset of 2. So there must be a fourth element in vertical, even if it's just C that continues or gets repeated because the ngram indexer forward fills ('ffill') it's vertical values. This will get changed with adjustment that makes the ngram indexer require horiz_attach_later in the horizontal indexer to be set to False. FYI @crantila .

alexandermorgan commented 9 years ago

Perhaps a solution would be to merge the vertical and horizontal series' indecies before forward filling them. This would ensure that there is either one observation of each type at each starting position, or an observation of one type and a NaN for the other type. Both could not be NaN's at the same time.

crantila commented 9 years ago

We'll have to take care of this if we actually rewrite the NGramIndexer, along with the other issues you mentioned. Maybe we should open an issue for that specifically, rather than filing a bunch of these issues all over? Because there are probably ten or more issues that will be solved by that rewrite from scratch.

alexandermorgan commented 9 years ago

I think that if we rewrite it we can just go back over all the open issues and see how many of them it closes.

crantila commented 9 years ago

Sure, hard to know what we'll be able to solve in the end anyway. Guess we should get started on the rewriting effort? I'll write a thing about what I have in mind.

alexandermorgan commented 8 years ago

These cases were impossible in the majority of use cases but could conceivably be used in highly unusual queries. The new_ngram indexer maintains support for this type of flexibility.