Closed alexandermorgan closed 8 years ago
:+1:
Update: music21 objects are somewhat unwieldy to store in certain cases if you then try to multiprocess those objects. This is because data has to be pickled to be multiprocessed in python, and pickling music21 objects is reliable but rather slow. For this reason, for the interval indexers the current procedure of processing strings and repeatedly turning them into music21 notes and then music21 intervals may actually be an optimal implementation.
^ That's why it was done that way to begin with.
To note: in 2013 I wrote a caching IntervalIndexer
for pre-Framework VIS that was faster by some preposterous margin. To implement this now you would memoize the indexer_func()
in the interval
module. That is: in run()
you store a mapping between arguments to the indexer_func()
and what it returns, so that the indexer_func()
only needs to be called once with any unique arguments. Music is highly repetitive, so it works pretty well in this case, though there is a substantial increase in memory consumption.
I hadn't thought of using memoization. That's a good idea, especially now that ram isn't much of an issue.
@crantila do you have a copy of your memoization implementation anywhere? This could be a good strategy to implement in VIS 3.0.
I don't know where it's gone, but it doesn't really apply in the Framework, and it was super rudimentary. Something like this:
_MEMO_DICT = {}
def indexer_func(one, two, three):
key = '{0}{1}{2}'.format(one, two, three)
if key not in _MEMO_DICT:
_MEMO_DICT[key] = interval.Interval(one, two, three) # whatever, make an m21 interval
return _MEMO_DICT[key]
It actually stored the Interval
objects in the dictionary, but you'll want to store the string that gets returned. Possibly also do something to prevent the dictionary from growing too large, but that's probably not necessary.
@crantila 's idea to use memoization was really effective and provided an easy solution to this problem. It's already implemented on alex_devel as of this commit: 68be79908f08ba1700d44a0da04442436f56a313
In the init method of Indexed_Piece there should really be a place to store a dataframe of the music21 objects in a piece. Because these objects are never saved, we repeatedly do this calculation for every stream_indexer call. This will be the first step towards vast performance improvements made by not repeating calculations and doing dataframe-wise indexing instead of element-wise indexing. This can start with the stream indexers, since they take the most time and especially since they don't have any settings.