ELVIS-Project / vis-framework

Thoroughly modern symbolic musical data analysis suite.
http://elvisproject.ca/
31 stars 6 forks source link

music21 objects are repeatedly created and not stored. #373

Closed alexandermorgan closed 8 years ago

alexandermorgan commented 8 years ago

In the init method of Indexed_Piece there should really be a place to store a dataframe of the music21 objects in a piece. Because these objects are never saved, we repeatedly do this calculation for every stream_indexer call. This will be the first step towards vast performance improvements made by not repeating calculations and doing dataframe-wise indexing instead of element-wise indexing. This can start with the stream indexers, since they take the most time and especially since they don't have any settings.

musicus commented 8 years ago

:+1:

alexandermorgan commented 8 years ago

Update: music21 objects are somewhat unwieldy to store in certain cases if you then try to multiprocess those objects. This is because data has to be pickled to be multiprocessed in python, and pickling music21 objects is reliable but rather slow. For this reason, for the interval indexers the current procedure of processing strings and repeatedly turning them into music21 notes and then music21 intervals may actually be an optimal implementation.

crantila commented 8 years ago

^ That's why it was done that way to begin with.

To note: in 2013 I wrote a caching IntervalIndexer for pre-Framework VIS that was faster by some preposterous margin. To implement this now you would memoize the indexer_func() in the interval module. That is: in run() you store a mapping between arguments to the indexer_func() and what it returns, so that the indexer_func() only needs to be called once with any unique arguments. Music is highly repetitive, so it works pretty well in this case, though there is a substantial increase in memory consumption.

alexandermorgan commented 8 years ago

I hadn't thought of using memoization. That's a good idea, especially now that ram isn't much of an issue.

alexandermorgan commented 8 years ago

@crantila do you have a copy of your memoization implementation anywhere? This could be a good strategy to implement in VIS 3.0.

crantila commented 8 years ago

I don't know where it's gone, but it doesn't really apply in the Framework, and it was super rudimentary. Something like this:

_MEMO_DICT = {}

def indexer_func(one, two, three):
    key = '{0}{1}{2}'.format(one, two, three)
    if key not in _MEMO_DICT:
        _MEMO_DICT[key] = interval.Interval(one, two, three)  # whatever, make an m21 interval

    return _MEMO_DICT[key]

It actually stored the Interval objects in the dictionary, but you'll want to store the string that gets returned. Possibly also do something to prevent the dictionary from growing too large, but that's probably not necessary.

alexandermorgan commented 8 years ago

@crantila 's idea to use memoization was really effective and provided an easy solution to this problem. It's already implemented on alex_devel as of this commit: 68be79908f08ba1700d44a0da04442436f56a313