ELVIS-Project / vis-framework

Thoroughly modern symbolic musical data analysis suite.
http://elvisproject.ca/
31 stars 5 forks source link

Refactor opus importation #418

Closed alexandermorgan closed 8 years ago

alexandermorgan commented 8 years ago

I'm having a little bit of a hard time understanding how to handle opus importation. I made _known_opus an attribute of indexed_piece objects so that we could reference it easily without having to pass the value around in all of the methods of indexed_piece.py. In _import_score, the description of the known_opus is:

:param known_opus: Whether you expect the file to import as a :class:Opus.

Why would we have an expectation one way or another? It seems like if this is true: isinstance(score, stream.Opus) then it's an opus, case closed.

I've read the "Note about Opus Objects" in get_data and the source code for :meth:vis.workflow.WorkflowManager.load I get that when you try to import a piece that is actually an opus, it ends up being represented as a list of indexed piece objects. That's actually pretty nice functionality, but should this then become an aggregated_pieces object? @crantila what am I supposed to do with the list of index_pieces?

crantila commented 8 years ago

This isn't going to be fun.

I did the wrong thing here, in terms of writing idiomatic Python. Here's the idiomatic version:

asdf = IndexedPiece('some_music.krn')
data = asdf.get_data([SuperbIndexer])

if isinstance(data, pandas.DataFrame):
    # do it one way, for VIS 2 analyzers
elif isinstance(data[0], pandas.Series):
    # do it another way, for VIS 1 analyzers
elif isinstance(data[0], pandas.DataFrame):
    # do it a third way, for VIS 2 analyzers on an Opus
else:  # list of list of Series
    # do it a fourth way, for VIS 1 analyzers on an Opus

Every function or method that calls get_data() would need to handle four return formats, and every IndexedPiece method would have to work with both Piece and Opus objects. This is a situation where the Pythonic solution seems much worse. In fact, it was difficult enough to support both "DataFrame" and "list of Series" in VIS 1 (which is why I tried pushing everything to DataFrame in VIS 2). Can we really expect that every call to get_data() will properly handle all four situations? If there's even one get_data() call that can't deal with opera (Opusses?) then you're definitely going to have a hard-to-diagnose bug in the future.

What I did instead. IndexedPiece doesn't work on Opus objects. When you import a file and it's an Opus, either you get a failure with OpusWarning, or if you're expecting it you get a bunch of new IndexedPiece instances, each with one Piece to worry about. So the baggage of dealing with Opus imports is consolidated to IndexedPiece._import_score() and the first call to get_data().

What I should have done instead. The "idiomatic Python" solution I gave above describes the usual behaviour of a single method/function, where it's okay to return a different data type depending on the input. We can actually still do this, and avoid the problems described above, by using the factory pattern:

# in indexed_piece.py

def IndexedPiece(filename):
    # factory function
    imported = _import_piece(filename)
    if isinstance(imported, stream.Opus):
        return AggregatedPieces(whatever_would_go_here)
    else:
        return _IndexedPiece(whatever_would_go_here)

class _IndexedPiece(object):
    # same as now
    pass

Now it looks like the IndexedPiece constructor returns objects of different types, but it doesn't. I didn't know any better when I implemented the current solution. If I had, we still wouldn't have been able to do it because importing every file at the same time would have used too much memory. I think music21 has improved to the point where that's not a concern any more, but you should test it if you're going to do this.

alexandermorgan commented 8 years ago

Thanks for this reply. Allow me to summarize to make sure we're on the same page for the most important details.

  1. A good solution is to make an agg_pieces object out of an opus
  2. A good implementation for this solution is to move score importation out of the IndexedPiece class

Your factory function above imports the file immediately. VIS does not currently do this but I think it's a good idea (though it will wreak havoc on the tests). Other than for testing purposes, why would you want to wait between the steps of creating an indexed_piece object and importing its file? I propose we call your factory function vis_import() instead of IndexedPiece() because it can return either an ind_piece or an agg_pieces object.

crantila commented 8 years ago
  1. A good solution is to make an agg_pieces object out of an opus

Yes, but make sure you can get the IndexedPiece objects from the AggregatedPieces. That's probably implemented already, but it's worth checking.

  1. A good implementation for this solution is to move score importation out of the IndexedPiece class

Yes, and to make it look the same as now. The whole point of a factory function is that you think you're just calling a regular constructor. If you call the new function vis_import() then all of a sudden people have to start thinking things like "oh, I can't actually write IndexedPiece() to make an IndexedPiece, but what was the name of that function? And why do I have to call that function? Why not just make the IndexedPiece directly?" And then you still have the problem of what to do when (not "if") someone makes an IndexedPiece and it imports as an Opus.

This seems like the time to ask for help from an actual software engineer, like @mrbannon.

alexandermorgan commented 8 years ago

Ok great, thanks. The reason I was thinking of calling this vis_import() is that it seems strange to use a method called IndexedPiece() to create an aggregated_pieces() object, which is what would happen if it's an opus.

alexandermorgan commented 8 years ago

An alternative solution would be that we make everything an aggregated_pieces object. I think that would be overkill though.

crantila commented 8 years ago

The advantage of the factory pattern is that it's a well-known, decades-old software design idea. Experienced developers will recognize it immediately and understand what's going on. (Or in the case of SIMSSA, student developers are exposed to and learn about the pattern).

Take a look at this book excerpt. The examples are in JavaScript, but you should be able to read it except for one thing: functions and objects are sort of the same, so that:

function Car( options ) {
  this.doors = options.doors || 4;
  this.state = options.state || "brand new";
  this.color = options.color || "silver";
}

... is the same as...

class Car(object):
    def __init__(self, options):
        self._options = {}
        self._options = options.get('doors', 4)
        # etc.
alexandermorgan commented 8 years ago

This was an interesting example, thanks. In the example though, a vehicle_factory() function is used to make car and truck objects. So for VIS it seems like the equivalent would be a piece_factory(), or maybe a piece_importer() is a more self-explanatory name.

crantila commented 8 years ago

Not quite like that... they use a VehicleFactory() function... which is as close to a VehicleFactory() class as you can get in JavaScript. We could call the function PieceFactory(), that's not a bad idea, but the capitalization like a class name is important, to make it look like you're calling a constructor.

alexandermorgan commented 8 years ago

Oh ok, I see what you mean about the capitalization, I had missed that detail. Thanks.