ELVIS-Project / vis-framework

Thoroughly modern symbolic musical data analysis suite.
http://elvisproject.ca/

What should go in VIS 3.0? #387

Closed alexandermorgan closed 7 years ago

alexandermorgan commented 8 years ago

Here is what I want to see in VIS 3.0 (in no particular order, the numbers are just to make the points easier to reference):

Anything else?

alexandermorgan commented 8 years ago

After having worked on this for a little while now, I think item 5 may not be a good idea. Instead of saving certain analysis results to the database, we can just save them as attributes of the indexed piece. We'll have to recalculate them every time a piece is reloaded, but that's not so bad, and it avoids a lot of messy calls to the database.
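
A minimal sketch of what this attribute caching might look like (the class and method names are illustrative, not the actual VIS API):

```python
# Illustrative only: cache an expensive indexer result as an attribute of the
# piece object, recomputing it whenever the piece is reloaded from scratch.
class IndexedPieceSketch:
    def __init__(self, path):
        self._path = path
        self._noterest_results = None    # filled in lazily, never persisted

    def get_noterest(self):
        if self._noterest_results is None:
            self._noterest_results = self._run_noterest_indexer()
        return self._noterest_results

    def _run_noterest_indexer(self):
        ...  # the slow stream-indexing step stands in here
```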

musicus commented 8 years ago

@alexandermorgan we still need some type of database interface (unless maybe that should be an add-on module) to save search results. The larger the corpus, the more analytical data we are going to generate, and it would be good to have mechanisms in place for communicating with a database (SQL or NoSQL) that can handle very large amounts of data.

alexandermorgan commented 8 years ago

@musicus good point, I agree. Your comment addresses point 10 above in a more in-depth way.

crantila commented 8 years ago

Storing intermediate results in a remote database (not necessarily the database) has been on the table since before VIS was a Framework. It was the overriding concern that encouraged adopting the (backward-incompatible) return format shared by all the current analyzers, and therefore the reason the 2.x release series was created. I notice you've been trying to loosen the reins on the data passed between analyzers, which is your decision, but just FYI.

As for where this functionality belongs, the decision seems pretty clear. You can't argue that the pre-existing LilyPond analyzers, which are proper analyzers with no external dependencies, should be removed because they're not lightweight, at the same time as saying that a database interface only "maybe... should be an add-on module." From what I can tell, data management is now handled by RODAN, so if you're getting your VIS input data from a remote database it should be done within a RODAN module/step/whatever.

alexandermorgan commented 8 years ago

In the list above, point 4 is a bigger priority than point 5. I didn't rank them or anything, so you couldn't have known that. @crantila, your comment is more about point 5. If we do store data permanently in some database, my original idea was to just store pickled versions of the parts and the metadata. That is almost everything VIS ever needs, and it could save on import time, which is a bottleneck in VIS. If point 5 ever happens, it will depend more on a database person than on VIS programmers.

About point 4, this kind of temporary caching has been in VIS for some time already with the noterest indexer results. They are stored as an attribute of an indexed piece when the analysis is done with the get method. This is a convenient analysis result to store because the noterest indexer takes no settings, and it is somewhat slow because it uses stream indexing. When this get method was implemented (by Christopher, I imagine), I believe there were no other stream-indexing indexers that took no settings. Now there are five, with plans for a few more, so the use of get methods is becoming even more important. Also, @mrbannon and I have more or less come up with a proto-version of the vertical and horizontal interval results that could be stored once and then quickly transformed into any desired output with the user's settings.

crantila commented 8 years ago

Okay, but please please do not use pickling for this. It may be a very fast serialization, but it's difficult to handle versioning, and moreover you're opening two very large remote execution holes on your servers (and possibly your own computer).
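
For anyone unfamiliar with the issue: unpickling runs arbitrary code chosen by whoever produced the bytes. A minimal, harmless demonstration (the payload only echoes a message, but it could do anything):

```python
import os
import pickle


class Payload:
    # __reduce__ tells pickle how to rebuild the object; here it says
    # "call os.system with this argument", so loading the data runs a command.
    def __reduce__(self):
        return (os.system, ('echo this ran during unpickling',))


data = pickle.dumps(Payload())
pickle.loads(data)   # executes the shell command -- never unpickle untrusted data
```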

crantila commented 8 years ago

I would also like to suggest that most of the test suite should be rewritten.

I tried my best, but what we have now is an incomprehensible mess that hinders development and hasn't been properly updated for new functionality. I've learned a lot about testing software since then; I would point to the VIS test suite as an example of how not to do it.

musicus commented 8 years ago

@crantila definitely no pickling (I only use it on a local machine, never on a server).

alexandermorgan commented 8 years ago

Ok, so we agree to scrap number 5 in the list above. It was just meant as a performance enhancement anyway and we've got more important stuff to work on than performance.

crantila commented 8 years ago

We can still do point 5 above, but with a different serialization format. Pandas offers several choices for this, including SQL (and what better for a database than SQL?). Plus, that would be a step toward the indexers-in-SQL idea mentioned in #375.
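
For example, a results DataFrame can be round-tripped through SQLite with pandas alone (the columns here are made up; the backend could just as well be a server-side SQL database):

```python
import sqlite3

import pandas as pd

# Hypothetical intermediate result: vertical intervals keyed by offset.
results = pd.DataFrame({'offset': [0.0, 1.0, 2.0],
                        'interval': ['P8', 'M6', 'P5']})

con = sqlite3.connect('vis_cache.db')
results.to_sql('vertical_intervals', con, if_exists='replace', index=False)

# Later, or in another process: read it back with no pickle-style security risk.
restored = pd.read_sql('SELECT * FROM vertical_intervals', con)
con.close()
```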

musicus commented 8 years ago

Either SQL or JSON is fine with me, with a preference for JSON.

alexandermorgan commented 8 years ago

It takes longer to revive JSON strings of parts than it does just to import a piece from scratch, so I don't think JSON will help us out there. SQL or JSON might work if we only keep the DataFrames of results, but that would quickly become a lot of files for each piece. The original idea was to just store the parts, but this is hard since they are composed of music21 objects, not just strings or ints.

musicus commented 8 years ago

Yes, those could become a lot of files. music21 theoretically has the functionality to store its objects in JSON, but did it actually work, @alexandermorgan? I think we should just keep the DataFrames themselves and store them in JSON, which seems to be the simplest solution.
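
A round trip of a results DataFrame through JSON with pandas could look like this (column and index values are made up; orient='split' keeps the offset index intact):

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({'interval': ['P5', 'M3', 'P8']}, index=[0.0, 1.5, 2.0])

as_json = df.to_json(orient='split')                  # plain text, safe to store
restored = pd.read_json(StringIO(as_json), orient='split')
```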

alexandermorgan commented 8 years ago

DEADLINE: if you have new stuff you want to see in VIS 3.0, add it to this issue by noon on Monday, May 16. Then we'll refine and prioritize our collective list. Add the new ideas to the list in the first message of this issue by editing the issue. Thanks. FYI: @mrbannon @mborsodi @crantila @musicus @maxalbert

crantila commented 8 years ago

I added the point about testing, and also one about deprecating the WorkflowManager. The WorkflowManager was designed as, you know, the thing that manages workflows, a role that has since been taken over by RODAN (for server applications). For the handful of desktop users, the extra baggage adds complications and is difficult to extend. I think we would be better off developing a set of example starter scripts and/or a set of helper functions for common desktop analysis tasks.
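
To give a sense of what I mean by a starter script, something along these lines (plain music21 + pandas, no WorkflowManager; the file path is a placeholder):

```python
import pandas as pd
from music21 import converter

# Parse one piece and tabulate the note names of each part in a DataFrame.
score = converter.parse('some_piece.xml')    # placeholder path

columns = {}
for part in score.parts:
    names = [n.nameWithOctave for n in part.flat.notes if n.isNote]
    columns[part.partName or part.id] = pd.Series(names)

df = pd.DataFrame(columns)    # parts of unequal length are padded with NaN
print(df.head())
```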

mrbannon commented 8 years ago

Here's what I would like to see:

alexandermorgan commented 8 years ago

@mrbannon could you be more specific about your first four points? What's an example of the problem for each one? I think the datatypes are well-defined in VIS, especially between the indexers. Do you want checks for the origin of DataFrames passed as arguments? What do you mean by any type of storage? What kind of stuff do you want to store? Also, is VIS the best place for that? We can temporarily cache attributes of indexed_piece or aggregated_pieces objects, but this should only be for a limited number of specific situations. I'm almost done with this on my branch.

About the minimization of settings, it is unavoidable that certain settings will make certain workflows logically invalid. I just addressed this on my branch for the dissonance indexer, though, which can now be called with a get method that requires no settings and generates the input analyses it needs automatically. I left the actual dissonance indexer as is, because you seem to want to be able to pass it unverified arguments. I still don't understand why you want that, but it's probably best to keep the two ways of getting dissonance-indexer results separate anyway.
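
A toy illustration of the settings-free get-method idea (this is not the actual VIS API; the helper methods are placeholders):

```python
class PieceSketch:
    """Caches analyses and builds prerequisite results automatically."""

    def __init__(self, score):
        self._score = score
        self._cache = {}

    def get(self, analysis):
        if analysis not in self._cache:
            self._cache[analysis] = self._compute(analysis)
        return self._cache[analysis]

    def _compute(self, analysis):
        if analysis == 'noterest':
            return self._index_notes_and_rests()            # slow stream indexing
        if analysis == 'vertical_intervals':
            return self._vertical_intervals(self.get('noterest'))
        if analysis == 'dissonance':
            return self._find_dissonances(self.get('vertical_intervals'))
        raise ValueError('unknown analysis: {}'.format(analysis))

    # Placeholders standing in for the real indexers.
    def _index_notes_and_rests(self): ...
    def _vertical_intervals(self, noterest): ...
    def _find_dissonances(self, intervals): ...
```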

mrbannon commented 8 years ago

I will address these issues in our meeting.


musicus commented 8 years ago
maxalbert commented 8 years ago

@mrbannon's points all resonate strongly with me. Sadly, my work with VIS has been on hold for quite a while, so I don't have anything immediate to add, but in my (admittedly very limited) experience of working with VIS, one of the things I found most difficult was feeding the output of one indexer into another and easily building multi-step pipelines. I don't have any concrete examples of my difficulties at hand, I'm afraid, but anything along the lines of what @mrbannon wrote seems like a good step in the right direction to make this easier. I should note, though, that it is quite possible that my difficulties were due to my unfamiliarity with VIS rather than its design.

crantila commented 8 years ago

Thanks for your feedback @maxalbert, and may I also suggest that inadequate documentation is part of the problem?

maxalbert commented 8 years ago

@crantila Yes, absolutely. Missing and/or outdated documentation was one of the biggest obstacles to getting going. I'm sure VIS is actually capable of a lot of things that I'm not aware of and thus had to find workarounds for.

musicus commented 8 years ago

A lyrics indexer for music with text.
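
For reference, music21 already exposes lyric text on notes, so a rough sketch of what such an indexer might extract (not an implementation of a VIS indexer) could be:

```python
import pandas as pd
from music21 import converter

score = converter.parse('texted_piece.xml')    # placeholder path to a texted piece
part = score.parts[0]

notes = [n for n in part.flat.notes if n.isNote]
lyrics = pd.Series([n.lyric for n in notes],    # None where a note carries no text
                   index=[n.offset for n in notes])
```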

alexandermorgan commented 7 years ago

new_ngram.py has become ngram.py (and the old ngram.py was removed) as of commit f5738d9f8304f47a600289f0c559d5f72d7733df on alex_devel.