What should go in VIS 3.0?

alexandermorgan commented 8 years ago

Here is what I want to see in VIS 3.0 (in no particular order, the numbers are just to make the points easier to reference):

[x] 1. breakdown of score importation into more steps
[x] 2. no more "stream indexers", instead index the streams of each part in a score only once when the data is imported and save the results as a list of series (important, a list of series, not a dataframe), then all indexers that are currently stream indexers would be series indexers instead
[x] 3. use .applymap() (for dataframes) or .apply() (for series) to execute all indexer functions
[x] 4. cache the results of all indexers that require no settings and the results of some intermediary analysis steps as attributes of each indexed_piece
[x] 5. ~~save these cached attributes on the database along with the corresponding symbolic notation files~~
[x] 6. use get methods to run indexers that don't take settings
[x] 7. stop using the noterest_indexer results to calculate intervals; instead cache the intermediary analysis step of getting all the music21 note and rest objects and use these to directly create intervals. This will allow us to avoid creating hundreds of thousands of notes and intervals per piece that will just get discarded immediately anyway
[x] 8. ~~create a dynamic offset indexer that automatically detects what the ideal sampling rate is for Renaissance music~~ Put off until VIS 3.1.
[x] 9. replace ngram indexer with new_ngram indexer; the name will be the ngram indexer
[x] 10. ~~have a direct link from the database to the analysis web-app~~
[x] 11. an easy way to run the same analysis on a group of pieces
[x] 12. refactor the test suite so it's useful for what it's supposed to do
[x] 13. deprecate the WorkflowManager for VIS 4 and decide a suitable replacement
[x] 14. ~~some sort of chordal analysis indexer, see #258~~

Anything else?

alexandermorgan commented 8 years ago

After having worked on this for little while now I think item 5 may not be a good idea. Instead of saving certain analysis results to the database, we can just save them as attributes of that indexed piece. We'll have to recalculate them every time a piece is reloaded, but that's not so bad and it avoids a lot of messy calls to the database.

musicus commented 8 years ago

@alexandermorgan we still need to have some type of database interface (unless maybe that should be an add-on module) to save search results. The larger the corpus the more analytical data we are going to generate, and it would be good to be able to have mechanisms in place that can handle communications to a database (sql, or no-sql systems) that can handle very large amounts of data.

alexandermorgan commented 8 years ago

@musicus good point, I agree. Your comment addresses point 10 above in a more in-depth way.

crantila commented 8 years ago

Storing intermediate results in a remote database (not necessarily the database) has been on the table since before VIS was a Framework. It was the overriding concern that encouraged adopting the (backward incompatible) return format shared by all the current analyzers, and therefore the reason a 2.x release series was created. I notice you've been trying to loosen the reigns on the data passed between analyzers, which is your decision, but just FYI.

As for where this functionality belongs, the decision seems pretty clear. You can't argue that the pre-existing LilyPond analyzers, which are proper analyzers with no external dependencies, should be removed because they're not lightweight, at the same time as saying that a database interface only "maybe... should be an add-on module." From what I can tell, data management is now handled by RODAN, so if you're getting your VIS input data from a remote database it should be done within a RODAN module/step/whatever.

alexandermorgan commented 8 years ago

In the list above, point 4 is a bigger priority than point 5. I didn't rank them or anything so you couldn't have known that. @crantila, your comment is more about point 5. If we do store data permanently on some database, my original idea was to just store pickled versions of the parts and the metadata. This is all almost everything in VIS ever needs and this could save on import time which is a bottleneck in VIS. If point 5 ever happens, that will depend more on a database person than on VIS programmers. About point 4, this kind of temporary caching has been in VIS for some time already with the noterest indexer results. They are stored as an attribute of an indexed piece when the analysis is done with the get method. This is a convenient analysis result to store because the noterest indexer takes no settings, and it is sort of slow because it uses stream indexing. When this get method was implemented (by Christopher I imagine) I believe there were no other stream indexing indexers that took no settings. Now there are five with plans for a few more so the use of get methods is becoming even more important. Also, @mrbannon and I have more or less come up with a way of storing one proto-version of vertical and horizontal interval results that could be stored and then quickly transformed into any desired output with the user's settings.

crantila commented 8 years ago

Okay, but please please do not use pickling for this. It may be a very fast serialization, but it's difficult to handle versioning, and moreover you're opening two very large remote execution holes on your servers (and possibly your own computer).

crantila commented 8 years ago

I would also like to suggest that most of the test suite should be rewritten.

I tried my best, but what we have now is an incomprehensible mess that hinders development and isn't properly adjusted for new functionality. I've learned a lot about testing software since then; I would like to point out the VIS test suite as an example of how not to do that.

musicus commented 8 years ago

@crantila definitely no pickling (I only use it on a local machines, never a server).

alexandermorgan commented 8 years ago

Ok, so we agree to scrap number 5 in the list above. It was just meant as a performance enhancement anyway and we've got more important stuff to work on than performance.

crantila commented 8 years ago

We can still do point 5 above, but with a different serialization format. Pandas offers several choices for this, including SQL (and what better for a database than SQL?) Plus that would be a step up toward the indexers-in-SQL idea mentioned in #375.

musicus commented 8 years ago

Either SQL, or json are fine with me, with preference to json.

alexandermorgan commented 8 years ago

It takes longer to revive json strings of parts than it does just to import a piece from scratch, so I don't think json will help us out. SQL or json might work if we only keep the dataframes of results. This would quickly become a lot of files for each piece. The original idea was to just store the parts, but this is hard since they are composed of music21 objects, not just strings or ints.

musicus commented 8 years ago

Yes, those could become a lot of files. music21 hypothetically does have the functionality to store its objects in json, but did it actually work @alexandermorgan? I think we should just keep the DataFrames themselves, and store them in json, which seems to be the simplest solution.

alexandermorgan commented 8 years ago

DEADLINE: if you have new stuff you want to see in VIS 3.0 add it to this issue by noon on Monday May 16. Then we'll refine and prioritize our collective list. Add the new ideas to the list in the first message of this issue by editing the issue. Thanks. FYI: @mrbannon @mborsodi @crantila @musicus @maxalbert

crantila commented 8 years ago

I added the point about testing, and also one about deprecating the WorkflowManager. The WorkflowManager was designed as, you know, the thing that manages workflows, which has since been overtaken by RODAN (for server applications). For the handful of desktop users, the extra baggage adds complications and is difficult to extend. I think we would be better off developing either a set of example starter scripts and/or a set of functions to help with common desktop analysis tasks.

mrbannon commented 8 years ago

Here's what I would like to see:

Well-defined datatypes that pass between indexers
caching solution that allows any type of storage; this implies some type of API
minimization of settings such that changing a setting in an indexer does not impede the use of another indexer
"minimal and complete": indexers do one thing and one thing only; it's up to the user to decide how they are used
make the indexers the core package, thoroughly tested
is there a way to define the temp directory that music21 uses, and can this be set in VIS?
change the name to just VIS

alexandermorgan commented 8 years ago

@mrbannon could you be more specific about your first four points? What's an example of the problem for each one? I think the datatypes are well-defined in VIS, especially between the indexers. Do you want checks for the origin of dataframes passed as arguments? What do you mean by any type of storage? What kind of stuff do you want to store? Also is VIS the best place for that? We can temporarily cache attributes of indexed_piece or aggregated_pieces objects, but this should only be for a limited number of specific situations. I'm almost done with this on my branch. About the minimization of settings, it is unavoidable that certain settings will make certain workflows logically invalid. I just addressed this on my branch for the dissonance indexer though, which can now be called with a get method that requires no settings and generates the input analyses it needs automatically. I left the actual dissonance indexer as is, because you seem to want to be able to pass it unverified arguments. I still don't understand why you want this, but it's probably best to keep the two ways of getting dissonance indexer results separate anyway.

mrbannon commented 8 years ago

I will address these issues in our meeting.

On Sun, May 15, 2016 at 1:07 PM, Alexander Morgan notifications@github.com wrote:

@mrbannon https://github.com/mrbannon could you be more specific about your first four points? What's an example of the problem for each one? I think the datatypes are well-defined in VIS, especially between the indexers. Do you want checks for the origin of dataframes passed as arguments? What do you mean by any type of storage? What kind of stuff do you want to store? Also is VIS the best place for that? We can temporarily cache attributes of indexed_piece or aggregated_pieces objects, but this should only be for a limited number of specific situations. I'm almost done with this on my branch. About the minimization of settings, it is unavoidable that certain settings will make certain workflows logically invalid. I just addressed this on my branch for the dissonance indexer though, which can now be called with a get method that requires no settings and generates the input analyses it needs automatically. I left the actual dissonance indexer as is, because you seem to want to be able to pass it unverified arguments. I still don't understand why you want this, but it's probably best to keep the two ways of getting dissonance indexer results separate anyway.

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/ELVIS-Project/vis-framework/issues/387#issuecomment-219297774

musicus commented 8 years ago

Clear separation of settings from indexers: Settings should be individual modules, or filters (this goes along with @mrbannon's 4^th point).

maxalbert commented 8 years ago

@mrbannon's points all resonate strongly with me. Sadly, my work with VIS has been put on hold for quite a while so I don't have anything immediate to add, but in my (admittedly very limited) experience of working with VIS one of the things I found most difficult was to feed output from one indexer into another and easily build multi-step pipelines. I don't have any concrete examples of my difficulties at hand, I'm afraid, but anything along the lines of what @mrbannon wrote seems a good step into the right direction to make this easier. I should note, though, that it is quite possible that my difficulties were due to my infamiliarity with VIS rather than its design.

crantila commented 8 years ago

Thanks for your feedback @maxalbert, and may I also suggest that inadequate documentation is part of the problem?

maxalbert commented 8 years ago

@crantila Yes, absolutely. Missing and/or outdated documentation was one of the biggest problems to get me going. I'm sure that VIS is actually capable of a lot of things that I'm not aware of and thus had to find workarounds for.

musicus commented 8 years ago

A lyrics indexer for music with text.

alexandermorgan commented 7 years ago

new_ngram.py has become ngram.py (and the old ngram.py was removed) as of this commit on alex_devel: f5738d9f8304f47a600289f0c559d5f72d7733df

ELVIS-Project / vis-framework

What should go in VIS 3.0? #387