ELVIS-Project / vis-framework

Thoroughly modern symbolic musical data analysis suite.
http://elvisproject.ca/

Think about How to Export Settings #316

Closed crantila closed 8 years ago

crantila commented 10 years ago

Team effort! If we're going to export settings in a way that breaks backward compatibility, we should do it now... so we don't have to force VIS 3 sooner than needed.

crantila commented 10 years ago

There's no clean way to do this with a DataFrame (or any single pandas object) by itself. In particular, if we want to export settings to the popular-but-terrible formats (CSV and Excel), the settings won't be (reliably) re-importable.
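To make the re-import problem concrete, here is a minimal sketch of a settings table going through CSV and back. The setting names (`n`, `terminator`, `mark_singles`) and their values are only illustrative assumptions, not a fixed VIS schema.

```python
import io

import pandas as pd

# Hypothetical settings with mixed value types (int, list, bool).
settings = pd.DataFrame({
    "setting": ["n", "terminator", "mark_singles"],
    "value": [2, ["Rest"], False],
})

# Write to an in-memory CSV and read it straight back.
buffer = io.StringIO()
settings.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

# Every value comes back as text; the original types are gone.
for original, reloaded in zip(settings["value"], restored["value"]):
    print(repr(original), "became", repr(reloaded))
```

The integer, the list, and the boolean all return as plain strings, so an importer would have to guess the intended type of each setting.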

However! Once we start exporting to and importing from HDF5 files, we will certainly be able to store enough information, because a single HDF5 file can hold many pandas objects. HDF5 files aren't widely supported by end-user software (I think), so our end users will probably have to keep using CSV/Excel/... with silly modifications to their files. Fortunately, for VIS-specific results caching, we will certainly be able to find a way to keep track of the settings with which a set of data was produced. Because we can keep our HDF5 files in a known format (i.e., storing settings and their results in a standardized way), our results will be importable by users with capable programs.

Refer to http://pandas.pydata.org/pandas-docs/version/0.13.1/io.html#hdf5-pytables for more information.
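As a rough sketch of what "settings stored next to their results" could look like, the snippet below puts one result table in an HDF5 file and attaches the settings to the same node. It assumes PyTables is installed (pandas needs it for `HDFStore`); the key `results/interval_ngrams`, the attribute name `vis_settings`, and the data are hypothetical, not an agreed VIS layout.

```python
import os
import tempfile

import pandas as pd

# Hypothetical result table and the settings that produced it.
results = pd.DataFrame({"count": [12, 7]}, index=["3 1 3", "6 -2 6"])
settings = {"n": 2, "simple": True}

path = os.path.join(tempfile.mkdtemp(), "vis_cache.h5")

# Store the results, then attach the settings as an attribute of that node.
with pd.HDFStore(path, mode="w") as store:
    store.put("results/interval_ngrams", results)
    store.get_storer("results/interval_ngrams").attrs.vis_settings = settings

# Reopen the file: both the data and its settings come back together.
with pd.HDFStore(path, mode="r") as store:
    restored_results = store["results/interval_ngrams"]
    restored_settings = store.get_storer("results/interval_ngrams").attrs.vis_settings
```

Node attributes are only one option; a separate settings table in the same file would work too.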

crantila commented 10 years ago

So, because this is related to the database-and-VIS interconnection (since intermediate results will/can be stored in the EDDA in HDF5 files), I'm reassigning this issue to that milestone.

Is this explained clearly, @alexandermorgan?

alexandermorgan commented 9 years ago

I was thinking that we could have a dataframe that contains the settings for all of the other results that are stored in the other dataframes. So if five experiments have been run that each instantiate their own dataframes, there would be six dataframes in all. The extra one's columns would take the name of the dataframe (the experiment) to which it refers. The first row of each of these columns could be a dictionary with the name of each setting as the keys and what they were set to as the values. I think this would be a sustainable solution that would allow us to run the same experiment with different settings and save all the results in an orderly way.
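A small sketch of that layout, with two experiments instead of five to keep it short. The experiment names, settings, and counts are made up for illustration.

```python
import pandas as pd

# Hypothetical results, one DataFrame per experiment.
results = {
    "interval_ngrams": pd.DataFrame({"count": [12, 7]}, index=["3 1 3", "6 -2 6"]),
    "interval_counts": pd.DataFrame({"count": [40, 25]}, index=["3", "6"]),
}

# Hypothetical settings used for each experiment.
settings_used = {
    "interval_ngrams": {"n": 2, "simple": True},
    "interval_counts": {"simple": False, "quality": True},
}

# The extra DataFrame: one column per experiment, named after it,
# whose first row holds that experiment's settings dictionary.
settings_frame = pd.DataFrame(
    {name: [experiment_settings] for name, experiment_settings in settings_used.items()}
)

# Look up the settings that produced a given result.
ngram_settings = settings_frame.loc[0, "interval_ngrams"]
```

Running the same experiment again with different settings would just mean a new result DataFrame and a new column here, under a distinct name.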

crantila commented 9 years ago

Thanks for your idea; this seems like one of the best ways to do it. I think I would prefer a similar approach, but with the settings stored in nested dictionaries, because it means we could export fewer files. I may change my mind depending on the difficulty of storing such a dictionary in an HDF5 file.
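For comparison, the nested-dictionary variant might look like the sketch below. Serializing to a JSON string is an assumption made here for illustration (a single string is easy to store almost anywhere); the thread does not settle on a format, and the experiment and setting names are hypothetical.

```python
import json

# One nested dictionary: experiment name -> {setting name -> value}.
all_settings = {
    "interval_ngrams": {"n": 2, "terminator": ["Rest"], "mark_singles": False},
    "interval_counts": {"simple": False, "quality": True},
}

# Serialize to a single string, then restore it.
serialized = json.dumps(all_settings, sort_keys=True)
restored = json.loads(serialized)

# Ints, bools, lists, and strings survive the round trip unchanged.
print(restored == all_settings)
```

This only holds for JSON-friendly values; settings that hold arbitrary Python objects would need another approach.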

alexandermorgan commented 8 years ago

It looks like we're not going to export settings for VIS 3.0. Instead, we're going to cache stable results that aren't influenced by settings. Post-processing indexers (like the n-gram indexer) will just get rerun every single time. There are simply too many possibilities to store them all in an orderly way. The good news is that most of these post-processing indexers are fast enough that they can probably run about as quickly as users can choose the settings they want. One intermediary case is interval indexing. In VIS 3.0 we will store the compound versions of intervals with quality and direction, and with horiz_attach_later set to True for horizontal intervals. This type of analysis allows for zero information loss and can then be quickly reindexed with the IntervalReindexer to get the results into whatever format the user wants.
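To illustrate why caching the richest interval form loses nothing, here is a toy reduction function. It is not the IntervalReindexer and does not use the VIS API; the label format (optional `-`, quality letters, then a number, e.g. `-m3` or `M10`) and the choice to keep octaves as 8 when simplifying are assumptions made for this sketch.

```python
import re

# Assumed label format: optional "-" for direction, quality letters, number.
_INTERVAL = re.compile(r"^(-?)([PMmAd]+)(\d+)$")


def reduce_interval(label, simple=False, quality=True, direction=True):
    """Derive a less detailed label from a cached compound interval."""
    match = _INTERVAL.match(label)
    if match is None:
        return label  # non-interval tokens such as "Rest" pass through
    sign, qual, number = match.group(1), match.group(2), int(match.group(3))
    if simple and number > 8:
        # Fold compound intervals into the range 2..8 (octaves stay 8).
        number = (number - 2) % 7 + 2
    parts = [sign if direction else "", qual if quality else "", str(number)]
    return "".join(parts)


cached = ["M10", "-m3", "-P15", "Rest"]
simple_no_quality = [reduce_interval(x, simple=True, quality=False) for x in cached]
```

Each simpler format is a pure function of the cached label, so only the compound, quality-and-direction version needs to be stored.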