ContextLab / hypertools

A Python toolbox for gaining geometric insights into high-dimensional data
http://hypertools.readthedocs.io/en/latest/
MIT License

enhancements and fixes to `DataGeometry` objects and argument parsing #164

Open jeremymanning opened 7 years ago

jeremymanning commented 7 years ago

DataGeometry instances provide a way to apply the same transformations to new data, or re-apply transformations to old data. There are several issues with the current implementation. I've outlined what I think are the biggest issues below (these should probably be divided into separate issues).

Exposing all hypertools functions from DataGeometry instances

In addition to plot and transform, I'd like to see reduce, normalize, and align exposed in DataGeometry instances. In other words, geo.plot(...) should either re-plot old data or plot new data, filling in all arguments with the previously (or newly) specified ones. Similarly, geo.reduce(...) should apply dimensionality reduction to new (or old) data, filling in arguments as appropriate, and likewise for geo.normalize(...) and geo.align(...). These functions should all behave essentially like hyp.plot, hyp.reduce, hyp.normalize, and hyp.align, but with the ability to re-use already specified arguments and data. The existing DataGeometry.transform function provides an additional convenience: a mechanism for easily re-applying the full pipeline of reduce/normalize/align transformations to new (or old) data.
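To make the intended behavior concrete, here is a minimal sketch of the "re-use stored arguments and data" pattern. It is not the hypertools API: `GeoSketch`, `_call`, and `toy_reduce` are hypothetical stand-ins for illustration, and the "reduction" simply keeps the first few columns.

```python
# Hypothetical stand-in for a DataGeometry-like object; NOT the hypertools API.
class GeoSketch:
    def __init__(self, data, **kwargs):
        self.data = data            # data supplied when the object was created
        self.kwargs = dict(kwargs)  # arguments specified at creation time

    def _call(self, func, data=None, **overrides):
        # Fall back to the stored data when no new data is given.
        target = self.data if data is None else data
        # Newly specified arguments take precedence over stored ones.
        merged = {**self.kwargs, **overrides}
        return func(target, **merged)

    def reduce(self, data=None, **overrides):
        return self._call(toy_reduce, data, **overrides)


def toy_reduce(data, ndims=2, method="PCA"):
    # Placeholder for a real reduction: keeps only the first ndims columns.
    return {"method": method, "result": [row[:ndims] for row in data]}


geo = GeoSketch([[1, 2, 3], [4, 5, 6]], ndims=2)
old = geo.reduce()                      # stored data, stored ndims
new = geo.reduce([[7, 8, 9]], ndims=1)  # new data, overridden ndims
```

Under this assumption, normalize and align would follow the same shape as `reduce`, each delegating to its own function through the shared `_call` helper.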

Argument parsing in DataGeometry plot, reduce, normalize, align, and transform functions

In the existing implementation, there are several inconsistencies with how arguments are parsed between hypertools.plot and DataGeometry.plot. For example, DataGeometry.plot does not accept format strings, whereas hypertools.plot does.

I propose that we change the implementation by writing parse_arguments helper functions for plot, reduce, normalize, and align. These should be private functions that are not exposed to the user. Each function should return a dictionary of parsed arguments (with defaults filled in) appropriate to the given function. Both the hypertools and DataGeometry functions should parse arguments in exactly the same way, by calling these helper functions. The difference is that for the DataGeometry versions of those functions, the "defaults" should be replaced by any previously specified arguments. In other words, we should do something like the following: