ContextLab / hypertools

A Python toolbox for gaining geometric insights into high-dimensional data
http://hypertools.readthedocs.io/en/latest/
MIT License

save raw data in geo instead of formatted data #189

Closed by andrewheusser 6 years ago

andrewheusser commented 6 years ago

While implementing the text features, I ran into a bug: I could not change any of the text parameters when replotting a geo. After a battle, I realized the cause: the geo.data field contains the formatted data, not the raw input data. The root of the problem is that we implemented the text transformations inside the format_data function. As a result, no text is saved in the geo, and we can't replot it after changing parameters such as the corpus. I see two possible solutions:

  1. Instead of saving the formatted data on the geo, we save the raw input data. This gets a little tricky when writing the geo to disk, and it changes the structure of a geo, but it may be a good solution.

  2. Move the text transformations outside (and after) the format_data function. We could format the numerical data but leave text data untouched.

I'm not sure which is best. @jeremymanning could I get your input?
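
To make the failure mode concrete, here is a minimal sketch of the problem. It does not use the hypertools code: `format_data` here is a hypothetical stand-in that turns text into word counts, `FormattedGeo` is a hypothetical class, and the `vocabulary` argument plays the role of a text parameter like the corpus.

```python
def format_data(docs, vocabulary):
    """Hypothetical stand-in for a text transformation: word counts per document."""
    return [[doc.lower().split().count(word) for word in vocabulary] for doc in docs]


class FormattedGeo:
    """Hypothetical geo that keeps only the formatted (numeric) data."""

    def __init__(self, docs, vocabulary):
        # The raw text is discarded here; only the numeric matrix survives.
        self.data = format_data(docs, vocabulary)

    def plot(self, vocabulary=None):
        # With the text gone, a new vocabulary cannot be applied,
        # so the stored matrix comes back unchanged.
        return self.data


geo = FormattedGeo(["the cat sat", "the dog ran"], vocabulary=["cat", "dog"])
first = geo.plot()
second = geo.plot(vocabulary=["sat", "ran", "the"])  # new parameter has no effect
```

In this sketch `first` and `second` are identical, which mirrors the replotting bug: the parameter change has nothing to act on.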

andrewheusser commented 6 years ago

We decided to save the raw (unformatted) data in the geo instead of the formatted data.
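
A rough sketch of what that decision implies, under assumptions: `RawGeo` and this `format_data` are hypothetical stand-ins rather than the actual hypertools implementation, and `vocabulary` stands in for a text parameter such as the corpus. The idea is that the geo keeps the raw input and formatting is redone on every plot.

```python
def format_data(docs, vocabulary):
    """Hypothetical stand-in for a text transformation: word counts per document."""
    return [[doc.lower().split().count(word) for word in vocabulary] for doc in docs]


class RawGeo:
    """Hypothetical geo that stores the raw input and formats it on demand."""

    def __init__(self, docs, vocabulary):
        self.data = list(docs)  # raw, unformatted input
        self.vocabulary = list(vocabulary)

    def plot(self, vocabulary=None):
        # Text parameters can change between plots because the raw text is still here.
        if vocabulary is not None:
            self.vocabulary = list(vocabulary)
        return format_data(self.data, self.vocabulary)


geo = RawGeo(["the cat sat", "the dog ran"], vocabulary=["cat", "dog"])
first = geo.plot()
second = geo.plot(vocabulary=["sat", "ran", "the"])  # reformatted from the raw text
```

The trade-off is that formatting runs again on each plot, and a geo written to disk has to serialize whatever raw input it was given.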