ContextLab / hypertools

A Python toolbox for gaining geometric insights into high-dimensional data
http://hypertools.readthedocs.io/en/latest/
MIT License
1.83k stars 160 forks

Give a summary table of variables for pandas DataFrame #93

Closed. founderfan closed this 7 years ago

founderfan commented 7 years ago

Give a summary table of a DataFrame's variables: dtype, number of non-missing values, numeric statistics, the 5 most frequent values and their frequencies, and the frequencies of tail values and missing values.
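As a rough illustration of the kind of table described above, here is a minimal sketch. It is not the code from the pull request. The function name `summarize`, the `top_n` parameter, and the column names are assumptions chosen for this example, and "tail" is interpreted here as every value outside the `top_n` most frequent ones.

```python
import numpy as np
import pandas as pd


def summarize(df, top_n=5):
    """Return one row per column: dtype, counts, basic statistics, top values.

    Hypothetical helper for illustration only; not part of hypertools or pandas.
    """
    records = []
    for name in df.columns:
        col = df[name]
        numeric = pd.api.types.is_numeric_dtype(col)
        # Frequency of each distinct value (missing values are excluded).
        counts = col.value_counts()
        records.append({
            "dtype": str(col.dtype),
            "non_missing": int(col.notna().sum()),
            "missing": int(col.isna().sum()),
            # Numeric statistics are left as NaN for non-numeric columns.
            "mean": col.mean() if numeric else np.nan,
            "std": col.std() if numeric else np.nan,
            "min": col.min() if numeric else np.nan,
            "max": col.max() if numeric else np.nan,
            # The top_n most frequent values, paired with their counts.
            "top_values": [(v, int(c)) for v, c in counts.head(top_n).items()],
            # Assumption: the "tail" is everything outside the top_n values.
            "tail_count": int(counts.iloc[top_n:].sum()),
        })
    return pd.DataFrame(records, index=df.columns)


if __name__ == "__main__":
    example = pd.DataFrame({"x": [1.0, 2.0, 2.0, np.nan], "y": ["a", "b", "a", "c"]})
    print(summarize(example, top_n=2))
```

Returning a single DataFrame with one row per variable keeps the output printable in the same way as the output of pandas `describe`.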

jeremymanning commented 7 years ago

Dear @founderfan,

Thanks very much and congratulations for submitting our very first community code contribution to hypertools!

We could see something like this being useful as an analog to the pandas describe function, but one that works on all of the input types supported by hypertools.

For example, one path forward would be to have a hypertools.tools.describe function that works as follows:

Another consideration is that, whereas the pandas describe function outputs a single DataFrame that is nicely formatted when printed to the Python command window, it is less obvious to us how to nicely format a list of "describe" DataFrames. We would want to solve this formatting issue (in addition to the input issues described above) before incorporating a function like this into our codebase.
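One possible way to approach the formatting question, sketched under stated assumptions: accept a DataFrame, a numpy array, or a list of either, run pandas `describe` on each, and stack the results into one DataFrame with an outer index level identifying the dataset. The function below is hypothetical and is not an existing hypertools function. The level names `dataset` and `statistic` are also assumptions.

```python
import numpy as np
import pandas as pd


def describe(data):
    """Hypothetical sketch: describe a DataFrame, an array, or a list of either."""
    # Treat a single DataFrame or array as a one-element list.
    if isinstance(data, (pd.DataFrame, np.ndarray)):
        data = [data]
    # Convert every element to a DataFrame so pandas' describe can be reused.
    summaries = [pd.DataFrame(d).describe() for d in data]
    # Stack the per-dataset summaries under an outer index level, so the
    # result prints as one table instead of a list of tables.
    combined = pd.concat(summaries, keys=list(range(len(summaries))))
    return combined.rename_axis(index=["dataset", "statistic"])


if __name__ == "__main__":
    frame = pd.DataFrame({"a": [1.0, 2.0, 3.0]})
    array = np.arange(6.0).reshape(3, 2)
    print(describe([frame, array]))
```

When the datasets have different columns, `pd.concat` takes the union of the columns and fills the gaps with NaN, which may or may not be the desired presentation.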

Another change we would need in order to merge the pull request is that all hypertools functions need to have a pytest function in the tests/ folder, so that we can know if functionality breaks. We recommend writing tests for a single dataframe, a single numpy array, and a mixed list of several dataframes and numpy arrays.
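The three recommended test cases could look roughly like the sketch below. The `describe` function defined here is a hypothetical stand-in so that the example runs on its own. In a real contribution the tests would import the actual function and live in a file under `tests/` (the file name would be the contributor's choice).

```python
import numpy as np
import pandas as pd


def describe(data):
    """Stand-in for the proposed function (hypothetical, not hypertools API)."""
    if isinstance(data, (pd.DataFrame, np.ndarray)):
        data = [data]
    return [pd.DataFrame(d).describe() for d in data]


def test_describe_single_dataframe():
    df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})
    result = describe(df)
    assert len(result) == 1
    assert list(result[0].columns) == ["a", "b"]


def test_describe_single_array():
    arr = np.random.rand(10, 3)
    result = describe(arr)
    assert len(result) == 1
    # pandas describe reports 8 statistics per numeric column.
    assert result[0].shape == (8, 3)


def test_describe_mixed_list():
    df = pd.DataFrame({"a": [1.0, 2.0, 3.0]})
    arr = np.random.rand(5, 2)
    result = describe([df, arr, df])
    assert len(result) == 3
    assert all(isinstance(r, pd.DataFrame) for r in result)
```

Running `pytest` in the repository root would collect and run functions named `test_*` like these.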

Finally, we would need to add documentation to your function so that the hypertools API remains fully documented (e.g. see our current documentation).

We are closing this pull request for now, but we hope that you will consider making the above changes and re-submitting!

Best,
The Contextual Dynamics Lab