Give a summary table of variables for pandas DataFrame

Dear @founderfan,

Thanks very much and congratulations for submitting our very first community code contribution to hypertools!

We could see something like this being useful as an analog to the pandas describe function, but one that works on all of the input types supported by hypertools.

For example, one path forward would be to have a hypertools.tools.describe function that works as follows:

If data is a pandas DataFrame, simply call data.describe() and return the result.
If data is a single numpy array, convert it to a pandas.DataFrame object, call describe() on that DataFrame, and return the result.
If data is a list of DataFrames (and/or arrays), convert each list element to a DataFrame if needed, then call data[i].describe() for each item i and return the list of results. This also requires parsing any arguments to describe (the pandas describe function accepts percentiles, include, and exclude as optional arguments, all of which can take list-like (or array-like) values). For example, if the user passes in a single array/list for any of these arguments, then those values should be used for every data element's describe call (i.e. the same arguments for all elements). But if the user passes in a list of lists/arrays, then we need to verify that the outer list has the same length as data (if not, throw an error), and then pass the potentially different arguments into describe for each element of data. (E.g. this should work similarly to how data/arguments are processed by hyp.plot.)
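
To make the three cases above concrete, here is a rough sketch of how such a function might look. It is only an illustration under stated assumptions, not code from the hypertools codebase: the helper names _as_frame and _per_item are hypothetical, a Python list passed as data is always treated as several datasets, and an argument is treated as "one value per dataset" only when it is a list or tuple whose elements are themselves list-like (or None).

```python
import numpy as np
import pandas as pd


def _as_frame(x):
    # Hypothetical helper: leave DataFrames alone, wrap anything else.
    return x if isinstance(x, pd.DataFrame) else pd.DataFrame(np.asarray(x))


def _per_item(value, n, name):
    # Hypothetical helper: expand one describe argument to n per-dataset values.
    # Assumption: a list/tuple of list-likes (or None) means "one per dataset";
    # anything else is a single value shared by every dataset.
    nested = (
        isinstance(value, (list, tuple))
        and len(value) > 0
        and all(v is None or isinstance(v, (list, tuple, np.ndarray)) for v in value)
    )
    if not nested:
        return [value] * n
    if len(value) != n:
        raise ValueError(
            f"{name} has {len(value)} entries but data has {n} elements"
        )
    return list(value)


def describe(data, percentiles=None, include=None, exclude=None):
    # Cases 1 and 2: a single DataFrame or a single array.
    if not isinstance(data, list):
        return _as_frame(data).describe(
            percentiles=percentiles, include=include, exclude=exclude
        )

    # Case 3: a list of DataFrames and/or arrays.
    n = len(data)
    p = _per_item(percentiles, n, "percentiles")
    inc = _per_item(include, n, "include")
    exc = _per_item(exclude, n, "exclude")
    return [
        _as_frame(d).describe(percentiles=p[i], include=inc[i], exclude=exc[i])
        for i, d in enumerate(data)
    ]


if __name__ == "__main__":
    df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]})
    arr = np.arange(12.0).reshape(4, 3)
    # Same percentiles for both datasets.
    shared = describe([df, arr], percentiles=[0.1, 0.9])
    # Different percentiles for each dataset.
    separate = describe([df, arr], percentiles=[[0.1], [0.9]])
    print(shared[0])
    print(separate[1])
```

The nested-versus-flat rule is one possible way to distinguish shared from per-dataset arguments; matching whatever hyp.plot already does would be preferable if its behavior differs.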

Another consideration is that, whereas the pandas describe function outputs a single DataFrame that is formatted nicely when printed to the Python command window, it is less obvious to us how to nicely format a list of "describe" DataFrames. We would want to solve this formatting issue (in addition to the input issues described above) before incorporating a function like this into our codebase.
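
As one possible direction for the formatting question (a suggestion to explore, not a decision), the per-dataset results could be joined side by side into a single DataFrame with pandas.concat, using the keys argument to label each block of columns. The function name combine_descriptions and the "data[i]" labels below are hypothetical.

```python
import numpy as np
import pandas as pd


def combine_descriptions(descriptions, labels=None):
    # Hypothetical helper: place each describe() table next to the others,
    # adding an outer column level that says which dataset it came from.
    if labels is None:
        labels = [f"data[{i}]" for i in range(len(descriptions))]
    return pd.concat(descriptions, axis=1, keys=labels)


frames = [
    pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]}),
    pd.DataFrame(np.arange(9.0).reshape(3, 3)),
]
combined = combine_descriptions([f.describe() for f in frames])
print(combined)
```

This prints as one table with two column levels. If different percentiles are requested per dataset, rows that exist for only some datasets would be filled with NaN, which may or may not be acceptable.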

Another change we would need in order to merge the pull request is that all hypertools functions need to have a pytest function in the tests/ folder, so that we can know if functionality breaks. We recommend writing tests for a single dataframe, a single numpy array, and a mixed list of several dataframes and numpy arrays.
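
The three recommended tests might look roughly like the sketch below. To keep the example runnable on its own, it includes a small stand-in describe function; in the actual pull request the tests would import the real function from hypertools instead, and the file and function names here are assumptions.

```python
import numpy as np
import pandas as pd


def describe(data, **kwargs):
    # Stand-in for the proposed hypertools function, included only so that
    # this sketch runs by itself; replace with the real import in tests/.
    def as_frame(x):
        return x if isinstance(x, pd.DataFrame) else pd.DataFrame(x)

    if isinstance(data, list):
        return [as_frame(d).describe(**kwargs) for d in data]
    return as_frame(data).describe(**kwargs)


def test_describe_single_dataframe():
    df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]})
    pd.testing.assert_frame_equal(describe(df), df.describe())


def test_describe_single_array():
    arr = np.arange(12.0).reshape(4, 3)
    pd.testing.assert_frame_equal(describe(arr), pd.DataFrame(arr).describe())


def test_describe_mixed_list():
    df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]})
    arr = np.arange(12.0).reshape(4, 3)
    results = describe([df, arr])
    assert isinstance(results, list) and len(results) == 2
    pd.testing.assert_frame_equal(results[0], df.describe())
    pd.testing.assert_frame_equal(results[1], pd.DataFrame(arr).describe())
```

pytest collects any function whose name starts with test_, so no explicit pytest import is needed for these particular checks.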

Finally, we would need to add documentation to your function so that the hypertools API remains fully documented (e.g. see our current documentation).

We are closing this pull request for now, but we hope that you will consider making the above changes and re-submitting!

Best, The Contextual Dynamics Lab

ContextLab / hypertools

Give a summary table of variables for pandas DataFrame #93