Status: Closed. founderfan closed this 7 years ago.
Dear @founderfan,
Thanks very much, and congratulations on submitting our very first community code contribution to hypertools!
We could see something like this being useful as an analog to the pandas describe function that works on all of the input types supported by hypertools.
For example, one path forward would be to have a hypertools.tools.describe
function that works as follows:
- If data is a pandas dataframe, simply call data.describe() and return the result.
- If data is a single numpy array, convert data to a pandas.DataFrame object, call describe() on it, and return the result.
- If data is a list of dataframes (and/or arrays), convert each list element to a dataframe if needed, call data[i].describe() for each item i, and return the list of results.

This also requires parsing any arguments to describe (the pandas describe function accepts percentiles, include, and exclude as optional arguments, which all expect list-like (or array-like) values). For example, if the user passes in a single array/list as any of these arguments, then those values should be used for every data element's describe request (i.e., every element is described with the same arguments). But if the user passes in a list of lists/arrays, then we need to verify that this outer list has the same length as data (if not, throw an error), and then pass potentially different arguments into describe for each element of data. (E.g., this should work similarly to how data and arguments are processed by hyp.plot.)

Another consideration is that, whereas the pandas describe function outputs a single dataframe that is formatted nicely when printed to the Python console, it is less obvious to us how to nicely format a list of "describe" dataframes. We would want to solve this formatting issue (in addition to the input issues described above) before incorporating a function like this into our codebase.
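A minimal sketch of the dispatch and argument handling described above might look like the following. The name describe mirrors the proposed hypertools.tools.describe, which is hypothetical here, as are the helpers _to_frame, _per_item, and format_descriptions. The rule that a list whose entries are all lists/tuples/arrays means "one argument per data element" is an assumption. For the formatting question, the sketch simply stacks the tables with pandas.concat using keys; that is one possible option, not a settled design.

```python
import numpy as np
import pandas as pd


def _to_frame(x):
    # Dataframes pass through unchanged; arrays (or other array-likes) are wrapped.
    return x if isinstance(x, pd.DataFrame) else pd.DataFrame(x)


def _per_item(arg, n, name):
    # Expand one describe() argument into n per-element values.
    # Assumed rule: a non-empty list/tuple whose entries are all lists, tuples,
    # or arrays supplies one value per data element; anything else (None, a flat
    # list, a string such as "all") is reused for every element.
    nested = isinstance(arg, (list, tuple)) and len(arg) > 0 and all(
        isinstance(a, (list, tuple, np.ndarray)) for a in arg
    )
    if not nested:
        return [arg] * n
    if len(arg) != n:
        raise ValueError(f"{name} has {len(arg)} entries but data has {n} elements")
    return list(arg)


def describe(data, percentiles=None, include=None, exclude=None):
    # Hypothetical hypertools.tools.describe: a single input returns one table,
    # a list input returns a list of tables.
    single = not isinstance(data, list)
    items = [data] if single else data
    n = len(items)
    per_item = zip(
        items,
        _per_item(percentiles, n, "percentiles"),
        _per_item(include, n, "include"),
        _per_item(exclude, n, "exclude"),
    )
    results = [
        _to_frame(x).describe(percentiles=p, include=i, exclude=e)
        for x, p, i, e in per_item
    ]
    return results[0] if single else results


def format_descriptions(results, labels=None):
    # One possible display: stack the tables into a single DataFrame whose
    # outer row index identifies the data element each block came from.
    if labels is None:
        labels = [f"data[{i}]" for i in range(len(results))]
    return pd.concat(results, keys=labels)


df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
arr = np.arange(8).reshape(4, 2)
# Different percentiles for each element: one inner list per data element.
tables = describe([df, arr], percentiles=[[0.1], [0.5, 0.9]])
print(format_descriptions(tables))
```

Columns that are not shared between elements show up as NaN in the stacked table, which is one of the trade-offs such a formatting choice would need to weigh.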
Another change we would need in order to merge the pull request is that every hypertools function needs a corresponding pytest test in the tests/ folder, so that we can tell when functionality breaks. We recommend writing tests for a single dataframe, a single numpy array, and a mixed list of several dataframes and numpy arrays.
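A test file along these lines could cover the three recommended cases. It is only a sketch: describe below is a hypothetical minimal stand-in (no argument handling) so that the file runs on its own; in the real tests/ folder the function would be imported from hypertools instead. pytest collects the test_ functions automatically.

```python
import numpy as np
import pandas as pd


def describe(data):
    # Hypothetical stand-in for the proposed hypertools.tools.describe, without
    # argument handling; a real test file would import the function instead.
    def one(x):
        frame = x if isinstance(x, pd.DataFrame) else pd.DataFrame(x)
        return frame.describe()

    return [one(x) for x in data] if isinstance(data, list) else one(data)


def make_df():
    return pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 7.0]})


def make_array():
    # Seeded so the test data are reproducible.
    return np.random.default_rng(0).normal(size=(10, 3))


def test_describe_single_dataframe():
    df = make_df()
    assert describe(df).equals(df.describe())


def test_describe_single_array():
    arr = make_array()
    assert describe(arr).equals(pd.DataFrame(arr).describe())


def test_describe_mixed_list():
    df, arr = make_df(), make_array()
    result = describe([df, arr, df])
    assert isinstance(result, list) and len(result) == 3
    assert result[0].equals(df.describe())
    assert result[1].equals(pd.DataFrame(arr).describe())
```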
Finally, we would need to add documentation to your function so that the hypertools API remains fully documented (e.g. see our current documentation).
We are closing this pull request for now, but we hope that you will consider making the above changes and re-submitting!
Best, The Contextual Dynamics Lab
Gives a summary table of a DataFrame covering each variable's dtype, number of non-missing values, numeric statistics, 5 most frequent values and their frequencies, and the frequencies of tails and of missing values.