EHWUSF / HS68_2018_Project_1


Data visualization: module_aggregate_plot #6

Open nitieaj opened 6 years ago

nitieaj commented 6 years ago

The purpose of this module is to summarize all columns with a specified function of interest, grouped by a class label.

The function returns the output of the specified function (e.g. fivenum, summary) for all columns, grouped by the status label, and adds plots associated with those columns (e.g. a bar plot).

It should help in viewing the summarized results and the plots side by side.
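To make the idea concrete, here is a minimal sketch of the grouping step, written in Python with only the standard library rather than in R. The names `aggregate_by_label`, `five_number_summary`, and `text_bar` are hypothetical and not part of the project. The quartiles come from `statistics.quantiles`, which can differ slightly from the hinges that R's `fivenum` reports, and the text bar is only a crude stand-in for a real plot.

```python
import statistics
from collections import defaultdict


def five_number_summary(values):
    """Return (min, lower quartile, median, upper quartile, max).

    Similar in spirit to R's fivenum, but not identical: the quartiles
    here use linear interpolation instead of Tukey's hinges.
    Needs at least two values.
    """
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, median, q3, ordered[-1])


def aggregate_by_label(rows, label, func=five_number_summary):
    """Apply func to every non-label column, separately per label value.

    rows  : list of dicts, one dict per observation (assumed numeric columns)
    label : name of the class/status column used for grouping
    func  : summary function; any callable taking a list of numbers works
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for column, value in row.items():
            if column != label:
                grouped[row[label]][column].append(value)
    return {
        group: {column: func(values) for column, values in columns.items()}
        for group, columns in grouped.items()
    }


def text_bar(value, scale=1.0, char="#"):
    """Crude text stand-in for a bar plot: one character per scaled unit."""
    return char * max(0, round(value * scale))


if __name__ == "__main__":
    # Made-up illustrative data, not taken from the project.
    rows = [
        {"status": "case", "age": 61, "bmi": 31.0},
        {"status": "case", "age": 58, "bmi": 29.5},
        {"status": "control", "age": 44, "bmi": 24.0},
        {"status": "control", "age": 50, "bmi": 26.5},
    ]
    for group, columns in aggregate_by_label(rows, "status").items():
        for column, summary in columns.items():
            median = summary[2]
            # Verbose summary and its "plot" on the same line.
            print(group, column, summary, text_bar(median, scale=0.2))
```

Passing a different callable, for example `aggregate_by_label(rows, "status", func=statistics.mean)`, swaps the summary without touching the grouping logic.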

rohitchadaram commented 6 years ago

I like the idea, but I would suggest thinking about how this could be used as a real tool for linear regression or any other model. How would you take the output of this module and use it? Any thoughts?

hhan14 commented 6 years ago

I think your idea is useful and handy in that, as you mentioned, all columns of the dataset are summarized in groups by feature label, with corresponding plots. I would just extend it by not fixing the summary function as a default, so that users can choose which function is applied.

nitieaj commented 6 years ago

The output of this module gives you a visual summary next to the associated verbose summary.

nitieaj commented 6 years ago

Before you run a regression, you usually want to visualize some statistics, such as the distribution of the predictors (exposures), the range of values, etc. I sometimes find it easier to have the visual summary together with the verbose summary/structure of the data.

RoxanneXin commented 6 years ago

Really like your idea. For a large data frame, some summary plots will be necessary. It can also be useful for data preprocessing before fitting any model, such as exploring data distributions or roughly finding outliers or bad data.
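As one possible illustration of "roughly finding outliers", the sketch below flags values that fall more than 1.5 interquartile ranges outside the quartiles. Both the 1.5 × IQR rule and the name `iqr_outliers` are assumptions chosen for the example; the thread does not specify how outliers would be detected. It is written in Python with only the standard library.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles.

    The k = 1.5 cutoff is a common rule of thumb, assumed here for
    illustration. Needs at least two values.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]
```

Applied to each column within each label group, a check like this could sit alongside the summary and plot so that suspicious values are visible before any model is fitted.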