Improve default plot order and legend placement

carloscinelli / benford.analysis

Tools that make it easier to use Benford’s law for data validation and forensic analytics.


Improve default plot order and legend placement #32

Open carloscinelli opened 5 years ago

carloscinelli commented 5 years ago

It might not be clear to users what the chi-squared difference refers to. Maybe put both next to each other, and improve the description of the plot.

Also, think about a better default legend placement.

carloscinelli commented 5 years ago

@rafaelslins I think the changes look good!

Some thoughts:

[x] we can set the default except = NULL since select will be the main argument now. Also, we can put select before except in the arguments of the function.
[x] we still need to think about the legend placement. I don't think the legend as a separate plot on its own is working well. Maybe we can create an extra margin on the bottom of all plots and put the legend on the bottom. Any ideas?
[x] we should probably rename error bars to "lower and upper bounds".
[x] Also, regarding the bounds, maybe two lines will work better than the arrows? We should test to see how it looks.
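
To compare the two ways of drawing the bounds side by side, a minimal base-graphics sketch might look like the following. The bounds here are an assumption made purely for illustration (a normal approximation around the Benford first-digit probabilities, with a hypothetical sample size n), not necessarily how the package computes them.

```r
# Expected first-digit frequencies under Benford's law
digits   <- 1:9
expected <- log10(1 + 1 / digits)

# Assumption: normal-approximation bounds for a hypothetical sample size
n     <- 1000
alpha <- 0.05
half  <- qnorm(1 - alpha / 2) * sqrt(expected * (1 - expected) / n)
lower <- expected - half
upper <- expected + half

op <- par(mfrow = c(1, 2))

# Option A: bounds drawn as arrows (error-bar style)
plot(digits, expected, type = "b", ylim = range(lower, upper),
     main = "Arrows", xlab = "Digit", ylab = "Frequency")
arrows(digits, lower, digits, upper, angle = 90, code = 3, length = 0.05)

# Option B: bounds drawn as two dashed lines
plot(digits, expected, type = "b", ylim = range(lower, upper),
     main = "Two lines", xlab = "Digit", ylab = "Frequency")
lines(digits, lower, lty = 2)
lines(digits, upper, lty = 2)

par(op)
```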

carloscinelli commented 5 years ago

@rafaelslins some problems in the current implementation

[x] the parameter alpha is not being dispatched to the functions
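
One way to make sure alpha reaches the individual plot functions is to pass it on explicitly from the wrapper. The sketch below uses hypothetical function names (plot_digits_sketch, plot_all_sketch) and an assumed normal-approximation bound; it only illustrates the forwarding pattern, not the package's actual code.

```r
# Hypothetical panel function: uses alpha to compute the width of its bounds
plot_digits_sketch <- function(expected, n, alpha = 0.05, ...) {
  half <- qnorm(1 - alpha / 2) * sqrt(expected * (1 - expected) / n)
  plot(seq_along(expected), expected, type = "b",
       ylim = range(expected - half, expected + half),
       xlab = "Digit", ylab = "Frequency", ...)
  lines(seq_along(expected), expected - half, lty = 2)
  lines(seq_along(expected), expected + half, lty = 2)
  invisible(half)
}

# Hypothetical wrapper: alpha is a named argument and is forwarded explicitly,
# so it cannot get lost on the way to the panel function
plot_all_sketch <- function(expected, n, alpha = 0.05, ...) {
  plot_digits_sketch(expected, n, alpha = alpha, ...)
}

expected    <- log10(1 + 1 / (1:9))
half_narrow <- plot_all_sketch(expected, n = 1000, alpha = 0.05)
half_wide   <- plot_all_sketch(expected, n = 1000, alpha = 0.01)
```

With this pattern, a smaller alpha passed to the wrapper visibly widens the bounds in the panel.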

carloscinelli commented 5 years ago

I'm inclined to think this looks cleaner for the bounds

Versus the current version

carloscinelli commented 5 years ago

For the legend, we could aim for something like this: https://stackoverflow.com/questions/10389967/common-legend-for-multiple-plots-in-r
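
A common base-graphics approach to a shared legend is to reserve an outer margin at the bottom, draw the panels, and then overlay an empty full-device plot that only holds the legend. The sketch below is a minimal illustration with placeholder panels and a hypothetical function name, not the package's implementation.

```r
# Hypothetical demo: four panels sharing one legend at the bottom
draw_with_common_legend_sketch <- function() {
  op <- par(no.readonly = TRUE)
  on.exit(par(op))

  # Reserve four lines of outer margin at the bottom for the legend
  par(mfrow = c(2, 2), oma = c(4, 0, 0, 0))
  x <- 1:9
  for (i in 1:4) {
    plot(x, log10(1 + 1 / x), type = "l", col = "red", lty = 2,
         main = paste("Panel", i), xlab = "Digit", ylab = "Frequency")
    points(x, log10(1 + 1 / x), pch = 16)
  }

  # Overlay an empty plot covering the whole device and put the legend in it
  par(fig = c(0, 1, 0, 1), oma = c(0, 0, 0, 0), mar = c(0, 0, 0, 0), new = TRUE)
  plot(0, 0, type = "n", bty = "n", xaxt = "n", yaxt = "n", xlab = "", ylab = "")
  legend("bottom", legend = c("Data", "Benford"), col = c("black", "red"),
         pch = c(16, NA), lty = c(NA, 2), horiz = TRUE, xpd = TRUE, bty = "n")
  invisible(TRUE)
}

done <- draw_with_common_legend_sketch()
```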

carloscinelli commented 5 years ago

@rafaelslins Rafael, some bugs and other problems were introduced in this PR. I'm going to list some of them.

[ ] The argument multiple = F is not working. Also, if the user passes only one plot in select, the default behavior should be the same as multiple = F.
[x] It doesn't make sense to put in the default display the second order difference plot as a huge plot in the middle. All plots should be the same height.
[ ] legend size is not working well.
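
The fallback described in the first item could be decided in one small helper. The function below is hypothetical and only sketches the rule: a multi-panel layout is used only when the user asks for it and more than one plot is selected.

```r
# Hypothetical helper: should the plots be arranged in a multi-panel layout?
use_multiple_sketch <- function(select, multiple = TRUE) {
  isTRUE(multiple) && length(select) > 1
}

use_multiple_sketch(c("digits", "chi squared"))                    # several plots
use_multiple_sketch("digits")                                      # a single plot
use_multiple_sketch(c("digits", "chi squared"), multiple = FALSE)  # user opt-out
```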

Other comments:

The function was accepting invalid plot names; I fixed that temporarily in a new commit.
I have changed the legend description to more informative names.
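
Rejecting invalid plot names can be sketched as an explicit check against the list of supported names. The helper name below is hypothetical, and the list of valid names contains only the ones that appear in this thread, so it is not meant to be the package's full set.

```r
# Hypothetical validator; valid_plots lists only names mentioned in this thread
validate_select_sketch <- function(select,
                                   valid_plots = c("digits", "rootogram digits",
                                                   "chi squared")) {
  bad <- setdiff(select, valid_plots)
  if (length(bad) > 0) {
    stop("Invalid plot name(s): ", paste(bad, collapse = ", "),
         ". Valid names are: ", paste(valid_plots, collapse = ", "))
  }
  select
}

validate_select_sketch(c("digits", "chi squared"))
```

An explicit check is used here because match.arg() with several.ok = TRUE keeps the names that match and silently drops the rest, which would hide a typo in select.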

Let's do these changes in a branch, and only merge when things are well tested and working correctly.

carloscinelli commented 5 years ago

The argument multiple = F is still not working.

Example:

data("census.2009")
cs <- benford(census.2009$pop.2009[census.2009$pop.2009>10])
plot(cs, select = "digits", multiple = F)

rafaelslins commented 5 years ago

I think part of the problem with multiple = F is due to the incompatibility of the layout() function (mainly used to allocate subtitles) and the par() function.

?layout: "These functions are totally incompatible with the other mechanisms for arranging plots on a device: par(mfrow), par(mfcol) and split.screen."
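
The conflict can be reproduced outside the package. In the sketch below, a hypothetical plot function arranges its own panels with layout(); when the caller has already requested a grid with par(mfrow), that grid should no longer be in effect after the call. The function name and the panel contents are placeholders.

```r
# Hypothetical plot function that arranges a plot and a legend with layout()
plot_with_layout_sketch <- function() {
  layout(matrix(c(1, 2), nrow = 1), widths = c(4, 1))
  plot(1:9, log10(1 + 1 / (1:9)), type = "b", xlab = "Digit", ylab = "Frequency")
  # Legend-only panel on the right
  old_mar <- par(mar = c(0, 0, 0, 0))
  plot.new()
  legend("center", legend = "Benford", lty = 1, pch = 1, bty = "n")
  par(old_mar)
  invisible(NULL)
}

par(mfrow = c(2, 1))          # the caller asks for a 2 x 1 grid
grid_before <- par("mfrow")
plot_with_layout_sketch()     # layout() installs its own arrangement
grid_after <- par("mfrow")

# Compare the grid the caller requested with the one left behind
print(grid_before)
print(grid_after)
```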

rafaelslins commented 5 years ago

I'm thinking of a solution using just the par() function.

rafaelslins commented 5 years ago

I'm trying to make this possible:

par(mfrow=c(2,1))
bfd.cp <- benford(corporate.payment$Amount)
plot(bfd.cp, select = "digits", multiple = F)
plot(bfd.cp, select = "chi squared", multiple = F)

rafaelslins commented 5 years ago

I have had many failed attempts at getting this to work:

par(mfrow=c(2,1))
plot(bfd.cp, select="digits", multiple=F) #plots
plot(bfd.cp, select="rootogram digits", multiple=F) #plots

I implemented a simple (temporary) solution that returns the desired result:

plot(bfd.cp, select=c("digits", "rootogram digits"), multiple=T, mfrow = c(2,1))

carloscinelli commented 5 years ago

@rafaelslins you can plot the legends of single plots inside the plot itself, then this should not be a problem.

But for now focus on having all individual plot functions implemented and working correctly as autonomous individual functions that are easy to use and customize. These functions should be easy to use by themselves without resorting to the generic plot.
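
As a rough illustration of a standalone plot function with its legend drawn inside the plot, something like the sketch below could work. The function name, its arguments, and the way the first digit is extracted are all assumptions made for the example; it does not use any function from the package.

```r
# Hypothetical standalone plot: observed vs. expected first-digit frequencies
plot_first_digits_sketch <- function(x, legend_pos = "topright", ...) {
  x <- abs(x[is.finite(x) & x != 0])
  # First significant digit: scale each value into [1, 10) and drop decimals
  first    <- floor(x / 10^floor(log10(x)))
  observed <- as.numeric(table(factor(first, levels = 1:9))) / length(first)
  expected <- log10(1 + 1 / (1:9))

  bp <- barplot(observed, names.arg = 1:9,
                ylim = c(0, max(observed, expected) * 1.1),
                xlab = "First digit", ylab = "Frequency", ...)
  lines(bp, expected, type = "b", col = "red", lty = 2)

  # The legend sits inside the plot region, so no layout() call is needed
  legend(legend_pos, legend = c("Observed", "Benford"),
         pch = c(15, 1), lty = c(NA, 2), col = c("grey", "red"), bty = "n")

  invisible(data.frame(digit = 1:9, observed = observed, expected = expected))
}

res <- plot_first_digits_sketch(c(1, 12, 19, 2, 25, 3, 9))
```

Because such a function never touches layout(), a caller could combine it with par(mfrow) freely.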