Closed mnlang closed 2 years ago
Thanks, Moritz. (As I had already indicated in the Poisson PR I was about to provide a fix for that in a separate PR.)
The problem is that substitute(x)
is not enough as this provides a "call"
here. The usual fix would be to use deparse(substitute(x))
. In addition the &&
can be added as you proposed. Finally, letting the user choose a variable_name
explicitly would also be helpful in some cases.
Regarding the type = c("pdf", "cdf")
argument you suggest in the comment above: I would recommend to use a different argument name for that. It would be nice to export the type = NULL
argument and then switch to type = "p"
etc. as appropriate - but then the user can also set, for example, type = "o"
or type = "h"
. This is what I would have liked to do in the poisson
vignette.
Separate but related question (possibly better for a separate issue): The way the plot()
method determines whether a distribution is discrete or not is quite ugly and not extensible. Should this be modularized in some way? There is already the support()
extractor but it only returns the minimum and maximum for a given distribution. Maybe adding some sort of extractor for the "minimum increment" would be helpful? This would be 0 for continuous distributions and 1 for most discrete distributions.
More generally, would you like me to write an issue or even a PR to unify the plot functions? Currently there is a plot.distributions3() with the boolean argument cdf, as well as the functions plot_pdf() and plot_cdf() based on ggplot2:
Ah yes we do seem to have a lot of plotting functions floating around. Instead of a single plot
generic with type
argument, I do prefer plot_pdf()
and plot_cdf()
from a naming perspective, and then to either get rid of plot()
and autoplot()
or to alias them to them to plot_pdf()
. Thoughts?
In a further step it might be even possible to provide an interface to the package ggdist?
I'm intrigued, what are you thinking here?
The way the plot() method determines whether a distribution is discrete or not is quite ugly and not extensible.
I'm open to changing this; basically the current way we represent a distribution is a quickly thrown together hack. I started thinking about a more careful class system a while back but didn't really have the capacity to push it anywhere. As a start it seems like subclasses for discrete and for continuous distributions would be an easy win, and I would welcome a PR to make those changes.
The problem is that substitute(x) is not enough as this provides a "call" here. The usual fix would be to use deparse(substitute(x)). In addition the && can be added as you proposed. Finally, letting the user choose a variable_name explicitly would also be helpful in some cases.
My bad. I have adjusted the code accordingly!
Ah yes we do seem to have a lot of plotting functions floating around. Instead of a single plot generic with type argument, I do prefer plot_pdf() and plot_cdf() from a naming perspective, and then to either get rid of plot() and autoplot() or to alias them to them to plot_pdf(). Thoughts?
I understand your point, on the other hand plot()
and autoplot()
distinguish more clearly between base
and ggplot2
graphics. Aliasing might really be a good compromise by first checking if ggplot2
is loaded and rendering the plot accordingly?
But thinking about it again, wouldn't that imply that the arguments are 1:1 interchangeable - which will not be really the case.
Hence, I think we should rather work with plot
and autoplot
and put parts of the autoplot
function into individual exported geoms
(as it's already partly done).
What do you think, @alexpghayes and @zeileis?
Ah yes we do seem to have a lot of plotting functions floating around. Instead of a single plot generic with type argument, I do prefer plot_pdf() and plot_cdf() from a naming perspective, and then to either get rid of plot() and autoplot() or to alias them to them to plot_pdf(). Thoughts?
I understand your point, on the other hand
plot()
andautoplot()
distinguish more clearly betweenbase
andggplot2
graphics. Aliasing might really be a good compromise by first checking ifggplot2
is loaded and rendering the plot accordingly?But thinking about it again, wouldn't that imply that the arguments are 1:1 interchangeable - which will not be really the case. Hence, I think we should rather work with
plot
andautoplot
and put parts of theautoplot
function into individual exportedgeoms
(as it's already partly done).What do you think, @alexpghayes and @zeileis?
When I first wrote the plot_cdf
and plot_pdf
functions, I simply needed something quick and reusable for plotting CDFs and PDFs for an intro stats class I taught. Coincidentally, @paulnorthrop wrote the plot
and autoplot
function at about the same time, and we decided it was easier for the time being to simply include both. It was mainly due to a lack of time to find out what would be the best approach.
My original idea was to create geom_pdf
and geom_cdf
to make it easy to both plot PDFs and CDFs, but also to overlay distributions onto histograms. My goal originally was to get to a point where I could do something like
# A normal curve
ggplot() + geom_pdf(d = Normal())
# Normal curve on top of histogram. Might be super tricky cause you need to be careful about
# bin sizes and normalize, but never got to a point where that was the issue...
ggplot(data = students, aes(x = height)) + geom_histogram(...) + geom_pdf(d = Normal(..., ...))
Unfortunately, I didn't have time to dive deep enough into how to write geom_*
and stat_*
functions -- as you can tell from my code, I tried, but didn't quite get there. I still think this would be the most elegant solution, but in my (limited) experience, extending ggplot2
is hard...
As @alexpghayes, I would prefer to keep the plot_cdf()
and plot_pdf()
. Personally, I wouldn't mind making ggplot2
a dependency for this and getting rid of plot
and autoplot
, but I understand if you want to keep a base option around.
Anyway, just wanted to chime in. If anyone wants to work on an implementation that functions as a ggplot2
extension, I think that would be awesome, and I'd happily help.
Thanks everybody for the discussion so far. I will try to structure my feedback on the different aspects a bit.
Discrete vs. continuous
is_discrete()
or minincrement()
or something along those lines.plot.class
method that might call the general plot.distribution
function internally after setting up some variables (like ranges, breaks, etc.).Functions vs. methods
plot_pdf()
and plot_cdf()
because this is not very "DRY" (don't repeat yourself). I think there is more what the PDF and CDF visualizations have in common than what differs, so I would keep them in a single function.ggplot2
visualizations. I think many aspects between these are quite different so my feeling is that they could be kept separate.plot()
and autoplot()
are nice and intuitively distinguish between the plotting systems. (And even if plot_*df()
functions are kept around I would recommend building the methods on top of them, just for convenience.)ggplot2 dependency
ggplot2
as a strong dependency but just as a Suggests
for example. In other packages I have done so by registering the autoplot()
methods only conditionally on ggplot2
being loaded, adding a stopifnot(requireNamespace("ggplot2"))
in the function, and only using fully-qualified ggplot::
function names internally.Merging the fix in for now, let's continue discussion about plotting code refactor at #77.
Alex (@alexpghayes),
this contains a minor PR for fixing the warnings in the base plot generic encountered in PR #74 by @zeileis. The plot function currently checks if the variable name is a single upper case letter.
This only works for existing distribution objects:
More generally, would you like me to write an issue or even a PR to unify the plot functions? Currently there is a
plot.distributions3()
with the boolean argumentcdf
, as well as the functionsplot_pdf()
andplot_cdf()
based on ggplot2:plot()
andautoplot()
with the argumentstype = c("pdf", "cdf"))
.ggdist
?