Closed EvoLandEco closed 1 year ago
@EvoLandEco Many thanks for reporting this problem! A very quick check did not reveal the cause of the problem. I'll keep this issue open and come back to it as soon as possible.
@aphalo No worries! This compatibility will surely boost my thesis and paper writing, though. Thank you again for what you've done already.
@EvoLandEco 'gganimate' seems to struggle sometimes with mappings, especially those set for the whole plot by calling aes() as an argument to 'ggplot()' and possibly the default mappings set by statistics. I cannot yet make sense of what triggers errors in 'gganimate' and what makes it silently ignore mappings. In addition, mappings not set directly through a call to aes() seem to also cause difficulties.
[Edited] Fixing this problem did not seem easy without me studying the internals of 'gganimate', but see the next comment.
@EvoLandEco I think I found the root of the problem. stat_poly_eq() expects the column group in its data input to be integer, as I always thought it would be, and as far as I know always is in 'ggplot2'. 'gganimate' changes group into a character vector to distinguish scenes, and this breaks my code.
I used stat_debug_group() to print the data received as input by statistics as follows. I still need to think about how to make 'ggpmisc' compatible with 'gganimate', but I now have a rough idea of what is needed...
library(gganimate)
#> Loading required package: ggplot2
library(gginnards)
library(tibble)
diamonds <- diamonds[sample.int(nrow(diamonds), nrow(diamonds) %/% 25), ]
# 'gganimate' converts group from integer into character
ggplot(diamonds, aes(x = carat, y = price)) +
stat_debug_group(summary.fun = as_tibble) +
transition_states(cut)
#> [1] "Summary of input 'data' to 'compute_group()':"
#> # A tibble: 66 × 4
#> x y PANEL group
#> <dbl> <dbl> <fct> <chr>
#> 1 0.9 2873 1 -1<1>
#> 2 0.5 1069 1 -1<1>
#> 3 0.7 945 1 -1<1>
#> 4 1.5 8190 1 -1<1>
#> 5 1.01 4072 1 -1<1>
#> 6 2.01 14402 1 -1<1>
#> 7 1.01 6366 1 -1<1>
#> 8 0.7 1895 1 -1<1>
#> 9 0.78 2312 1 -1<1>
#> 10 0.5 1238 1 -1<1>
#> # ℹ 56 more rows
#> [1] "Summary of input 'data' to 'compute_group()':"
#> # A tibble: 185 × 4
#> x y PANEL group
#> <dbl> <dbl> <fct> <chr>
#> 1 1.17 3866 1 -1<2>
#> 2 0.71 2161 1 -1<2>
#> 3 1.01 4912 1 -1<2>
#> 4 1.56 7094 1 -1<2>
#> 5 1.36 7549 1 -1<2>
#> 6 0.9 3621 1 -1<2>
#> 7 0.31 462 1 -1<2>
#> 8 0.7 2335 1 -1<2>
#> 9 0.34 589 1 -1<2>
#> 10 0.31 924 1 -1<2>
#> # ℹ 175 more rows
#> [1] "Summary of input 'data' to 'compute_group()':"
#> # A tibble: 466 × 4
#> x y PANEL group
#> <dbl> <dbl> <fct> <chr>
#> 1 1 6732 1 -1<3>
#> 2 0.5 1243 1 -1<3>
#> 3 0.26 499 1 -1<3>
#> 4 0.52 1273 1 -1<3>
#> 5 1.24 8298 1 -1<3>
#> 6 0.9 3975 1 -1<3>
#> 7 0.9 3909 1 -1<3>
#> 8 0.71 2098 1 -1<3>
#> 9 2.01 17751 1 -1<3>
#> 10 0.82 2643 1 -1<3>
#> # ℹ 456 more rows
#> [1] "Summary of input 'data' to 'compute_group()':"
#> # A tibble: 586 × 4
#> x y PANEL group
#> <dbl> <dbl> <fct> <chr>
#> 1 0.33 965 1 -1<4>
#> 2 0.57 1746 1 -1<4>
#> 3 1.2 4131 1 -1<4>
#> 4 0.9 3774 1 -1<4>
#> 5 1.71 10457 1 -1<4>
#> 6 0.79 3230 1 -1<4>
#> 7 0.32 828 1 -1<4>
#> 8 1.02 3856 1 -1<4>
#> 9 0.7 3365 1 -1<4>
#> 10 0.75 3108 1 -1<4>
#> # ℹ 576 more rows
#> [1] "Summary of input 'data' to 'compute_group()':"
#> # A tibble: 854 × 4
#> x y PANEL group
#> <dbl> <dbl> <fct> <chr>
#> 1 0.33 564 1 -1<5>
#> 2 0.3 814 1 -1<5>
#> 3 0.71 3710 1 -1<5>
#> 4 0.43 818 1 -1<5>
#> 5 1.02 6418 1 -1<5>
#> 6 0.81 2894 1 -1<5>
#> 7 0.58 1332 1 -1<5>
#> 8 2.22 15584 1 -1<5>
#> 9 0.41 788 1 -1<5>
#> 10 0.3 835 1 -1<5>
#> # ℹ 844 more rows
Created on 2023-06-19 with reprex v2.0.2
@EvoLandEco Dear TianJian, I think that the bug is now fixed. Please install 'ggpmisc' from this GitHub repository, and let me know if it also works with plots of your own data. The reprex you provided now works as expected, as do a couple of variations that I tried.
Many thanks for reporting the problem and providing an example!
# 'ggpmisc' future Version 0.5.3
library(gganimate)
#> Loading required package: ggplot2
library(ggpmisc)
#> Loading required package: ggpp
#>
#> Attaching package: 'ggpp'
#> The following object is masked from 'package:ggplot2':
#>
#> annotate
# Animation with stat_poly_eq()
ggplot(diamonds, aes(x = carat, y = price)) +
geom_point() +
stat_poly_line() +
stat_poly_eq() +
transition_states(cut, transition_length = 1, state_length = 1) +
enter_fade() + exit_shrink() +
labs(title = "Cut = {closest_state}")
ggplot(diamonds, aes(x = carat, y = price)) +
geom_point() +
stat_poly_line() +
stat_poly_eq(mapping = use_label(c("eq", "R2", "F"))) +
transition_states(cut, transition_length = 1, state_length = 1) +
enter_fade() + exit_shrink() +
labs(title = "Cut = {closest_state}")
# Animation with stat_poly_eq()
ggplot(diamonds, aes(x = carat, y = price, color = color)) +
geom_point() +
stat_poly_line() +
stat_poly_eq() +
theme_bw() +
transition_states(cut, transition_length = 1, state_length = 2) +
enter_fade() + exit_shrink() +
labs(title = "Cut = {closest_state}")
Created on 2023-06-19 with reprex v2.0.2
Thank you for the quick update! I tried it yesterday evening but had some issues with my own data; I believe they were due to the complex nature of a real data set.
There were mainly three warnings:
Warning: Not enough data to perform fit for group 1; computing mean instead.
Warning in ci_f_ncp(stat, df1 = df1, df2 = df2, probs = probs) :
Upper limit outside search range. Set to the maximum of the parameter range.
Warning: Computation failed in `stat_poly_eq()`
Caused by error in `check_output()`:
! out[1] <= out[2] is not TRUE
The first one was due to having only one observation in a group, but I have no idea where the other two came from. As a result, the rr label cannot be shown in all of the frames of one panel. Do you have any idea about the potential cause? If you are interested in the data set, I could also send it to you by email.
I tested a bit with aes(color = color) and transition_manual(frames). If color is continuous, the grouping string is formatted as "-1" followed by a <...> part; if color is discrete, it is formatted as the discrete level followed by a <...> part. stat_poly_eq() only needs the first part of the grouping string to calculate the after stats, and that is the reason why you use a regex to delete the <.*> part in an if condition.
I think it might be better to use grepl("-1?[0-9]+<[0-9]+>", data$group[1]) or grepl("-1<[0-9]+>", data$group[1]) | grepl("^[0-9]+<[0-9]+>", data$group[1]) to minimize the chance of matching similar behavior from any other package, as long as gganimate doesn't go beyond this formatting.
@EvoLandEco In my third example, the group is not -1, but you are correct in that the test could be different to ensure long-term compatibility. grepl() would not work because it returns a logical value, but what I should use instead of gsub() to remove the unwanted part is regex() to extract the first part of the string (an integer encoded as a character) and discard whatever comes after it. However, any other package that modifies group into character would not work together with 'ggpmisc' unless the original integer value of 'group' can be extracted. The stat uses the integer group number to set the location of labels for the different groups (converting -1 into 1); otherwise they would overlap, or users would always need to set the positions manually.
The first message is triggered when there are not enough observations to fit the model. You should still get an equation like y =
The second message I think is caused by failure of the algorithm used to compute confidence intervals, once again, most likely because of too few observations. The third warning is most likely an indirect consequence of this failure.
If there are not enough data to fit the model, R2 cannot be computed. There may be borderline cases when R2 can be computed but not its CI by bootstrapping, so I need to improve how this case can be handled. I will most likely disable the CI computation by default as it is also time consuming, but will need to handle the failure of CI computation more gracefully.
@aphalo Good to know this. Better error handling will surely reduce the chance of failure, and it is indeed hard to ensure long-term compatibility.
It's also quite hard for me to check and ensure there are enough observations, because I generate a large amount of data through stochastic simulation on a computing cluster, so plots are produced automatically through a pipeline.
I look forward to the next version of ggpmisc, thank you again!
@EvoLandEco I was confused, no bootstrapping is involved in the stat. It is sometimes difficult to decide what should be a warning and what a message... The first one I think should be a message rather than a warning, and the test should take into account the model formula... In the case of lm(), singularity is easier to handle automatically because lm() handles it gracefully. For rlm() it is difficult to automate because it stops with an error, so I added a parameter n.min that makes it possible to skip fitting the model given by formula when n < n.min in a group, fitting y ~ 1 instead of the model given by formula. The responsibility is with the user, but in cases like your data, when n is not predictable, it should help.
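As a rough illustration of the n.min idea described above, the following base R sketch falls back to y ~ 1 when a group has too few rows. It is not the 'ggpmisc' implementation: the helper name fit_or_mean(), its default n.min and the wording of its message are assumptions made up for this example, and lm() stands in for whichever fitting function is used.

```r
# Hypothetical helper, not part of 'ggpmisc': fit 'formula' when the group
# has at least n.min rows, otherwise fit the intercept-only model y ~ 1.
fit_or_mean <- function(data, formula = y ~ x, n.min = 2L) {
  if (nrow(data) < n.min) {
    message("Not enough data to perform fit; computing mean instead.")
    formula <- y ~ 1
  }
  lm(formula, data = data)
}

small <- data.frame(x = 1, y = 5)                    # one observation
large <- data.frame(x = 1:10, y = 2 * (1:10) + 1)    # enough observations

coef_small <- coef(fit_or_mean(small))   # intercept only (the mean of y)
coef_large <- coef(fit_or_mean(large))   # intercept and slope
```

With data generated by an automated pipeline, a guard of this kind lets groups with too few observations still produce a label instead of stopping with an error.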
Testing for valid arguments to the CI calculation before attempting it, should solve the second error.
I am not sure if the last error is dependent on any of these. Anyway, NaNs are now handled correctly. This was a bug.
This is mostly a note to myself.
@EvoLandEco I updated the regular expressions, though not exactly as you suggested; anyway, this was a good point that you raised. Thanks! The calls are now grepl("^(-1|[0-9]+).*$", data$group[1]) and gsub("^(-1|[0-9]+).*$", "\\1", data$group[1]).
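To see what these two regular expressions do, here is a small base R sketch applied to made-up group labels in the style shown earlier in this thread. The vector groups and the variable names are assumptions for illustration only; the final step mirrors the conversion of -1 into 1 mentioned above, not the actual 'ggpmisc' source.

```r
# Made-up labels: 'gganimate'-style strings with a "<...>" suffix, plus
# plain group numbers as 'ggplot2' alone would supply them.
groups <- c("-1<1>", "3<2>", "12<10>", "-1", "7")

pattern <- "^(-1|[0-9]+).*$"

# Does each label start with a recoverable group number?
has_group <- grepl(pattern, groups)

# Keep only the leading group number and convert it back to integer
group_idx <- as.integer(gsub(pattern, "\\1", groups))

# -1 means "no grouping"; map it to 1 so it can be used to position labels
label_idx <- ifelse(group_idx == -1L, 1L, group_idx)
```

Any label that does not begin with -1 or with digits fails the grepl() test, so such input can be detected before a conversion is attempted.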
If you have time, please, check the current version from GitHub. Thanks in advance!
I will update stat_ma_eq() and stat_quant_eq() before submitting to CRAN, and possibly address other issues. So it will take some days or even a week or two before I release version 0.5.3.
Currently, when a parameter estimate is NA or NaN, the label is set to character(0). This produces "clean" plots, but may be confusing when, say, R^2 is not shown at all. I am unsure about what is the most useful approach... Any suggestions?
I think the latest update solved my current issue, much appreciated. I will be on holiday for two weeks; hopefully I will be able to try out your latest CRAN build by then.
Fixed stat_quant_eq() and stat_ma_eq(). I still need to fix stat_correlation() and stats based on 'broom'.
Fixed all remaining stats.
According to my tests, ggpmisc and gganimate are currently not compatible with each other. Please enlighten me if I am wrong:
Created on 2023-06-18 with reprex v2.0.2
It might also be the case that more effort should be made by the author(s) of gganimate, but it would be nice if you could investigate a bit why they are not compatible.