aphalo / ggpmisc

R package ggpmisc is an extension to ggplot2 and the Grammar of Graphics

Compatibility with `gganimate` package #38

Closed EvoLandEco closed 1 year ago

EvoLandEco commented 1 year ago

According to my tests, ggpmisc and gganimate are currently not compatible with each other; please enlighten me if I am wrong:

library(ggpmisc)
#> Loading required package: ggplot2
#> Loading required package: ggpp
#> Attaching package: 'ggpp'
#> The following object is masked from 'package:ggplot2':
#>     annotate
library(gganimate)

# Animation without stat_poly_eq()
ggplot(diamonds, aes(x = carat, y = price)) + geom_point() + geom_smooth(method = "lm") +
  transition_states(cut, transition_length = 1, state_length = 1) +
  enter_fade() + exit_shrink() +
  labs(title = "Cut = {closest_state}")
#> `geom_smooth()` using formula = 'y ~ x'

# Static plot with stat_poly_eq()
ggplot(diamonds, aes(x = carat, y = price)) + geom_point() + geom_smooth(method = "lm") +
  stat_poly_eq()
#> `geom_smooth()` using formula = 'y ~ x'

# Static plot with stat_poly_eq() and facets
ggplot(diamonds, aes(x = carat, y = price)) + geom_point() + geom_smooth(method = "lm") +
  stat_poly_eq() + facet_wrap(. ~ cut)
#> `geom_smooth()` using formula = 'y ~ x'

# Adding stat_poly_eq() to the animation causes the error
ggplot(diamonds, aes(x = carat, y = price)) + geom_point() + geom_smooth(method = "lm") +
  transition_states(cut, transition_length = 1, state_length = 1) +
  enter_fade() + exit_shrink() +
  stat_poly_eq() +
  labs(title = "Cut = {closest_state}")
#> `geom_smooth()` using formula = 'y ~ x'
#> Warning: Computation failed in `stat_poly_eq()`
#> Caused by error in `abs()`:
#> ! non-numeric argument to mathematical function
#> Error in `$<-.data.frame`(`*tmp*`, "group", value = ""): replacement has 1 row, data has 0

# Combining stat_poly_eq() with facets fails for each facet
ggplot(diamonds, aes(x = carat, y = price)) + geom_point() + geom_smooth(method = "lm") +
  transition_states(clarity, transition_length = 1, state_length = 1) +
  enter_fade() + exit_shrink() +
  stat_poly_eq() + facet_wrap(. ~ cut) +
  labs(title = "Clarity = {closest_state}")
#> `geom_smooth()` using formula = 'y ~ x'
#> Warning: Computation failed in `stat_poly_eq()`
#> Computation failed in `stat_poly_eq()`
#> Computation failed in `stat_poly_eq()`
#> Computation failed in `stat_poly_eq()`
#> Computation failed in `stat_poly_eq()`
#> Caused by error in `abs()`:
#> ! non-numeric argument to mathematical function
#> Error in `$<-.data.frame`(`*tmp*`, "group", value = ""): replacement has 1 row, data has 0

Created on 2023-06-18 with reprex v2.0.2

It might also be the case that more effort should be made by the author(s) of gganimate, but it would be nice if you could investigate a bit why they are not compatible.

aphalo commented 1 year ago

@EvoLandEco Many thanks for reporting this problem! A very quick check did not reveal the cause of the problem. I'll keep this issue open and come back to it as soon as possible.

EvoLandEco commented 1 year ago

@aphalo No worries! This compatibility will surely boost my thesis and paper writing, though. Thank you again for what you've done already.

aphalo commented 1 year ago

@EvoLandEco 'gganimate' seems to struggle sometimes with mappings, especially those set for the whole plot by calling aes() as an argument to 'ggplot()' and possibly the default mappings set by statistics. I cannot yet make sense of what triggers errors in 'gganimate' and what makes it silently ignore mappings. In addition, mappings not set directly through a call to aes() seem to also cause difficulties.

[Edited] Fixing this problem did not seem easy without me studying the internals of 'gganimate', but see the next comment.

aphalo commented 1 year ago

@EvoLandEco I think I found the root of the problem. stat_poly_eq() expects the column group in its data input to be integer, as I always thought it would be and, as far as I know, always is in 'ggplot2'. 'gganimate' changes group into a character vector to distinguish scenes, and this breaks my code.
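
The failure can be reproduced outside of any plot. The following is a minimal sketch in base R, assuming the group value "-1<1>" that appears in the debug output of this thread and an integer group such as 1L, which is what the stat expects. The variable names are illustrative only and are not taken from 'ggpmisc'.

```r
# Illustrative values only: an integer group as the stat expects it,
# and a character group like the ones seen under 'gganimate' in this thread
group_plain    <- 1L
group_animated <- "-1<1>"

# abs() works on the integer group
abs(group_plain)

# abs() on the character group raises the error seen in the warning above;
# tryCatch() captures the message instead of stopping the script
msg <- tryCatch(abs(group_animated), error = function(e) conditionMessage(e))
msg
```

The captured message is the same "non-numeric argument to mathematical function" text reported in the original warning, which is consistent with the explanation given in this comment.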

I used stat_debug_group() to print the data received as input by statistics, as follows. I still need to think about how to make 'ggpmisc' compatible with 'gganimate', but I now have a rough idea of what is needed...

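library(gganimate)
library(gginnards) # assumed source of stat_debug_group()
library(tibble)    # assumed source of as_tibble()
#> Loading required package: ggplot2

diamonds <- diamonds[sample.int(nrow(diamonds), nrow(diamonds) %/% 25), ]

# 'gganimate' converts group from integer into character
ggplot(diamonds, aes(x = carat, y = price)) +
  stat_debug_group(summary.fun = as_tibble) +
  # assumed: transition over 'cut', as in the reprex above
  transition_states(cut, transition_length = 1, state_length = 1)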
#> [1] "Summary of input 'data' to 'compute_group()':"
#> # A tibble: 66 × 4
#>        x     y PANEL group
#>    <dbl> <dbl> <fct> <chr>
#>  1  0.9   2873 1     -1<1>
#>  2  0.5   1069 1     -1<1>
#>  3  0.7    945 1     -1<1>
#>  4  1.5   8190 1     -1<1>
#>  5  1.01  4072 1     -1<1>
#>  6  2.01 14402 1     -1<1>
#>  7  1.01  6366 1     -1<1>
#>  8  0.7   1895 1     -1<1>
#>  9  0.78  2312 1     -1<1>
#> 10  0.5   1238 1     -1<1>
#> # ℹ 56 more rows
#> [1] "Summary of input 'data' to 'compute_group()':"
#> # A tibble: 185 × 4
#>        x     y PANEL group
#>    <dbl> <dbl> <fct> <chr>
#>  1  1.17  3866 1     -1<2>
#>  2  0.71  2161 1     -1<2>
#>  3  1.01  4912 1     -1<2>
#>  4  1.56  7094 1     -1<2>
#>  5  1.36  7549 1     -1<2>
#>  6  0.9   3621 1     -1<2>
#>  7  0.31   462 1     -1<2>
#>  8  0.7   2335 1     -1<2>
#>  9  0.34   589 1     -1<2>
#> 10  0.31   924 1     -1<2>
#> # ℹ 175 more rows
#> [1] "Summary of input 'data' to 'compute_group()':"
#> # A tibble: 466 × 4
#>        x     y PANEL group
#>    <dbl> <dbl> <fct> <chr>
#>  1  1     6732 1     -1<3>
#>  2  0.5   1243 1     -1<3>
#>  3  0.26   499 1     -1<3>
#>  4  0.52  1273 1     -1<3>
#>  5  1.24  8298 1     -1<3>
#>  6  0.9   3975 1     -1<3>
#>  7  0.9   3909 1     -1<3>
#>  8  0.71  2098 1     -1<3>
#>  9  2.01 17751 1     -1<3>
#> 10  0.82  2643 1     -1<3>
#> # ℹ 456 more rows
#> [1] "Summary of input 'data' to 'compute_group()':"
#> # A tibble: 586 × 4
#>        x     y PANEL group
#>    <dbl> <dbl> <fct> <chr>
#>  1  0.33   965 1     -1<4>
#>  2  0.57  1746 1     -1<4>
#>  3  1.2   4131 1     -1<4>
#>  4  0.9   3774 1     -1<4>
#>  5  1.71 10457 1     -1<4>
#>  6  0.79  3230 1     -1<4>
#>  7  0.32   828 1     -1<4>
#>  8  1.02  3856 1     -1<4>
#>  9  0.7   3365 1     -1<4>
#> 10  0.75  3108 1     -1<4>
#> # ℹ 576 more rows
#> [1] "Summary of input 'data' to 'compute_group()':"
#> # A tibble: 854 × 4
#>        x     y PANEL group
#>    <dbl> <dbl> <fct> <chr>
#>  1  0.33   564 1     -1<5>
#>  2  0.3    814 1     -1<5>
#>  3  0.71  3710 1     -1<5>
#>  4  0.43   818 1     -1<5>
#>  5  1.02  6418 1     -1<5>
#>  6  0.81  2894 1     -1<5>
#>  7  0.58  1332 1     -1<5>
#>  8  2.22 15584 1     -1<5>
#>  9  0.41   788 1     -1<5>
#> 10  0.3    835 1     -1<5>
#> # ℹ 844 more rows

Created on 2023-06-19 with reprex v2.0.2

aphalo commented 1 year ago

@EvoLandEco Dear TianJian, I think that the bug is now fixed. Please install 'ggpmisc' from this GitHub repository and let me know if it also works with plots of your own data. The reprex you provided now works as expected, as do a couple of variations that I tried.

Many thanks for reporting the problem and providing an example!

# 'ggpmisc' future Version 0.5.3

library(ggpmisc)
#> Loading required package: ggplot2
#> Loading required package: ggpp
#> Attaching package: 'ggpp'
#> The following object is masked from 'package:ggplot2':
#>     annotate
library(gganimate)

# Animation with stat_poly_eq()
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  stat_poly_line() +
  stat_poly_eq() +
  transition_states(cut, transition_length = 1, state_length = 1) +
  enter_fade() + exit_shrink() +
  labs(title = "Cut = {closest_state}")

ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  stat_poly_line() +
  stat_poly_eq(mapping = use_label(c("eq", "R2", "F"))) +
  transition_states(cut, transition_length = 1, state_length = 1) +
  enter_fade() + exit_shrink() +
  labs(title = "Cut = {closest_state}")

# Animation with stat_poly_eq() and groups mapped to color
ggplot(diamonds, aes(x = carat, y = price, color = color)) +
  geom_point() +
  stat_poly_line() +
  stat_poly_eq() +
  theme_bw() +
  transition_states(cut, transition_length = 1, state_length = 2) +
  enter_fade() + exit_shrink() +
  labs(title = "Cut = {closest_state}")

Created on 2023-06-19 with reprex v2.0.2

EvoLandEco commented 1 year ago

Thank you for the quick update! I tried it yesterday evening but had some issues with my own data; I believe they were due to the complex nature of a real data set.

There were mainly three warnings:

Warning: Not enough data to perform fit for group 1; computing mean instead.

Warning in ci_f_ncp(stat, df1 = df1, df2 = df2, probs = probs) :
  Upper limit outside search range. Set to the maximum of the parameter range.

Warning: Computation failed in `stat_poly_eq()`
Caused by error in `check_output()`:
! out[1] <= out[2] is not TRUE

The first one was due to having only one observation in a group, but I have no idea where the other two came from. As a result, the rr label cannot be shown in all of the frames of one panel. Do you have any idea about the potential cause? If you are interested in the data set, I could also send it to you by email.

EvoLandEco commented 1 year ago

I tested a bit: if I set aes(color = color) and transition_manual(frames), then when color is continuous the grouping string is formatted as "-1", and when color is discrete it is formatted as "discrete_level". It seems that stat_poly_eq() only needs the first part of the grouping string to calculate the after stats, which is why you use a regex to delete the <.*> part in an if condition.

I think it might be better to use grepl("-1?[0-9]+<[0-9]+>", data$group[1]) or grepl("-1<[0-9]+>", data$group[1]) | grepl("^[0-9]+<[0-9]+>", data$group[1]) to minimize the chance of matching group strings from any other package with similar behavior, as long as gganimate does not go beyond this formatting.
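
To see what the two suggested tests accept, they can be tried on a few sample strings. This is only a sketch: "-1<1>" is taken from the debug output in this thread, while "3<2>", "Fair<1>" and "-1" are made-up strings chosen to probe the patterns, not values confirmed to come from 'gganimate'.

```r
# Sample group strings; only the first is taken from this thread,
# the others are hypothetical probes
groups <- c("-1<1>", "3<2>", "Fair<1>", "-1")

# First suggestion: a single pattern
m1 <- grepl("-1?[0-9]+<[0-9]+>", groups)

# Second suggestion: two patterns combined with |
m2 <- grepl("-1<[0-9]+>", groups) | grepl("^[0-9]+<[0-9]+>", groups)

# Side-by-side comparison of what each test accepts
data.frame(groups, m1, m2)
```

On these strings the single pattern accepts only "-1<1>", because the "-" at its start is a required literal, whereas the combined test also accepts "3<2>". Neither accepts a string that lacks the <...> suffix.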

aphalo commented 1 year ago

@EvoLandEco In my third example, the group is not -1, but you are correct that the test could be different to ensure long-term compatibility. grepl() would not work because it returns a logical value, but what I should use instead of gsub() to remove the unwanted part is regex() to extract the first part of the string (an integer encoded as a character string) and discard whatever comes after it. However, any other package that modifies group into character would not work together with 'ggpmisc' unless the original integer value of 'group' can be extracted. The stat uses the integer group number to set the location of labels for the different groups (converting -1 into 1); otherwise they would overlap, or users would always need to set the positions manually.

The first message is triggered when there are not enough observations to fit the model. You should still get an equation of the form y = constant, where the constant is the mean. In most cases it can be ignored...

The second message I think is caused by failure of the algorithm used to compute confidence intervals, once again, most likely because of too few observations. The third warning is most likely an indirect consequence of this failure.

If there are not enough data to fit the model, R2 cannot be computed. There may be borderline cases when R2 can be computed but not its CI by bootstrapping, so I need to improve how this case can be handled. I will most likely disable the CI computation by default as it is also time consuming, but will need to handle the failure of CI computation more gracefully.

EvoLandEco commented 1 year ago

@aphalo Good to know. Better error handling will surely reduce the chance of failure, and it is indeed hard to ensure long-term compatibility.

It's also quite hard for me to check for and ensure enough observations, because I generate a large amount of data through stochastic simulation on a computing cluster, and the plots are then produced automatically through a pipeline.

I look forward to the next version of ggpmisc, thank you again!

aphalo commented 1 year ago

@EvoLandEco I was confused: no bootstrapping is involved in the stat. It is sometimes difficult to decide what should be a warning and what a message... The first one, I think, should be a message rather than a warning, and the test should take into account the model formula... In the case of lm(), singularity is easier to handle automatically because lm() handles it gracefully. For rlm() it is difficult to automate because it stops with an error, so I added a parameter n.min that makes it possible to skip fitting the model given by formula when n < n.min in a group, fitting y ~ 1 instead. The responsibility is with the user, but in cases like your data, where n is not predictable, it should help.
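
The fallback described here can be sketched in base R. This is not the 'ggpmisc' implementation: fit_with_n_min() is a hypothetical helper, lm() stands in for rlm() to keep the example self-contained, and the default of 3 for n.min is an arbitrary choice for illustration.

```r
# Hypothetical helper (not from 'ggpmisc'): fit 'formula' when the group has
# at least n.min rows, otherwise fall back to the mean-only model y ~ 1
fit_with_n_min <- function(data, formula, n.min = 3L) {
  if (nrow(data) < n.min) {
    message("Not enough data to fit the model; fitting the mean instead.")
    formula <- update(formula, . ~ 1)  # keep the response, drop the predictors
  }
  lm(formula, data = data)
}

# A group with a single observation and a group with ten observations
small <- data.frame(x = 1, y = 5)
large <- data.frame(x = 1:10, y = 2 * (1:10) + 1)

fit_small <- fit_with_n_min(small, y ~ x)  # falls back to y ~ 1
fit_large <- fit_with_n_min(large, y ~ x)  # fits y ~ x

coef(fit_small)
coef(fit_large)
```

With too few rows the fit has a single coefficient, the group mean, so an equation can still be built. The choice of a suitable n.min remains with the user, as the comment notes.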

Testing for valid arguments to the CI calculation before attempting it should solve the second error.

I am not sure if the last error is dependent on any of these. Anyway, NaNs are now handled correctly. This was a bug.

This is mostly a note to myself.

aphalo commented 1 year ago

@EvoLandEco I updated the regular expressions, though not exactly as you suggested; anyway, this was a good point that you raised. Thanks! I now use grepl("^(-1|[0-9]+).*$", data$group[1]) and gsub("^(-1|[0-9]+).*$", "\\1", data$group[1]).
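
A small sketch of how these two expressions behave, applied to a vector for convenience instead of data$group[1]. Only "-1<1>" comes from the debug output in this thread; the other strings are hypothetical. The final step follows the earlier remark that -1 is converted into 1, and how the stat does this internally is not shown in this thread.

```r
pattern <- "^(-1|[0-9]+).*$"

# Sample group strings; all but the first are hypothetical
groups <- c("-1<1>", "2<3>", "4", "Fair<1>")

# Test: does the string start with -1 or with an integer?
ok <- grepl(pattern, groups)

# Extraction: keep only the leading integer and discard the rest
ids <- as.integer(gsub(pattern, "\\1", groups[ok]))

# Treat "no grouping" (-1) as group 1, as described earlier in the thread
ids[ids == -1L] <- 1L

ok
ids
```

Strings that do not start with -1 or a digit, such as "Fair<1>", fail the test, so a package that rewrites group in some other way would still need separate handling.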

If you have time, please, check the current version from GitHub. Thanks in advance!

I will update stat_ma_eq() and stat_quant_eq() before submitting to CRAN, and possibly other issues. So it will take some days or even a week or two before I release version 0.5.3.

Currently, when a parameter estimate is NA or NaN, the label is set to character(0). This produces "clean" plots, but may be confusing when, say, R^2 is not shown at all. I am unsure about what is the most useful approach... Any suggestions?
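
The behaviour described above can be illustrated with a sketch. r2_label() is a hypothetical helper written for this example, not a function from 'ggpmisc', and the label format is an assumption.

```r
# Hypothetical helper (not from 'ggpmisc'): build a label for R^2, returning
# character(0) when the estimate is NA or NaN so that nothing is drawn
r2_label <- function(r2) {
  if (is.na(r2)) character(0) else sprintf("R^2 = %.2f", r2)
}

r2_label(0.8765)       # a regular estimate gives a one-element label
length(r2_label(NaN))  # a missing estimate gives a zero-length label
```

A zero-length label leaves no trace in the plot, which is what makes the output "clean" but also what hides the fact that the estimate could not be computed.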

EvoLandEco commented 1 year ago

I think the latest update solved my current issue, much appreciated. I will be on holiday for two weeks; hopefully I will be able to try out your latest CRAN build by then.

aphalo commented 1 year ago

Fixed stat_quant_eq() and stat_ma_eq(). I still need to fix stat_correlation() and stats based on 'broom'.

aphalo commented 1 year ago

Fixed all remaining stats.