TiagoOlivoto / metan

Package for multi-environment trial analysis
https://tiagoolivoto.github.io/metan/
GNU General Public License v3.0
35 stars 17 forks source link

Shukla Stablility with wrong length of levels #18

Closed Censkey closed 2 years ago

Censkey commented 2 years ago

Hi, when calculating shukla stability using the Shukla() function, I ran into a problem where the results change depending on the subset of my data. I'm not sure if it's desired to take the initial number of levels from the data frame, as in line 55 with as.factor(), or if there's something wrong with using factor() instead to solve this problem?

https://github.com/TiagoOlivoto/metan/blob/526b4625621f8cae074bc9b068eb6d1a7ae1c070/R/Shukla.R#L55

Here is a reproducible example with the provided sample data:

out1 <- Shukla(data_ge2[data_ge2$ENV != "A1", ], #subsetting one environment
              env = ENV,
              gen = GEN,
              rep = REP,
              resp = PH)

data_ge3 <- data_ge2[data_ge2$ENV != "A1", ] #subsetting the same environment
data_ge3$ENV <- factor(data_ge3$ENV) #but relevel the factor

out2 <- Shukla(data_ge3,
               env = ENV,
               gen = GEN,
               rep = REP,
               resp = PH)
print(out1)
print(out2)

And here the results:

> print(out1)
Variable PH 
---------------------------------------------------------------------------
Shukla stability variance
---------------------------------------------------------------------------
# A tibble: 13 × 6
   GEN       Y ShuklaVar rMean rShukaVar ssiShukaVar
   <fct> <dbl>     <dbl> <dbl>     <dbl>       <dbl>
 1 H1     2.59  0.0383       1         7           8
 2 H10    2.16  0.0172      13         3          16
 3 H11    2.27  0.0230      10         4          14
 4 H12    2.35  0.0760       8        11          19
 5 H13    2.46  0.0845       7        13          20
 6 H2     2.54  0.0586       2         9          11
 7 H3     2.48  0.0820       5        12          17
 8 H4     2.48  0.0479       4         8          12
 9 H5     2.48  0.0159       6         2           8
10 H6     2.49  0.0352       3         6           9
11 H7     2.28  0.0258       9         5          14
12 H8     2.20  0.0607      11        10          21
13 H9     2.18 -0.000876    12         1          13

> print(out2)
Variable PH 
---------------------------------------------------------------------------
Shukla stability variance
---------------------------------------------------------------------------
# A tibble: 13 × 6
   GEN       Y ShuklaVar rMean rShukaVar ssiShukaVar
   <fct> <dbl>     <dbl> <dbl>     <dbl>       <dbl>
 1 H1     2.59   0.0574      1         7           8
 2 H10    2.16   0.0258     13         3          16
 3 H11    2.27   0.0345     10         4          14
 4 H12    2.35   0.114       8        11          19
 5 H13    2.46   0.127       7        13          20
 6 H2     2.54   0.0880      2         9          11
 7 H3     2.48   0.123       5        12          17
 8 H4     2.48   0.0719      4         8          12
 9 H5     2.48   0.0238      6         2           8
10 H6     2.49   0.0528      3         6           9
11 H7     2.28   0.0387      9         5          14
12 H8     2.20   0.0910     11        10          21
13 H9     2.18  -0.00131    12         1          13
TiagoOlivoto commented 2 years ago

Hi!

If you subset the data, I strongly suggest using droplevels(). See what the 'note' of droplevels documentation says "Notice that subsetting does not in general drop unused levels"

library(metan)
#> Registered S3 method overwritten by 'GGally':
#>   method from   
#>   +.gg   ggplot2
#> |=========================================================|
#> | Multi-Environment Trial Analysis (metan) v1.17.0        |
#> | Author: Tiago Olivoto                                   |
#> | Type 'citation('metan')' to know how to cite metan      |
#> | Type 'vignette('metan_start')' for a short tutorial     |
#> | Visit 'https://bit.ly/pkgmetan' for a complete tutorial |
#> |=========================================================|
sub1 <- data_ge2[data_ge2$ENV != "A1", ]
levels(sub1$ENV)
#> [1] "A1" "A2" "A3" "A4"
# drop unused levels
sub2 <- droplevels(data_ge2[data_ge2$ENV != "A1", ])
levels(sub2$ENV)
#> [1] "A2" "A3" "A4"

Since here the function gets the number of environments, the results will change.

Since there is no a clear bug, I'm closing this now. Hope it helps!

Created on 2022-10-01 with reprex v2.0.2