Closed billh2006 closed 4 years ago
Hi, thank you for your report.
Could you please pass the column BLOCK to factor with dataset <- to_factor(dataset, BLOCK)
,and report if any warning is generated when you run inspect(data)
?
Is there any missing value in the data? Please, use has_na(data)
.
Based on the levels of your experiment the expected number of rows would be 1170 (13 x 30 x 3), Am I right? Maybe have you unbalanced data? Try to make a two-way table with make_mat(dataset, ENV, GEN, GY)
and see if any cell has NA
.
You can use the package reprex to report the results here Regards!
@billh2006 I closed accidentally this issue. I added a commit that improves the check of data before computing the cross-validation. Please, install the development version and check with your data.
@TiagoOlivoto- Thanks for your quick response. Yes-my data set is unbalanced. Certain genotypes are not represented in all environments and 8 (thank you for the new error message) of the genotype-environment interactions are missing data for one or more of the reps. Is a complete, balanced dataset a necessary condition for cv_blup and downstream functions (e.g. waasb)?
@billh2006 - You can have unbalanced data but it is necessary that all genotype-environment combinations have the same number of replicates (complete dataset). If you assume to have three replicates, for each cross-validation procedure the function will select randomly two replicates for each genotype-environment combination to serve as training set. The other replicate will serve as validation set. If for some GEI combinations you have only two reps, the model cannot be computed because there will be no validation data for such a combination. Please, note a toy example bellow with the example data data_ge
library(metan)
#> Registered S3 method overwritten by 'GGally':
#> method from
#> +.gg ggplot2
#> []=====================================================[]
#> [] Multi-Environment Trial Analysis (metan) v1.4.0.9000[]
#> [] Author: Tiago Olivoto []
#> [] Type citation('metan') to know how to cite metan []
#> [] Type vignette('metan_start') for a short tutorial []
#> [] Visit http://bit.ly/2TIq6JE for a complete tutorial []
#> []=====================================================[]
df <- data_ge
df[1, 4] <- NA
df
#> # A tibble: 420 x 5
#> ENV GEN REP GY HM
#> <fct> <fct> <fct> <dbl> <dbl>
#> 1 E1 G1 1 NA 44.9
#> 2 E1 G1 2 2.50 46.9
#> 3 E1 G1 3 2.43 47.8
#> 4 E1 G2 1 3.21 45.2
#> 5 E1 G2 2 2.93 45.3
#> 6 E1 G2 3 2.56 45.5
#> 7 E1 G3 1 2.77 46.7
#> 8 E1 G3 2 3.62 43.2
#> 9 E1 G3 3 2.28 47.8
#> 10 E1 G4 1 2.36 47.9
#> # ... with 410 more rows
cv_blup(df, env = ENV, gen = GEN, rep = REP, resp = GY, nboot = 10)
#> ENV GEN n
#> 1 E1 G1 2
#> Error: Combinations of genotype and environment with different number of replication than observed in the trial (3)
# An unbalanced data
df2 <- data_ge
df2[1:3, 4] <- NA
df2
#> # A tibble: 420 x 5
#> ENV GEN REP GY HM
#> <fct> <fct> <fct> <dbl> <dbl>
#> 1 E1 G1 1 NA 44.9
#> 2 E1 G1 2 NA 46.9
#> 3 E1 G1 3 NA 47.8
#> 4 E1 G2 1 3.21 45.2
#> 5 E1 G2 2 2.93 45.3
#> 6 E1 G2 3 2.56 45.5
#> 7 E1 G3 1 2.77 46.7
#> 8 E1 G3 2 3.62 43.2
#> 9 E1 G3 3 2.28 47.8
#> 10 E1 G4 1 2.36 47.9
#> # ... with 410 more rows
cv_blup(df2, env = ENV, gen = GEN, rep = REP, resp = GY, nboot = 10)
#> $RMSPD
#> MODEL RMSPD
#> 1 BLUP_g_RCBD 0.3552628
#> 2 BLUP_g_RCBD 0.3763954
#> 3 BLUP_g_RCBD 0.4266109
#> 4 BLUP_g_RCBD 0.4532448
#> 5 BLUP_g_RCBD 0.4043090
#> 6 BLUP_g_RCBD 0.4221250
#> 7 BLUP_g_RCBD 0.4243280
#> 8 BLUP_g_RCBD 0.3874620
#> 9 BLUP_g_RCBD 0.3697422
#> 10 BLUP_g_RCBD 0.3845615
#>
#> $RMSPDmean
#> # A tibble: 1 x 6
#> MODEL mean sd se Q2.5 Q97.5
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 BLUP_g_RCBD 0.400 0.0308 0.00972 0.359 0.447
#>
#> attr(,"class")
#> [1] "cvalidation"
Created on 2020-04-07 by the reprex package (v0.3.0)
You can use both unbalanced and incomplete data with waasb()
and gamem_met()
. Since we don't need a validation set, the function will simply ignore the missing data
library(metan)
#> Registered S3 method overwritten by 'GGally':
#> method from
#> +.gg ggplot2
#> []=====================================================[]
#> [] Multi-Environment Trial Analysis (metan) v1.4.0.9000[]
#> [] Author: Tiago Olivoto []
#> [] Type citation('metan') to know how to cite metan []
#> [] Type vignette('metan_start') for a short tutorial []
#> [] Visit http://bit.ly/2TIq6JE for a complete tutorial []
#> []=====================================================[]
df <- data_ge
df[1, 4] <- NA
df
#> # A tibble: 420 x 5
#> ENV GEN REP GY HM
#> <fct> <fct> <fct> <dbl> <dbl>
#> 1 E1 G1 1 NA 44.9
#> 2 E1 G1 2 2.50 46.9
#> 3 E1 G1 3 2.43 47.8
#> 4 E1 G2 1 3.21 45.2
#> 5 E1 G2 2 2.93 45.3
#> 6 E1 G2 3 2.56 45.5
#> 7 E1 G3 1 2.77 46.7
#> 8 E1 G3 2 3.62 43.2
#> 9 E1 G3 3 2.28 47.8
#> 10 E1 G4 1 2.36 47.9
#> # ... with 410 more rows
mod <- waasb(df, ENV, GEN, REP, GY)
#> Warning: Row(s) 1 with NA values deleted.
#> Method: REML/BLUP
#> Random effects: GEN, GEN:ENV
#> Fixed effects: ENV, REP(ENV)
#> Denominador DF: Satterthwaite's method
#> ---------------------------------------------------------------------------
#> P-values for Likelihood Ratio Test of the analyzed traits
#> ---------------------------------------------------------------------------
#> model GY
#> COMPLETE NA
#> GEN 1.19e-05
#> GEN:ENV 2.02e-11
#> ---------------------------------------------------------------------------
#> All variables with significant (p < 0.05) genotype-vs-environment interaction
gmd(mod)
#> Class of the model: waasb
#> Variable extracted: genpar
#> # A tibble: 9 x 2
#> Parameters GY
#> <chr> <dbl>
#> 1 Phenotypic variance 0.181
#> 2 Heritability 0.154
#> 3 GEIr2 0.313
#> 4 Heribatility of means 0.814
#> 5 Accuracy 0.902
#> 6 rge 0.370
#> 7 CVg 6.24
#> 8 CVr 11.6
#> 9 CV ratio 0.537
Created on 2020-04-07 by the reprex package (v0.3.0)
Thank you!
Hi,
I am testing a sample dataset that has the same format as the oat dataset as demonstrated. This was a RCBD design with 3 replicates
str(dataset) 'data.frame': 791 obs. of 4 variables: $ ENV : Factor w/ 13 levels "2017_P","2017_S",..: 10 10 10 10 10 10 10 10 10 10 ... $ GEN : Factor w/ 30 levels "8xp","XR_A0",..: 23 24 30 2 10 26 11 3 12 27 ... $ BLOCK: int 1 1 1 1 1 1 1 1 1 1 ... $ GY : num 233 225 248 259 255 ...
When I use cv_blup, I get the following error message:
BLUP_model = cv_blup(dataset, resp = GY, gen = GEN, env = ENV, rep = BLOCK, nboot = 200, random="gen")
Error: Column
error
must be length 262 (the number of rows) or one, not 264 In addition: Warning message: In pred - testing$Y : longer object length is not a multiple of shorter object lengthWhen I rerun it the command multiple times without changing anything else, the first line of the error message changes each time. For example: Error: Column
error
must be length 261 (the number of rows) or one, not 265 Error: Columnerror
must be length 262 (the number of rows) or one, not 266 Error: Columnerror
must be length 262 (the number of rows) or one, not 263 Error: Columnerror
must be length 262 (the number of rows) or one, not 266 Error: Columnerror
must be length 262 (the number of rows) or one, not 265 Error: Columnerror
must be length 261 (the number of rows) or one, not 267 Error: Columnerror
must be length 262 (the number of rows) or one, not 265Do you know what is happening here?
Thanks!