biometryhub / biometryassist

A package to aid in teaching experimental design and analysis through easy access and documentation of helper functions. Renaming of previous BiometryTraining package.
https://biometryhub.github.io/biometryassist

resplot Error in shapiro.test(aa.f$residuals) #87

Open daSilva5 opened 19 hours ago

daSilva5 commented 19 hours ago

Hi, I am getting an error when using resplot(). plot() works fine, and resplot() also works with shapiro = FALSE, so maybe the problem is not in this function directly, but I thought I should report the error to be sure. I am using a dataset from the National Heart, Lung, and Blood Institute (NIH); it can be requested and downloaded to reproduce the issue.

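library(readr)           # read_csv()
library(dplyr)           # %>%, mutate(), filter(), group_by(), arrange()
library(asreml)          # asreml()
library(biometryassist)  # resplot()

camp <- read_csv("camp_teach.csv")

head(camp)
str(camp)

# Add the FEV difference and a row counter within each subject
camp2 <- camp %>%
  mutate(DIF = POSFEV - PREFEV) %>%
  group_by(id) %>%
  mutate(time = row_number()) %>%
  ungroup()

# Subjects with at least 15 rows
list_id <- camp2 %>% filter(time == 15) %>% dplyr::select(id)

# Keep the first 15 rows of those subjects and convert to factors
camp3 <- camp2 %>%
  filter(id %in% list_id$id, time <= 15) %>%
  mutate(idc = as.factor(id),
         TG = as.factor(TG),
         ETHNIC = as.factor(ETHNIC),
         timec = as.factor(time)) %>%
  arrange(idc, timec)

capm_ar1 <- asreml(log(POSFEV + 1) ~ TG + ETHNIC + timec,
                   random = ~idc,
                   residuals = ~id(idc):ar1(timec),
                   data = camp3)
resplot(capm_ar1)  # command used for the asreml model; this call raises the error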
Error in shapiro.test(aa.f$residuals) : 
  sample size must be between 3 and 5000
In addition: Warning messages:
1: Removed 23 rows containing non-finite outside the scale
range (`stat_bin()`). 
2: Removed 23 rows containing non-finite outside the scale
range (`stat_qq()`). 
3: Removed 23 rows containing non-finite outside the scale
range (`stat_qq_line()`). 

Cheers,

rogerssam commented 19 hours ago

Hi Isis,

Thanks for raising this! Are you able to share the data, or provide some more information about it? The error seems to be suggesting that the data has fewer than 3 values, or more than 5000. Is that likely?
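The limits quoted in the error message can be checked with base R alone, without asreml or the CAMP data. The sketch below uses simulated values, and `try_shapiro()` is a hypothetical helper written only for this illustration. It shows that `shapiro.test()` stops for samples of fewer than 3 or more than 5000 values, and runs for sizes in between.

```r
# Base R only: shapiro.test() accepts between 3 and 5000 non-missing values.
set.seed(1)

# Hypothetical helper: returns the error message as a string,
# or NA if the test ran without error.
try_shapiro <- function(x) {
  tryCatch({
    shapiro.test(x)
    NA_character_
  }, error = function(e) conditionMessage(e))
}

msg_small <- try_shapiro(rnorm(2))     # too few values
msg_ok    <- try_shapiro(rnorm(100))   # within the limits
msg_large <- try_shapiro(rnorm(5001))  # too many values

print(c(small = msg_small, ok = msg_ok, large = msg_large))
```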

daSilva5 commented 18 hours ago

rogerssam

Hi, I have added you to a repository with the file; you will have access once you accept my invitation. I believe I cannot share it freely. You would need to request the file yourself.

The model has more than 5000 observations. Even after removing the subject-specific random effect, the warning remains the same.

rogerssam commented 18 hours ago

Thanks for sharing that with me. I think in this case there's not a lot we can do. The error occurs because the model has more than 5000 residuals, which is above the upper limit that shapiro.test() accepts (see some discussion about the technical and statistical reasons here).

I will update the function to catch the error and provide a more informative error message, but essentially you will need to set shapiro = FALSE for large cases like this.
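A minimal sketch of what such a guard could look like, using base R only. `safe_shapiro()` is a hypothetical name, not a function in biometryassist, and the choice to warn and skip the test instead of stopping is an assumption made for illustration. An `lm()` fit on simulated data stands in for the asreml model.

```r
# Hypothetical helper, not part of biometryassist: run the Shapiro-Wilk test
# only when the number of finite residuals is within shapiro.test()'s limits.
safe_shapiro <- function(res, min_n = 3L, max_n = 5000L) {
  res <- res[is.finite(res)]  # drop NA, NaN and Inf before counting
  n <- length(res)
  if (n < min_n || n > max_n) {
    warning(
      "Shapiro-Wilk test skipped: ", n, " finite residuals, but shapiro.test() ",
      "needs between ", min_n, " and ", max_n, ". Set shapiro = FALSE to ",
      "suppress this message.",
      call. = FALSE
    )
    return(NULL)
  }
  shapiro.test(res)
}

# Simulated stand-in for a large model (6000 observations)
set.seed(1)
fit <- lm(y ~ x, data = data.frame(x = rnorm(6000), y = rnorm(6000)))

skipped <- suppressWarnings(safe_shapiro(residuals(fit)))  # too many: returns NULL
tested  <- safe_shapiro(residuals(fit)[1:500])             # within limits: runs the test
```

For the model in this thread, the workaround remains `resplot(capm_ar1, shapiro = FALSE)`, which was reported above to work.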

Hope that helps, and thanks for bringing it to my attention!