cmu-delphi / covidcast

R and Python packages supporting Delphi's COVIDcast effort.
https://delphi.cmu.edu/covidcast/

covidcast R package fails additional checks on CRAN #605

Closed: capnrefsmmat closed this issue 1 year ago

capnrefsmmat commented 1 year ago

This is blocking us from releasing updates to the package, including the changes in #598 that improve compatibility with the latest ggplot.

CRAN runs checks on standard platforms that ensure all tests run and the package can be built. (Despite the ggplot update, these checks pass on the current package because vdiffr skips plot comparisons on CRAN, to avoid failures from platform-specific graphical output.) CRAN also runs "additional checks" on non-standard setups. One of these is a system that uses ATLAS as the BLAS implementation; the system setup is documented here.

The package tests and vignettes fail with ATLAS. The output is here. The failures occur in sf: when it transforms various geometries, it apparently turns them into invalid geometries.

Logan set up a VM with ATLAS using flexiblas (see the Slack thread), but he was unable to reproduce the failures.

If we want to update the package, we'll need to reproduce and diagnose the issue. My hypothesis is that the problem is due to shift_pr(), which multiplies by a rotation matrix and hence could be affected by BLAS implementation. This is supported by the specific errors in the CRAN output:

```
══ Failed tests ════════════════════════════════════════════════════════════════
  ── Error ('test-plot.R:57'): simple state choropleths ──────────────────────────
  Error in `(function (mapping = aes(), data = NULL, stat = "sf", position = "identity", 
      na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ...) 
  {
      c(layer_sf(geom = GeomSf, data = data, mapping = mapping, 
          stat = stat, position = position, show.legend = show.legend, 
          inherit.aes = inherit.aes, params = list2(na.rm = na.rm, 
              ...)), coord_sf(default = TRUE))
  })(color = "white", size = 0.1, mapping = structure(list(geometry = ~geometry), class = "uneval"), 
      fill = "gray", data = structure(list(STATEFP = "72", STATENS = "01779808", 
          AFFGEOID = "0400000US72", GEOID = "72", STUSPS = "PR", 
          NAME = "Puerto Rico", LSAD = "00", ALAND = 8868701898, 
...
```

Every error whose output shows the offending geometry identifies it as Puerto Rico.

But without a way to reproduce the issue, we'll be unable to test this hypothesis, fix the problem, and convince the CRAN maintainers we've fixed it so they'll accept an update.
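A cheap way to probe the BLAS hypothesis without rewriting shift_pr() might be R's matprod option (added in R 3.4.0): options(matprod = "internal") makes %*% use R's own loops instead of calling the BLAS. This is a sketch under assumptions, not something run in this thread: the ring coordinates and the helper matprod_with() are hypothetical, and it assumes sf applies a 2x2 matrix through %*% (and so honours this option).

```r
# Hypothetical diagnostic: compare a BLAS-backed matrix product with R's
# internal (non-BLAS) one on the same coordinates.
rotation_matrix <- function(r) {
  matrix(c(cos(r), sin(r), -sin(r), cos(r)), nrow = 2, ncol = 2)
}

# Evaluate coords %*% m under a given matprod mode, then restore the option.
matprod_with <- function(coords, m, mode = c("internal", "blas", "default")) {
  mode <- match.arg(mode)
  old <- options(matprod = mode)
  on.exit(options(old), add = TRUE)
  coords %*% m
}

# A closed square ring: the first and last rows are the same point.
ring <- rbind(c(2.5, 1.25), c(3.5, 1.25), c(3.5, 2.75), c(2.5, 2.75), c(2.5, 1.25))
r <- 16 * pi / 180

via_internal <- matprod_with(ring, rotation_matrix(r), "internal")
via_blas <- matprod_with(ring, rotation_matrix(r), "blas")

# The two should agree to within floating-point tolerance...
all.equal(via_internal, via_blas)
# ...but a valid ring also needs its endpoints to stay exactly equal.
identical(via_blas[1, ], via_blas[nrow(via_blas), ])
```

On an ATLAS machine, running the failing tests once with options(matprod = "internal") set (for example in a testthat helper file) would show whether the failures follow the BLAS.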

capnrefsmmat commented 1 year ago

@brookslogan Do you recall if you were using devtools::check()? I can't figure out how vdiffr decides whether plot mismatches count as test failures, but it hides plot failures on CRAN, except, evidently, in the additional checks. We might want to run the tests manually on your VM to make sure it's not just hiding the failures.
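On the question of how CRAN is detected: testthat's skip_on_cran() keys off the NOT_CRAN environment variable, which devtools::check() sets to "true" and plain R CMD check does not. Assuming vdiffr's snapshot skipping follows the same rule, forcing NOT_CRAN when running the tests by hand should make the plot comparisons actually run. A base-R sketch; with_not_cran() and looks_like_cran() are hypothetical helpers, and looks_like_cran() only approximates testthat's rule.

```r
# Hypothetical helper: evaluate an expression with NOT_CRAN set, then restore
# the previous value (or unset it if it was not set before).
with_not_cran <- function(expr, value = "true") {
  old <- Sys.getenv("NOT_CRAN", unset = NA)
  Sys.setenv(NOT_CRAN = value)
  on.exit(
    if (is.na(old)) Sys.unsetenv("NOT_CRAN") else Sys.setenv(NOT_CRAN = old),
    add = TRUE
  )
  force(expr)  # the promise is evaluated here, after NOT_CRAN is set
}

# Approximates the rule described above: in a non-interactive session,
# tests count as "on CRAN" unless NOT_CRAN is true.
looks_like_cran <- function() {
  !interactive() && !isTRUE(as.logical(Sys.getenv("NOT_CRAN", "false")))
}

# Example (not run): with_not_cran(testthat::test_dir("tests/testthat"))
```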

brookslogan commented 1 year ago

I was primarily using devtools::check(), although I tried R CMD check a couple of times. At least with devtools::check(), I was consistently getting warnings and failures out of vdiffr; however, they were just the "expected" kinds (snapshots not existing yet for some things, plotting style differences for others), not the non-closed ring / whatever geometry error they had in the ATLAS checks. Any reason to think it might hide certain vdiffr tests and not others? If not, I doubt it is hiding or skipping specifically a geometry-related failure.
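The "non-closed ring" wording is a useful clue. Assuming GEOS requires a ring's first and last vertices to be exactly equal, a last vertex that differs from the first by a rounding-sized amount is enough to make the polygon invalid, even though the two points are equal for any practical purpose. A base-R sketch of that distinction; the ring and ring_is_closed() are hypothetical.

```r
# A closed ring: first and last vertices are the same point.
ring <- rbind(c(2.5, 1.25), c(3.5, 1.25), c(3.5, 2.75), c(2.5, 2.75), c(2.5, 1.25))

# Exact closure test, the kind of check a geometry engine applies.
ring_is_closed <- function(coords) {
  identical(coords[1, ], coords[nrow(coords), ])
}

# Nudge the last x coordinate by a tiny relative amount (about 2e-16).
nudged <- ring
last <- nrow(nudged)
nudged[last, 1] <- nudged[last, 1] * (1 + .Machine$double.eps)

ring_is_closed(ring)                          # TRUE
ring_is_closed(nudged)                        # FALSE
isTRUE(all.equal(ring[1, ], nudged[last, ]))  # TRUE: equal within tolerance
```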

brookslogan commented 1 year ago

I tried both flexiblas and R-devel configured with --with-blas="-L/usr/lib64/atlas -lsatlas" --with-lapack. I was never able to find the libRblas.so reference in the notes here. I don't know whether this build uses LAPACK from R or from ATLAS; that could be another thing to try.
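To see which BLAS and LAPACK a given R build actually loaded, rather than inferring it from configure flags, sessionInfo() records the library paths (since R 3.4.0) and La_version() reports the LAPACK version. A minimal sketch; blas_lapack_report() is a hypothetical name.

```r
# Hypothetical helper: summarise the linear-algebra libraries in use.
blas_lapack_report <- function() {
  si <- utils::sessionInfo()
  list(
    blas = si$BLAS,                # path of the BLAS shared library R loaded
    lapack = si$LAPACK,            # path of the LAPACK shared library
    lapack_version = La_version()  # version string of that LAPACK
  )
}

info <- blas_lapack_report()
str(info)

# TRUE suggests R's bundled LAPACK (libRlapack) rather than an external one.
grepl("libRlapack", info$lapack, fixed = TRUE)
```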

bnaras commented 1 year ago

I doubt you need to worry about ATLAS failures, at least as you describe them; the failures seem to have to do with sf. In case you did not see it, a useful resource for testing before submission is winbuilder: https://win-builder.r-project.org. An M1 Mac builder to test against is here: https://mac.r-project.org/macbuilder/submit.html. There are also VMs on the rhub project that can be used (you can download the Docker files and build your own containers), but the cloud versions hang if not enough resources are available or something clogs them up for days. Still useful when they work. P.S. I check against R-devel too.

brookslogan commented 1 year ago

I don't think we're worrying about it, but the CRAN maintainers are apparently worrying about it, which might block any future package updates.

Thanks for the pointers to the builders. I gave rhub local checks a try, but they immediately failed, complaining that no CRAN mirror was set. I haven't tried uploading to R Hub yet, but I think @capnrefsmmat might have. I don't think they have a specialized ATLAS container, unfortunately, so I'm not sure we can get anywhere toward reproducing via R Hub.
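A "no CRAN mirror" failure usually means getOption("repos") still holds R's "@CRAN@" placeholder, which is what R ships with until a mirror is chosen. Setting the repos option explicitly (for example in ~/.Rprofile) should avoid it. A sketch, assuming the cloud.r-project.org redirector is an acceptable default; ensure_cran_mirror() is hypothetical and does not call rhub itself.

```r
# Hypothetical helper: make sure a real CRAN mirror is configured,
# leaving any mirror that is already set untouched.
ensure_cran_mirror <- function(mirror = "https://cloud.r-project.org") {
  repos <- getOption("repos")
  if (is.null(repos)) repos <- character()
  cran <- if ("CRAN" %in% names(repos)) repos[["CRAN"]] else NA_character_
  if (is.na(cran) || !nzchar(cran) || identical(cran, "@CRAN@")) {
    repos["CRAN"] <- mirror
    options(repos = repos)
  }
  invisible(getOption("repos"))
}

ensure_cran_mirror()
getOption("repos")[["CRAN"]]
```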

capnrefsmmat commented 1 year ago

Yes, I've tested each release against winbuilder, and they always pass. Checks also pass on each machine we have access to. But when we submitted to CRAN, they pointed out the ATLAS failure and said:

please really fix for the next submission.

It's possible there's something different about the machine running the ATLAS check besides ATLAS, but I don't know how to find out what it is.

bnaras commented 1 year ago

I see. @brookslogan Let me try to build a VM following Ripley's setup later today. (I did this a long time ago for something else.)

brookslogan commented 1 year ago

@bnaras many thanks! I ran out of ideas on how to try to reproduce. Would you be compiling from source? Would a Dockerfile starting point be useful?

bnaras commented 1 year ago

Sure, I think the Dockerfile would be useful.

brookslogan commented 1 year ago

Sorry, these aren't the cleanest Dockerfiles. A bit littered with random modifications trying to get things to reproduce.

The Dockerfiles below expect covidcast to be checked out in the same directory as the Dockerfile. It is bind-mounted into the container, to avoid redownloading it repeatedly and to avoid needing a GitHub PAT to get around rate limiting.

Dockerfile

```{Dockerfile}
FROM fedora:34

# [after looking at Naras's successful reproduction below:
# https://www.stats.ox.ac.uk/pub/bdr/Rblas/README.txt says Fedora 34 but links
# to https://www.stats.ox.ac.uk/pub/bdr/Rconfig/r-devel-linux-x86_64-fedora-gcc
# which says Fedora 36, although I'd guess the configuration options are the
# determining factor, not the OS version.]

RUN dnf -y update && dnf clean all
RUN dnf -y install R-core-devel && dnf clean all
RUN dnf -y remove --noautoremove R-core R-core-devel
# RUN dnf install -y openssl-devel harfbuzz-devel fribidi-devel libcurl-devel libxml2-devel proj-devel geos-devel gdal-devel freetype-devel libpng-devel libtiff-devel libjpeg-turbo-devel udunits2-devel sqlite-devel pandoc && dnf clean all

# TODO just wget it here
COPY R-devel.tar.gz /R-devel.tar.gz
RUN tar -xf /R-devel.tar.gz
RUN dnf -y install rsync
RUN dnf -y install readline-devel
RUN dnf -y install libX11-devel
RUN dnf -y install dnf-utils
# https://docs.rstudio.com/resources/install-r-source/
# RUN yum-builddep -y R
RUN dnf builddep -y R
# RUN dnf builddep -y R R-core R-core-devel R-java R-java-devel
RUN dnf install -y atlas-devel

# Combining&deriving from https://docs.rstudio.com/resources/install-r-source/, https://colinfay.me/r-installation-administration/appendix-a-essential-and-useful-other-programs-under-a-unix-alike.html, https://www.baeldung.com/find-java-home
# also from https://stackoverflow.com/questions/42562160/r-cmd-javareconf-not-finding-jni-h, sort of
# ./configure --help
#
# no need for libf77blas in Fedora >=21; https://bugzilla.redhat.com/show_bug.cgi?id=1497383
# satlas = serial ATLAS, https://gist.github.com/rmcgibbo/6317607
RUN JAVA_HOME=$(java -XshowSettings:properties 2>&1 | sed -e "s/^ *java.home = \(.*\)$/\\1/;t;d") \
    ./configure \
    --prefix=/opt/R/${R_VERSION} \
    --enable-memory-profiling \
    --enable-R-shlib \
    --with-blas="-L/usr/lib64/atlas -lsatlas" \
    --with-lapack

# Reconfiguring without java. Not sure if configuration with java ever worked. I
# don't remember if I was reconfiguring here for a reason or if it's just an
# oversight and the with-java configuration can be commented out.
RUN ./configure \
    --prefix=/opt/R/R-devel \
    --enable-memory-profiling \
    --enable-R-shlib \
    --with-blas="-L/usr/lib64/atlas -lsatlas" \
    --with-lapack \
    --enable-java=no
RUN make
RUN make install
RUN (cd R-devel && ./tools/rsync-recommended)
RUN (cd R-devel && ./configure)
RUN (cd R-devel && make)

# Possible ENTRYPOINT:
# /opt/R/R-devel/bin/R --vanilla
# install.packages("covidcast")
# RUN Rscript -e 'options(warn=2L); install.packages("devtools", repos="https://ftp.osuosl.org/pub/cran/")'
# RUN mkdir /fakepkgfordeps
# COPY covidcast/R-packages/covidcast/DESCRIPTION /fakepkgfordeps/DESCRIPTION
# RUN (cd /fakepkgfordeps && Rscript -e 'devtools::install_dev_deps(repos="https://ftp.osuosl.org/pub/cran/")')
```
build.sh

```{bash}
#!/bin/sh
docker build -t covidcast-fedora-checks-image .
```
run.sh

```{bash}
#!/bin/sh
docker run --rm -it \
  --mount type=bind,source=$(realpath ./covidcast),target=/covidcast-bind \
  --name covidcast-fedora-checks-container \
  covidcast-fedora-checks-image
```
Dockerfile_earlier_attempts_1

```{Dockerfile}
FROM fedora:34
RUN dnf -y update && dnf clean all
RUN dnf -y install R-core-devel && dnf clean all
# need tsflags thing to get docs to install, to get devtools to be able to install
RUN dnf -y --setopt='tsflags=' reinstall R-core R-core-devel
RUN dnf install -y openssl-devel harfbuzz-devel fribidi-devel libcurl-devel libxml2-devel proj-devel geos-devel gdal-devel freetype-devel libpng-devel libtiff-devel libjpeg-turbo-devel udunits2-devel sqlite-devel
RUN Rscript -e 'options(warn=2L); install.packages("devtools", repos="https://ftp.osuosl.org/pub/cran/")'
RUN mkdir /fakepkgfordeps
COPY covidcast/R-packages/covidcast/DESCRIPTION /fakepkgfordeps/DESCRIPTION
RUN (cd /fakepkgfordeps && Rscript -e 'devtools::install_dev_deps(repos="https://ftp.osuosl.org/pub/cran/")')
# install commands that should be moved above eventually (they're here
# temporarily for fast iteration (layer caching)):
RUN dnf install -y pandoc
RUN dnf install -y atlas-devel
RUN dnf install -y R-flexiblas flexiblas-atlas
```
Dockerfile_earlier_attempts_2

```{Dockerfile}
## This file is licensed under GPL-2. It is derived from https://github.com/r-hub/rhub-linux-builders/blob/master/fedora/Dockerfile with modification.
## Emacs, make this -*- mode: sh; -*-

FROM fedora:34

# ## Copy 'checkbashisms' (as a local copy from devscripts package)
# COPY checkbashisms /usr/local/bin

## Set a default user. Available via runtime flag
RUN useradd docker

RUN dnf install -y \
    gcc-gfortran \
    less \
    ca-certificates \
    curl \
    java-1.8.0-openjdk \
    bzip2-devel \
    cairo-devel \
    ghostscript \
    libcurl-devel \
    libicu-devel \
    libjpeg-turbo-devel \
    pango-devel \
    pcre-devel \
    libpng-devel \
    readline-devel \
    libtiff-devel \
    libX11-devel \
    libXt-devel \
    subversion \
    tcl-devel \
    texinfo \
    texlive-latex \
    texlive-collection-fontsextra \
    texlive-scheme-basic \
    tk-devel \
    unzip \
    xorg-x11-proto-devel \
    findutils \
    make \
    texinfo-tex \
    xz-devel \
    zlib-devel \
    libXmu-devel \
    tar \
    texlive-ec \
    texlive-parskip \
    texlive-collection-fontsrecommended \
    which \
    xorg-x11-server-Xvfb

RUN dnf install -y \
    dnf-plugins-core

RUN dnf install -y glibc-langpack-en.x86_64
RUN dnf install -y valgrind
RUN dnf install -y qpdf

RUN curl -o /usr/bin/pandoc.gz \
    https://files.r-hub.io/pandoc/linux-64/pandoc.gz && \
    gzip -d /usr/bin/pandoc.gz && \
    curl -o /usr/bin/pandoc-citeproc.gz \
    https://files.r-hub.io/pandoc/linux-64/pandoc-citeproc.gz && \
    gzip -d /usr/bin/pandoc-citeproc.gz && \
    chmod +x /usr/bin/pandoc /usr/bin/pandoc-citeproc

RUN dnf install -y aspell aspell-en
RUN dnf install -y file
RUN dnf install -y xorg-x11-fonts-100dpi xorg-x11-fonts-75dpi
RUN dnf update -y

ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8

RUN dnf -y update && dnf clean all
RUN dnf -y install R-core-devel && dnf clean all
# need tsflags thing to get docs to install, to get devtools to be able to install
RUN dnf -y --setopt='tsflags=' reinstall R-core R-core-devel
RUN dnf install -y openssl-devel harfbuzz-devel fribidi-devel libcurl-devel libxml2-devel proj-devel geos-devel gdal-devel freetype-devel libpng-devel libtiff-devel libjpeg-turbo-devel
RUN Rscript -e 'options(warn=2L); install.packages("devtools", repos="https://ftp.osuosl.org/pub/cran/")'
RUN mkdir /fakepkgfordeps
COPY covidcast/R-packages/covidcast/DESCRIPTION /fakepkgfordeps/DESCRIPTION
RUN (cd /fakepkgfordeps && Rscript -e 'devtools::install_dev_deps(repos="https://ftp.osuosl.org/pub/cran/")')
```
bnaras commented 1 year ago

@brookslogan and @capnrefsmmat Here is a Dockerfile that will make it possible to replicate the errors.

Dockerfile

```
# To build from the parent directory:
#   docker build -t delphi/cran-atlas cran-atlas
#
# To run:
#   docker run --rm -ti --name atlas delphi/cran-atlas

# Use specified Fedora version (https://www.stats.ox.ac.uk/pub/bdr/Rconfig/r-devel-linux-x86_64-fedora-gcc)
FROM fedora:36

RUN dnf -y update && dnf clean all

## This gets us a bit there, not fully.
RUN dnf -y install dnf-utils
# https://docs.rstudio.com/resources/install-r-source/
RUN dnf builddep -y R

RUN dnf install -y \
    git-all \
    R-core \
    R-core-devel \
    bzip2 \
    atlas-devel \
    java-11-openjdk-devel \
    openssl-devel \
    libcurl-devel \
    libxml2-devel \
    udunits2-devel \
    cairo-devel \
    readline-devel \
    gdal-devel \
    proj-devel \
    geos-devel \
    sqlite-devel \
    qpdf-devel \
    rsync

## Copy Rdevel (with several sources, the COPY destination must end in /)
COPY R-devel.tar.gz bdr-config.site /tmp/
RUN cd /tmp && tar -xzf R-devel.tar.gz

## Append config.site
RUN cat /tmp/bdr-config.site >> /tmp/R-devel/config.site

## Build and install according to the standard 'recipe' old recipe...
RUN cd /tmp/R-devel \
  && ./tools/rsync-recommended \
  && R_PAPERSIZE=letter \
     R_BATCHSAVE="--no-save --no-restore" \
     R_BROWSER=xdg-open \
     PAGER=/usr/bin/pager \
     PERL=/usr/bin/perl \
     R_UNZIPCMD=/usr/bin/unzip \
     R_ZIPCMD=/usr/bin/zip \
     R_PRINTCMD=/usr/bin/lpr \
     LIBnn=lib \
     AWK=/usr/bin/awk \
     ./configure --enable-R-shlib \
       --without-blas \
       --without-lapack \
       --with-readline \
       --program-suffix=devel \
       --enable-lto \
  && make \
  && make install

## Rename development version of R
RUN cd /usr/local/bin \
  && mv R Rdevel \
  && mv Rscript Rscriptdevel \
  && ln -s Rdevel RD \
  && ln -s Rscriptdevel RDscript

## Set CRAN repo prefs
RUN echo 'options(repos = c(CRAN = "https://cloud.r-project.org"))' >> ~/.Rprofile

## Install devtools
RUN RDscript -e 'install.packages("devtools")'

## Download existing covidcast on CRAN for installing dep packages
RUN mkdir /tmp/covidcast \
  && cd /tmp/covidcast \
  && curl -O https://raw.githubusercontent.com/cmu-delphi/covidcast/main/R-packages/covidcast/DESCRIPTION \
  && RDscript -e 'devtools::install_dev_deps()'
```
bdr-config.site file

```
CFLAGS="-g -O2 -Wall -pedantic -mtune=native -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong -fstack-clash-protection -fcf-protection -Werror=implicit-function-declaration -Wstrict-prototypes"
FFLAGS="-g -O2 -mtune=native -Wall -pedantic"
CXXFLAGS="-g -O2 -Wall -pedantic -mtune=native -Wno-ignored-attributes -Wno-parentheses -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong -fstack-clash-protection -fcf-protection"
JAVA_HOME=/usr/lib/jvm/java-11
AR=gcc-ar
RANLIB=gcc-ranlib
LTO=-flto=10
```
Instructions

Reproduce the ATLAS errors as follows.

- Download the CRAN version to your current directory:
  ```
  RDscript -e 'download.packages("covidcast", ".")'
  ```
- Check it the usual way; it passes with just notes:
  ```
  RD CMD check covidcast_0.4.3.tar.gz
  ```
- Now do as BDR describes, linking libRblas.so to the single-threaded ATLAS library:
  ```
  mv /usr/local/lib/R/lib/libRblas.so /usr/local/lib/R/lib/libRblas.so.orig
  ln -s /usr/lib64/atlas/libsatlas.so /usr/local/lib/R/lib/libRblas.so
  ```
- Now check the package again:
  ```
  RD CMD check covidcast_0.4.3.tar.gz
  ```

You will reproduce the errors. He gets 5 test failures, but you will get 2 test failures and 3 more skipped tests than he does, which makes for the same total. That's usually because of an environment variable causing some tests to be skipped.
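Because this reproduction hinges on swapping libRblas.so for a symlink, it may be worth confirming the swap took effect before rerunning the check. A sketch, assuming the library sits in R.home("lib") (which should resolve to /usr/local/lib/R/lib for the build described here); blas_symlink_target() is a hypothetical helper.

```r
# Hypothetical helper: report where libRblas.so points, if it is a symlink.
blas_symlink_target <- function(lib_dir = R.home("lib")) {
  path <- file.path(lib_dir, "libRblas.so")
  if (!file.exists(path)) return(NA_character_)
  target <- Sys.readlink(path)  # "" when path is a regular file
  if (nzchar(target)) target else path
}

target <- blas_symlink_target()
uses_atlas <- !is.na(target) && grepl("atlas", target, fixed = TRUE)
cat("libRblas.so ->", target, if (uses_atlas) "(ATLAS)" else "(not ATLAS)", "\n")
```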
capnrefsmmat commented 1 year ago

I've received the official email from Brian Ripley that we must fix this issue by January 9 to keep covidcast on CRAN. I should have some time today to try the Docker image and test my theory for the root cause, and then we can figure out the right solution.

capnrefsmmat commented 1 year ago

I've reproduced the test failures following @bnaras's instructions, so hopefully I can track this down today.

capnrefsmmat commented 1 year ago

Stepping through line-by-line and running sf::st_is_valid() to check the polygons, I've verified the problem comes from the rotation step in shift_pr():

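```r
# `%>%`, `.data`, and `final_crs` come from the package namespace.
shift_pr <- function(map_df) {
  # Keep only the Puerto Rico rows and project them into the final CRS.
  pr_df <- map_df %>% dplyr::filter(.data$is_pr)
  pr_df <- sf::st_transform(pr_df, final_crs)
  # Translate PR by (-0.9e6, 1e6) in projected units.
  pr_shift <- sf::st_geometry(pr_df) + c(-0.9e+6, 1e+6)
  pr_df <- sf::st_set_geometry(pr_df, pr_shift)
  # Rotate by 16 degrees with a 2x2 matrix. Under ATLAS, this is the step
  # after which sf::st_is_valid() reports invalid polygons.
  r <- 16 * pi / 180
  rotation <- matrix(c(cos(r), sin(r), -sin(r), cos(r)), nrow = 2, ncol = 2)
  pr_rotate <- (sf::st_geometry(pr_df)) * rotation
  pr_df <- sf::st_set_geometry(pr_df, pr_rotate)

  # Pretend this was in final_crs all along
  suppressWarnings({
    sf::st_crs(pr_df) <- final_crs
  })
  return(pr_df)
}
```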

The issue is actually an exception inside GEOS, so I don't think we can suppress our way out of it. I assume that the matrix rotation is somehow resulting in bad polygons because of how ATLAS does the linear algebra. I'll look into alternatives.
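One possible alternative, if the BLAS really is the culprit, is to avoid %*% for the rotation altogether. The sketch below is not the fix that was submitted to CRAN (the thread doesn't show it); it assumes the failure is the BLAS rounding a ring's first and last vertex differently. It rotates one ring's n x 2 coordinate matrix with elementwise arithmetic, so identical points always map to identical points, and it re-closes the ring as a safeguard. rotate_ring() and close_ring() are hypothetical names, and applying them to each ring of an sf geometry is left out.

```r
# Safeguard: force the last vertex to equal the first exactly.
close_ring <- function(coords) {
  coords[nrow(coords), ] <- coords[1, ]
  coords
}

# Same result as coords %*% matrix(c(cos(r), sin(r), -sin(r), cos(r)), 2, 2),
# computed elementwise instead of through the BLAS.
rotate_ring <- function(coords, r) {
  x <- coords[, 1]
  y <- coords[, 2]
  rotated <- cbind(x * cos(r) + y * sin(r), -x * sin(r) + y * cos(r))
  close_ring(rotated)
}

ring <- rbind(c(2.5, 1.25), c(3.5, 1.25), c(3.5, 2.75), c(2.5, 2.75), c(2.5, 1.25))
r <- 16 * pi / 180
rotated <- rotate_ring(ring, r)
identical(rotated[1, ], rotated[nrow(rotated), ])  # TRUE
```

The close_ring() step is redundant when the elementwise rotation is used, but it would also protect any other transformation that might perturb the endpoints.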

capnrefsmmat commented 1 year ago

I submitted the fixed version to CRAN, along with an additional fix to the scales shown on bubble plots. The CRAN maintainers have indicated it's "on its way to CRAN", so we should be good. I'll keep this issue open until the new version is out and I update our pkgdown documentation site.