Test cases for recreation of difficult CRAN check failures

gmbecker commented 1 year ago

Now that we have the scripts used by CRAN (#17), we need examples of packages with check failures on CRAN that are difficult to recreate locally with standard R CMD check usage. The goal is to see whether running the CRAN-specific checks helps expose additional issues that submitted packages run afoul of.

Please add such examples in the comments.

@jeroen

gmbecker commented 1 year ago

@maelle (I believe you knew people who might be able to help us with this if they are willing?)

@hadley

@gaborcsardi

llrs commented 1 year ago

@statnmap Pinging you because of all the golem and fusen problems you posted on Twitter.

Following #7, I created a dashboard that compares CRAN check results between two consecutive days and filters them to packages with no change in their dependencies or in their own version. The history of all packages that had some problem on a given day is collected in the history branch. Note that this does not take into account differences in the R version used.

For longer historical data of CRAN checks changes there was cchecksapi.
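The day-to-day comparison described above can be sketched roughly as follows. This is a minimal illustration, not the dashboard's actual code: the function name `compare_checks()` and the toy data are hypothetical, the columns mimic a subset of what `tools::CRAN_check_results()` returns, and the dependency filter is left out for brevity.

```r
# Keep package/flavor pairs whose check status changed between two days
# while the package version stayed the same.
compare_checks <- function(yesterday, today) {
  both <- merge(yesterday, today, by = c("Package", "Flavor"),
                suffixes = c("_old", "_new"))
  same_version <- both$Version_old == both$Version_new
  status_changed <- both$Status_old != both$Status_new
  both[same_version & status_changed, , drop = FALSE]
}

# Toy snapshots (hypothetical packages, not real CRAN data).
yesterday <- data.frame(
  Package = c("pkgA", "pkgB", "pkgC"),
  Flavor = "r-release-linux-x86_64",
  Version = c("1.0", "2.3", "0.1.0"),
  Status = c("OK", "OK", "NOTE"),
  stringsAsFactors = FALSE
)
today <- data.frame(
  Package = c("pkgA", "pkgB", "pkgC"),
  Flavor = "r-release-linux-x86_64",
  Version = c("1.0", "2.4", "0.1.0"),
  Status = c("ERROR", "NOTE", "OK"),
  stringsAsFactors = FALSE
)

# pkgB is dropped because its version changed between the two days.
flips <- compare_checks(yesterday, today)
```

Here `flips` keeps only the rows where the status moved without a new release of the package, which is the kind of change that hints at something external to the package.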

statnmap commented 1 year ago

I do have a recent example where {fusen} failed on Debian during the pre-checks: https://win-builder.r-project.org/incoming_pretest/fusen_0.4.1_20220924_231030/Debian/00check.log I am also attaching it as a PDF here so as not to lose it: https___win-builder.r-project.org_incoming_pretest_fusen_0.4.1_2-job_2.pdf

Before sending it to CRAN, I checked the package with checkhelper::check_as_cran() ( https://github.com/ThinkR-open/checkhelper#experimental-check-as-cran-with-cran-global-variables ), which sets all the env. variables gathered in https://github.com/RConsortium/r-repositories-wg/issues/17 . The package check did not fail locally.

The checks did not fail on any of GitHub Actions either.
The state of the project at that time was: https://github.com/ThinkR-open/fusen/tree/74430136f33e508a1349c2f00d4d3bd2aa27b8f7

After that, I decided to add a skip_on_cran() for this part.
Indeed, I run checks inside checks. The notes that arise at this step usually depend on env. variables that differ from mine. But this time, I do not know which ones.

gmbecker commented 1 year ago

So @statnmap from the output:

 ! The path: /tmp/RtmpZqD1jU/working_dir/RtmpKaNkA4/dummy1fe52b20d4f58f/foosen already
exists.
 x Aborting fusen project creation. Set `create_fusen(overwrite = TRUE)` to avoid a stop.
 ! The path: /tmp/RtmpZqD1jU/working_dir/RtmpKaNkA4/dummy1fe52b20d4f58f/foosen already
exists.

It looks like this is a race condition in a temporary file you're creating, right?

llrs commented 1 year ago

I looked at the intermittent failures data. Since 2022-01-30 the following packages had different results within CRAN checks:

Package	Flavor	Version	n
bayesQR	r-release-linux-x86_64	2.3	4
MicroMoB	r-devel-linux-x86_64-debian-gcc	0.1.0	4
sgsR	r-oldrel-macos-arm64	1.0.0	4
bayesQR	r-devel-linux-x86_64-debian-gcc	2.3	3
GenBinomApps	r-devel-linux-x86_64-debian-gcc	1.1	3
gtExtras	r-oldrel-macos-arm64	0.4.0	3
phreeqc	r-patched-linux-x86_64	3.7.4	3
R.methodsS3	r-devel-linux-x86_64-debian-gcc	1.8.1	3
Rpoppler	r-release-macos-arm64	0.1-0	3
abcrlda	r-release-linux-x86_64	1.0.3	2

These versions might contain something that cannot be checked consistently, so they could serve as a starting point for finding packages with check failures on CRAN that are difficult to recreate locally with standard R CMD check. The cause could be a change in the CRAN checks, a test that is not robust under some conditions, tests on random numbers, flaky URLs, and so on.

Additionally, the CRAN flavors whose checks did not reproduce their previous results are:

Flavor	n
r-devel-linux-x86_64-fedora-gcc	290
r-devel-linux-x86_64-fedora-clang	259
r-devel-windows-x86_64	224
r-devel-linux-x86_64-debian-gcc	179
r-release-linux-x86_64	87
r-patched-linux-x86_64	48
r-oldrel-macos-arm64	44
r-release-macos-arm64	32
r-release-macos-x86_64	15
r-devel-linux-x86_64-debian-clang	10

gmbecker commented 1 year ago

@llrs r-devel flavors are probably a no-go since they are a moving target, and possibly the same goes for the patched variants. We need a span of time during which the version of R used for the check variant is constant.

Also, what are the n's there? Is that the number of times it intermittently failed?

gmbecker commented 1 year ago

I took a glance at bayesQR, and it has examples, but neither tests nor vignettes, which in a sense narrows things down somewhat.

statnmap commented 1 year ago

> So @statnmap from the output:
>
>  ! The path: /tmp/RtmpZqD1jU/working_dir/RtmpKaNkA4/dummy1fe52b20d4f58f/foosen already
> exists.
>  x Aborting fusen project creation. Set `create_fusen(overwrite = TRUE)` to avoid a stop.
>  ! The path: /tmp/RtmpZqD1jU/working_dir/RtmpKaNkA4/dummy1fe52b20d4f58f/foosen already
> exists.
>
> It looks like this is a race condition in a temporary file you're creating, right?

@gmbecker This warning message is expected. I just couldn't capture it in the unit test because I already capture the error with expect_error(), along with the other message that comes with it. The test does not fail here.

    if (!isTRUE(overwrite)) {
      cli::cli_alert_danger(
        paste(
          "Aborting fusen project creation.",
          "Set `create_fusen(overwrite = TRUE)` to avoid a stop."
        )
      )
      stop("Could not create fusen project", call. = FALSE)
    }

What fails here is this line: https://github.com/ThinkR-open/fusen/blob/40cd8df7ca36b4e11efb6ec691cdd50ba38ed908/tests/testthat/test-inflate-part1.R#L182 It comes after an rcmdcheck() call that is not protected from any environment variables (https://github.com/ThinkR-open/fusen/blob/40cd8df7ca36b4e11efb6ec691cdd50ba38ed908/tests/testthat/test-inflate-part1.R#L170). Hence, it uses the env. variables of the session it runs in, which are different on CRAN. I would need to print the output to see which NOTE appeared in this specific case, but to do so, I would have to submit to CRAN to get the pre-checks, knowing it will fail.
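One way to protect a nested check from the surrounding session is to pin the relevant environment variables only for the duration of the call. The sketch below uses base R; `with_check_env()` is a hypothetical helper (withr::with_envvar() offers the same idea), and `_R_CHECK_FORCE_SUGGESTS_` is only an example of a real check variable, not necessarily the one responsible for the fusen failure.

```r
# Temporarily set environment variables, run `code`, then restore the
# previous state (including unsetting variables that were not set before).
with_check_env <- function(vars, code) {
  old <- Sys.getenv(names(vars), unset = NA, names = TRUE)
  on.exit({
    to_unset <- names(old)[is.na(old)]
    if (length(to_unset) > 0) Sys.unsetenv(to_unset)
    to_restore <- old[!is.na(old)]
    if (length(to_restore) > 0) do.call(Sys.setenv, as.list(to_restore))
  })
  do.call(Sys.setenv, as.list(vars))
  force(code)  # `code` is evaluated lazily, so it sees the new values
}

before <- Sys.getenv("_R_CHECK_FORCE_SUGGESTS_", unset = NA)

# Stand-in for the nested check: just read the variable back.
seen <- with_check_env(
  c("_R_CHECK_FORCE_SUGGESTS_" = "false"),
  Sys.getenv("_R_CHECK_FORCE_SUGGESTS_")
)

after <- Sys.getenv("_R_CHECK_FORCE_SUGGESTS_", unset = NA)
```

In a real test, the second argument would be the nested rcmdcheck() call. Child processes inherit the environment of the session, so the check would then run with known values wherever the test is executed.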

statnmap commented 1 year ago

@llrs I do not see the WARN coming from Bioconductor installation problems.
This seems to be an intermittent problem, although these days it seems to be recurrent on "oldrel-windows". It also affects multiple packages (those testing for fake packages, from what I see).

llrs commented 1 year ago

@gmbecker I tried tracking the r-devel version, but it is not reported in the summary and I didn't parse it from all the checks. I did extract it from a single package check and extrapolated from there. This is one of the suggestions I had for the CRAN team: have tools::CRAN_check_results() provide the R version used in the checks and the date of each check, as sometimes the r-devel revisions are not the same for all flavors and/or the checks are performed on different dates.

The n is indeed the number of status changes not driven by dependency changes, package updates or R version changes. It counts any change, including from OK to NOTE or from NOTE to OK.
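As a rough illustration of how such an n can be counted, assuming a vector of daily statuses for one package on one flavor (the helper `count_status_changes()` and the data are hypothetical, and the filtering on dependencies, package updates and R version is not shown):

```r
# Count how many times the status differs from the previous day's status.
count_status_changes <- function(status) {
  if (length(status) < 2) {
    return(0L)
  }
  sum(head(status, -1) != tail(status, -1))
}

# Five consecutive days for a single package/flavor pair (toy data).
daily_status <- c("OK", "NOTE", "NOTE", "OK", "ERROR")
n <- count_status_changes(daily_status)
```

With this toy vector, the changes are OK to NOTE, NOTE to OK and OK to ERROR, so `n` is 3.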

@statnmap I did not focus on the cause of the intermittent failures. But missing Bioconductor dependencies is a common message (error, warning or note, depending on the type of dependency): I have a package that has been on CRAN for over a year which usually has a note that some (suggested) Bioconductor dependencies are not available. I think ensuring a good sync between CRAN and Bioconductor would also be a good improvement, but it might involve talking to the Bioconductor Technical Board or core.

I think you can capture warnings and errors in the same test via a nested call: expect_error(expect_warning(function_warning_error())) or something similar. You could also use testthat snapshots to check the output.
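To show the underlying idea without depending on testthat, here is a base R sketch that collects both the warning and the error raised by a single call. It is not how testthat implements its expectations: `capture_warning_and_error()` is a hypothetical helper, and `function_warning_error()` is a stand-in that warns and then stops, with messages loosely modelled on the output above.

```r
# Stand-in for a function that first warns and then fails.
function_warning_error <- function() {
  warning("The path already exists.")
  stop("Could not create fusen project", call. = FALSE)
}

# Record every warning message, let evaluation continue, and finally
# catch the error message (NULL if no error occurred).
capture_warning_and_error <- function(expr) {
  warns <- character()
  err <- tryCatch(
    withCallingHandlers(
      {
        expr
        NULL
      },
      warning = function(w) {
        warns <<- c(warns, conditionMessage(w))
        invokeRestart("muffleWarning")
      }
    ),
    error = function(e) conditionMessage(e)
  )
  list(warnings = warns, error = err)
}

res <- capture_warning_and_error(function_warning_error())
```

Both messages can then be asserted on separately, for example by comparing `res$warnings` and `res$error` with the expected text.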

Test cases for recreation of difficult CRAN check failures #20