Colossus Test Failure on CRAN

ericgiunta commented 3 weeks ago

Colossus has been archived from CRAN for failing tests on the "r-devel-linux-x86_64-fedora-clang" check machine. Version 1.1.1 is the most recent CRAN release that did not fail tests on the fedora-clang system. These tests have not failed on any other system, and I am trying to identify a system that reproduces the errors so that it can be used for debugging.

For the foreseeable future, Colossus can be downloaded and manually installed from either GitHub or the CRAN archives.

For reference, the latest CRAN version (1.1.4.1, equivalent to the GitHub version 1.1.3.2) failed the following tests:

══ Failed tests ════════════════════════════════════════════════════════════════
── Failure ('test-Cox_Regression.R:211:5'): Coxph loglin_plin_A ────────────────
e$beta_0 (`actual`) not equal to c(0.11, 1.01) (`expected`).

  `actual`: 0.03 0.63
`expected`: 0.11 1.01
── Failure ('test-Cox_Regression.R:271:5'): Coxph loglin_M Strata ──────────────
e$beta_0 (`actual`) not equal to c(-0.106) (`expected`).

  `actual`: -0.10
`expected`: -0.11
── Failure ('test-Cox_Regression.R:290:5'): Coxph loglin_M Single ──────────────
e$AIC (`actual`) not equal to 1056.299 (`expected`).

  `actual`: 2109
`expected`: 1056
── Failure ('test-Cox_Regression.R:432:5'): Coxph censoring weight ─────────────
e0$LogLik - e1$LogLik (`actual`) not equal to -2.909427 (`expected`).

  `actual`: -126
`expected`:   -3
── Failure ('test-Poisson_Regression.R:164:5'): Pois loglin_M Strata ───────────
e$beta_0 (`actual`) not equal to c(0.05476188) (`expected`).

  `actual`: 0.3
`expected`: 0.1

[ FAIL 5 | WARN 0 | SKIP 0 | PASS 1035 ]
Error: Test failures
Execution halted

rsbivand commented 3 weeks ago

@ericgiunta thanks. Do you have the complete output showing which check platform failed and all of its specs? I'm looking at https://www.stats.ox.ac.uk/pub/bdr/Rconfig/r-devel-linux-x86_64-fedora-clang. I understand that it was fedora clang and r-devel, but the complete check report would be useful. On Fedora 40 with standard clang 18.1.8, but with R built with gfortran rather than flang, I get the following check log: 00check.log. The install output suggests future problems as the build train evolves: compiler warnings about variables set but not used in RcppEigen, probably triggered by -Wall -pedantic in the CFLAGS (zipped to permit upload here: 00install.zip).

I'll comment more when I see the full failing check log.

ericgiunta commented 3 weeks ago

I appreciate you looking into it further. Unfortunately I did not think to save the check.log prior to it being archived. I have check logs from different rhub systems, which I've been comparing for unused variables. It looks like they all have similar warnings of RcppEigen variables being assigned but not used. I'm setting up a fedora 36 virtual machine using clang/flang that should hopefully reproduce the testing errors. When I get it running and check tests, I will upload the check log.

llrs commented 3 weeks ago

I find it suspicious that, beyond the failed tests, the log contains a large block of text:

Version: 1.1.4.1
Check: tests
Result: ERROR
    Running ‘spelling.R’
    Running ‘testthat.R’ [214s/363s]
  Running the tests in ‘tests/testthat.R’ failed.
  Complete output:
    > # This file is part of the standard setup for testthat.
    > # It is recommended that you do not modify it.
    > #
    > # Where should you do additional test configuration?
    > # Learn more about the roles of various files in:
    > # * https://r-pkgs.org/tests.html
    > # * https://testthat.r-lib.org/reference/test_package.html#special-files
    > 
    > library(testthat)
    > library(Colossus)
    > 
    > test_check("Colossus")
    Error: no events
    Error: Plot type not given
    Error: Stratification Column not in dataframe
    Error: Stratification Column not given
    Error: dose column not given
    Error: dose column not given
    Note: Starting Plot Function
    Note: Getting Plot Info
    Note: 2 risk groups
    Error: ID column is not in the dataframe
    Note: Starting Plot Function
    Note: Getting Plot Info
    Note: 2 risk groups
    Error: ID column not given
    Note: 3 risk groups
    Warning: element 1 with column name d was set constant
    Error: Atleast one parameter must be free
    Note: 3 risk groups
    Saving 7 x 7 in image
    Saving 7 x 7 in image
    Saving 7 x 7 in image
    Saving 7 x 7 in image
    Note: 3 risk groups
    Note: 2  strata used
    Note: 2  strata used
    Note: 3 risk groups
    Saving 7 x 7 in image
    Note: 3 risk groups
    Saving 7 x 7 in image
    Note: 86 risk groups
    Note: 86 risk groups
    Saving 7 x 7 in image
    Saving 7 x 7 in image
    Saving 7 x 7 in image
    Saving 7 x 7 in image
    Note: 2 risk groups
    Saving 7 x 7 in image
    Saving 7 x 7 in image
    Note: Starting Plot Function
    Note: Getting Plot Info
    Note: 3 risk groups
    Note: 3 risk groups
    Note: starting ph_plot
    Note: nonStratified survival curve calculation
    Note: writing survival data
    Note: Plotting Survival Curves
    Saving 7 x 7 in image
    Saving 7 x 7 in image
    Saving 7 x 7 in image
    Saving 7 x 7 in image
    Note: Plotting Kaplan-Meier Curve
    Saving 7 x 7 in image
    Saving 7 x 7 in image
    Saving 7 x 7 in image
    Saving 7 x 7 in image
    Error: Cores Requested: 124 , Cores Available: 2
    Error: e not in data.table
    Error: e not in data.table
    Error: Iteration: a?+?b?+c has incorrect length of 4 but should be 3.
    Error: Incorrect operation of ++
    Error: Terms used: 2 , Terms with gmix types available: 1
    Error: Model formula FAILING_CHOICE not in acceptable list
    Error: Model formula MA not in acceptable list
    Error: Model formula EA not in acceptable list
    Error: Interpolation method not recognized: badbad
    Error: a0 arguement was negative
    Error: a0 arguement was not a number
    Error: a0 arguement was not a number
    Error: a0 arguement was negative
    Error: a0 arguement was not a number
    Error: a0 arguement was not a number
    Error: a0 arguement was not a number
    Error: goal is too low
    Error: a0 arguement was negative
    Error: e missing from column names
    Error: The missing-value replacement is also NA
    Warning: Parameters used: 4, Covariates used: 5, Remaining filled with 0.01
    Error: Atleast one parameter must be free
    Error: Parameters used: 6, Covariates used: 5
    Error: Terms used: 4, Covariates used: 5
    Error: Terms used: 6, Covariates used: 5
    Error: Term types used: 4, Covariates used: 5
    Error: Term types used: 6, Covariates used: 5
    Error: lin_int missing
    Error: step_int missing
    Error: loglin_top missing
    Error: lin_quad_int missing
    Error: lin_exp_int missing
    Error: lin_exp_exp_slope missing
    Error: step_slope missing
    Error: lin_slope missing
    Error: lin_quad_slope missing
    Error: lin_exp_slope missing
    Error: lin_exp_int missing
    Error: Parameters used in first option: 5, Parameters used in different option: 4, please fix parameter length
    Warning: Parameters used: 4, Covariates used: 5, Remaining filled with 0.01
    Error: Parameters used: 6, Covariates used: 5
    Error: Terms used: 4, Covariates used: 5
    Error: Terms used: 6, Covariates used: 5
    Error: Term types used: 4, Covariates used: 5
    Error: Term types used: 6, Covariates used: 5
    Error: verbosity arguement not valid
    Error: verbosity arguement not valid
    Error: verbosity arguement not valid
    Error: keep_constant expects 0/1 values, minimum value was -1
    Error: keep_constant expects 0/1 values, maximum value was 10
    Error: keep_constant expects 0/1 values, atleast one value was noninteger
    Warning: term_n expects nonnegative integer values and a minimum of 0, minimum value was -1. Minimum value set to 0, others shifted by 1
    Error: term_n expects integer values, atleast one value was noninteger
    Error: term_n expects no missing integer values. Term numbers range from 0 to 3 but term_n has 3 unique values instead of 4
    Warning: term_n expects nonnegative integer values and a minimum of 0, minimum value was -1. Minimum value set to 0, others shifted by 1
    Error: Missing tform items:
    missing fake 
    Warning: term_n expects nonnegative integer values and a minimum of 0, minimum value was -1. Minimum value set to 0, others shifted by 1
    Error: Missing tform items:
    missing fake missing bad 
    Error: Default Parameters used: 7, Covariates used: 4
    Error: Terms used: 1, Covariates used: 4
    Error: Terms used: 8, Covariates used: 4
    Error: Term types used: 1, Covariates used: 4
    Error: Term types used: 5, Covariates used: 4
    Error: Atleast one parameter must be free
    Error: no events
    Warning: rmin and rmax lists not equal size, defaulting to lin and loglin min/max values
    Warning: rmin and rmax lists not equal size, defaulting to lin and loglin min/max values
    Error: tform not implemented bad_bad
    Error: der_iden should be within 0:(length(tform)-1)
    Error: der_iden should be within 0:(length(tform)-1)
    Error: Constraint matrix has incorrect number of columns
    Error: Constraint rows and constant lengths do not match
    Error: verbosity arguement not valid
    Warning: model covariate order changed
    Error: Atleast one parameter must be free
    Warning: model covariate order changed
    Error: no events
    Warning: model covariate order changed
    Warning: model covariate order changed
    Error: guesses: 100 , iterations per guess: c(1, 1)
    Warning: rmin and rmax lists not equal size, defaulting to lin and loglin min/max values
    Warning: rmin and rmax lists not equal size, defaulting to lin and loglin min/max values
    Warning: term_n expects nonnegative integer values and a minimum of 0, minimum value was 1. Minimum value set to 0, others shifted by -1
    Warning: rmin and rmax lists not equal size, defaulting to lin and loglin min/max values
    Warning: term_n expects nonnegative integer values and a minimum of 0, minimum value was 1. Minimum value set to 0, others shifted by -1
    Error: Atleast one parameter must be free
    Warning: rmin and rmax lists not equal size, defaulting to lin and loglin min/max values
    Error: guesses_control and model_control have different strata options
    Warning: rmin and rmax lists not equal size, defaulting to lin and loglin min/max values
    Error: Atleast one parameter must be free
    Warning: rmin and rmax lists not equal size, defaulting to lin and loglin min/max values
    Note: 2 risk groups
    Saving 7 x 7 in image
    Saving 7 x 7 in image
    Note: 908 risk groups
    Saving 7 x 7 in image
    Error: Atleast one parameter must be free
    Error: no events
    Note: 3 risk groups
    Saving 7 x 7 in image
    Warning: model covariate order changed
    Note: 26 risk groups
    Error: Atleast one parameter must be free
    Warning: model covariate order changed
    Warning: model covariate order changed
    Note: 26 risk groups
    Note: Initial starts: 1 , Number of iterations provided: 4 . Colossus requires one more iteration counts than number of guesses (for best guess)
    Warning: model covariate order changed
    Note: 26 risk groups
    Note: Initial starts: 3 , Number of iterations provided: 2 . Colossus requires one more iteration counts than number of guesses (for best guess)
    Warning: model covariate order changed
    Note: 26 risk groups
    Warning: model covariate order changed
    Note: 26 risk groups
    Warning: model covariate order changed
    Note: 26 risk groups
    Error: Invalid risk
    Warning: model covariate order changed
    Error: Atleast one parameter must be free
    Warning: model covariate order changed
    Error: no events
    Warning: model covariate order changed
    Note: Initial starts: 1 , Number of iterations provided: 4 . Colossus requires one more iteration counts than number of guesses (for best guess)
    Warning: model covariate order changed
    Note: Initial starts: 3 , Number of iterations provided: 2 . Colossus requires one more iteration counts than number of guesses (for best guess)
    Warning: model covariate order changed
    Error: guesses: 50 , iterations per guess: c(1, 1)
    Warning: model covariate order changed
    Warning: model covariate order changed
    Warning: model covariate order changed
    Error: Invalid risk
    Error: Atleast one parameter must be free
    Error: Atleast one parameter must be free
    Warning: model covariate order changed
    Note: 2 risk groups
    Saving 7 x 7 in image
    Saving 7 x 7 in image
    Error: no events
    Error: Atleast one parameter must be free
    Warning: rmin/rmax not equal size, lin/loglin min/max used
    Warning: rmin and rmax lists not equal size, defaulting to lin and loglin min/max values
    Note: 86 risk groups
    Warning: rmin and rmax lists not equal size, defaulting to lin and loglin min/max values
    Note: 86 risk groups
    Error: guesses_control and model_control have different strata options
    Note: 2  strata used
    Warning: rmin and rmax lists not equal size, defaulting to lin and loglin min/max values
    Note: 2  strata used
    Note: 86 risk groups
    Error: Atleast one parameter must be free
    Error: no events
    Error: Default Parameters used: 4, Covariates used: 6
    Warning: rmin and rmax lists not equal size, defaulting to lin and loglin min/max values
    Error: tform not implemented Not
    Warning: rmin and rmax lists not equal size, defaulting to lin and loglin min/max values
    Error: verbosity arguement not valid
    Error: verbosity arguement not valid
    Error: a0 arguement was not a number
    Error: a0 arguement was not a number
    Error: a0 arguement was not a number
    Error: item e1 in name_list has 1 items, but same item in term_n_list has 6 items. Omit entry in term_n_list to set to default of term 0 or add missing values
    Error: item e1 in name_list has 1 items, but same item in tform_list has 6 items. Omit entry in tform_list to set to default of 'loglin' or add missing values
    Error: item e1 in name_list has 1 items, but same item in keep_constant_list has 6 items. Omit entry in tform_list to set to default of 0 or add missing values
    Error: item e1 in name_list has 1 items, but same item in a_n_list has 6 items. Omit entry in a_n_list to set to default of 0 or add missing values
    Saving 7 x 7 in image
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Error: Invalid model
    Note: 26 risk groups
    Error: Invalid model
    Note: 26 risk groups
    Error: Invalid model
    Note: 26 risk groups
    Error: Invalid model
    Error: Invalid model
    Error: Invalid model
    Note: 1 risk groups
    Note: 1 risk groups
    Note: 99 risk groups
    Note: 99 risk groups
    Note: 2  strata used
    Note: 99 risk groups
    Note: 2  strata used
    Note: 99 risk groups
    [1] "starting"
    Note: 99 risk groups
    Error: 2  column indexes provided, but  1  rows of realizations columns provided
    Note: 99 risk groups
    Error: Atleast one realization column provided was not in the data.table
    Note: 99 risk groups
    Error: bad missing from column names
    Error: no events
    Note: 99 risk groups
    non-derivative model calculation is not compatable with multi-realization method
    Error: Invalid model
    Note: 99 risk groups
    null model is not compatable with multi-realization method
    Error: Invalid model
    Note: 114 risk groups
    Note: 114 risk groups
    Note: 114 risk groups
    Note: 114 risk groups
    Note: 114 risk groups
    Note: 114 risk groups
    Note: 114 risk groups
    Note: 114 risk groups
    Note: 114 risk groups
    Note: 2  strata used
    Note: 114 risk groups
    Note: 2  strata used
    Note: 114 risk groups
    Note: 2  strata used
    Note: 114 risk groups
    Note: 2  strata used
    Note: 114 risk groups
    Note: 2  strata used
    Note: 114 risk groups
    Note: 2  strata used
    Note: 114 risk groups
    Note: 2  strata used
    Note: 114 risk groups
    Note: 2  strata used
    Note: 114 risk groups
    Note: 2  strata used
    Note: 114 risk groups
    Note: 901 risk groups
    Saving 7 x 7 in image
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 5  strata used
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Note: 26 risk groups
    Warning: model covariate order changed
    Warning: model covariate order changed
    Warning: model covariate order changed
    Warning: model covariate order changed
    Warning: model covariate order changed
    Warning: model covariate order changed
    Warning: model covariate order changed
    Warning: model covariate order changed
    [ FAIL 5 | WARN 0 | SKIP 0 | PASS 1035 ]

    ══ Failed tests ════════════════════════════════════════════════════════════════
    ── Failure ('test-Cox_Regression.R:211:5'): Coxph loglin_plin_A ────────────────
    e$beta_0 (`actual`) not equal to c(0.11, 1.01) (`expected`).

      `actual`: 0.03 0.63
    `expected`: 0.11 1.01
    ── Failure ('test-Cox_Regression.R:271:5'): Coxph loglin_M Strata ──────────────
    e$beta_0 (`actual`) not equal to c(-0.106) (`expected`).

      `actual`: -0.10
    `expected`: -0.11
    ── Failure ('test-Cox_Regression.R:290:5'): Coxph loglin_M Single ──────────────
    e$AIC (`actual`) not equal to 1056.299 (`expected`).

      `actual`: 2109
    `expected`: 1056
    ── Failure ('test-Cox_Regression.R:432:5'): Coxph censoring weight ─────────────
    e0$LogLik - e1$LogLik (`actual`) not equal to -2.909427 (`expected`).

      `actual`: -126
    `expected`:   -3
    ── Failure ('test-Poisson_Regression.R:164:5'): Pois loglin_M Strata ───────────
    e$beta_0 (`actual`) not equal to c(0.05476188) (`expected`).

      `actual`: 0.3
    `expected`: 0.1

    [ FAIL 5 | WARN 0 | SKIP 0 | PASS 1035 ]
    Error: Test failures
    Execution halted
Flavor: [r-devel-linux-x86_64-fedora-clang](https://www.r-project.org/nosvn/R.check/r-devel-linux-x86_64-fedora-clang/Colossus-00check.html)

Version: 1.1.4.1
Check: installed package size
Result: NOTE
    installed size is 67.2Mb
    sub-directories of 1Mb or more:
      libs  66.1Mb
Flavors: [r-release-macos-arm64](https://www.r-project.org/nosvn/R.check/r-release-macos-arm64/Colossus-00check.html), [r-oldrel-macos-arm64](https://www.r-project.org/nosvn/R.check/r-oldrel-macos-arm64/Colossus-00check.html)

Version: 1.1.3.1
Check: installed package size
Result: NOTE
    installed size is 83.0Mb
    sub-directories of 1Mb or more:
      libs  82.1Mb
Flavors: [r-release-macos-x86_64](https://www.r-project.org/nosvn/R.check/r-release-macos-x86_64/Colossus-00check.html), [r-oldrel-macos-x86_64](https://www.r-project.org/nosvn/R.check/r-oldrel-macos-x86_64/Colossus-00check.html)

Many of those lines are errors or warnings. Perhaps the test failures are related to this:

    [1] "starting"
    Note: 99 risk groups
    Error: 2  column indexes provided, but  1  rows of realizations columns provided
    Note: 99 risk groups
    Error: Atleast one realization column provided was not in the data.table
    Note: 99 risk groups
    ...
    Warning: rmin and rmax lists not equal size, defaulting to lin and loglin min/max values
    Note: 2  strata used
    Note: 86 risk groups
    Error: Atleast one parameter must be free
    Error: no events
    Error: Default Parameters used: 4, Covariates used: 6
    Warning: rmin and rmax lists not equal size, defaulting to lin and loglin min/max values
    Error: tform not implemented Not
    Warning: rmin and rmax lists not equal size, defaulting to lin and loglin min/max values
    Error: verbosity arguement not valid
    Error: verbosity arguement not valid
    Error: a0 arguement was not a number
    Error: a0 arguement was not a number
    Error: a0 arguement was not a number
    Error: item e1 in name_list has 1 items, but same item in term_n_list has 6 items. Omit entry in term_n_list to set to default of term 0 or add missing values
    Error: item e1 in name_list has 1 items, but same item in tform_list has 6 items. Omit entry in tform_list to set to default of 'loglin' or add missing values
    Error: item e1 in name_list has 1 items, but same item in keep_constant_list has 6 items. Omit entry in tform_list to set to default of 0 or add missing values
    Error: item e1 in name_list has 1 items, but same item in a_n_list has 6 items. Omit entry in a_n_list to set to default of 0 or add missing values

rsbivand commented 3 weeks ago

Here is my equivalent testthat.Rout, with no failures, from clang version 18.1.8 (Fedora 18.1.8-1.fc40) and R Under development (unstable) (2024-09-23 r87189), compressed: testthat.zip. @llrs were you using clang 19?

llrs commented 3 weeks ago

@rsbivand that information is available on CRAN: https://cran-archive.r-project.org/web/checks/2024/2024-09-23_check_results_Colossus.html . I haven't tried to build the images with rhub as it has fedora but not with clang and I don't know enough about clang19 to modify the docker image it uses.

rsbivand commented 3 weeks ago

OK, thanks. I didn't know that the internal check results were world-readable, this will make my life (summing up the retirement project for rgdal, rgeos and maptools https://r-spatial.github.io/evolution/) so much easier! So the clang19 instance we have now is that from CRAN. I've built clang19 from source but on Fedora 40 (current), I'll see whether I can replicate later today or tomorrow.

ericgiunta commented 3 weeks ago

@llrs the various error and warning messages are from other tests. I built in tests to check that errors/warning occur when they should. I use codecov to check that my tests are hitting lines of code, which includes the message() statements.

llrs commented 2 weeks ago

@ericgiunta It is great that you cover 96% of the src (I would also recommend covering the R folder). Neither codecov, testthat, nor R needs to print text to the terminal to cover a given line. But since we have a log of the tests besides the testthat report, I would use that. Have you compared it with the log of your tests when it passes the checks?

If these warnings and errors are not expected, they will help us. If they are expected, you could use test fixtures in testthat parlance to check they are the same between runs and avoid printing them on the reports. I hope this helps.

rsbivand commented 2 weeks ago

@llrs Agreed, test output should be self-explanatory, so an expected error or warning should not be reported, only an unexpected change from specification. Since other users read these files, for example during reverse dependency checks, it is important to keep them as short as possible. This didn't cause the problem, but it is a potential source of confusion for CRAN and for package maintainers running reverse dependency checks.

So far, I've built and installed clang 19 from source following https://www.stats.ox.ac.uk/pub/bdr/Rconfig/r-devel-linux-x86_64-fedora-clang as far as I can tell, but on Fedora 40. There was a problem with libunwind-devel (which had been absent), and clang itself didn't pass its unit tests. I then built R-devel, which failed unit tests on mgcv:

Error: package or namespace load failed for ‘mgcv’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/home/rsb/topics/R/trunk/builddir-clang/library/mgcv/libs/mgcv.so':
  /home/rsb/topics/R/trunk/builddir-clang/library/mgcv/libs/mgcv.so: undefined symbol: __kmpc_dispatch_deinit, version VERSION
Error: loading failed

There was no trouble with Fedora 40 standard clang 18.1.8.

Trying R CMD check with Colossus_1.1.4.1.tar.gz and clang 19 led to the same installation error that mgcv faced. In my case, this suggests that openmp is not aligned with clang. openmp seems to be quite volatile, and I'm unsure how to proceed. Any input would be useful.

Followup: On my laptop it turns out that I do not have openmpi installed, and if my desktop is the same, possibly clang was built without it. Will pursue this later. I always avoid lower-level parallel work, preferring upper-level.

rsbivand commented 2 weeks ago

My desktop was not the same, openmp is installed. However, the clang19 openmp is in another folder, so after adding that to what /sbin/ldconfig sees, mgcv installs and passes CMD check. As does Colossus_1.1.4.1.tar.gz: 00check.log, 00install.zip.

R-devel is failing a regression test that one timing is not less than another between stats::dist and stats::hclust, but that is no longer relevant here.

Unfortunately, I cannot reproduce the original report with clang19 on Fedora 40, though at first I had only tried with --as-cran settings. Now, with the environment variable settings from https://www.stats.ox.ac.uk/pub/bdr/Rconfig/r-devel-linux-x86_64-fedora-clang, the check is also OK: 00check.log. So there are no clear signs of the cause, nor any obvious way to alleviate the problem other than rewriting the tests to avoid hard failures.

rsbivand commented 2 weeks ago

After re-checking the library paths, so that R is definitely built against the clang19 openmpi, R passes its own tests cleanly. However, Colossus_1.1.4.1.tar.gz still passes.

ericgiunta commented 2 weeks ago

I appreciate the feedback on the tests. I'm new to writing R packages, so I've been trying to learn the various best practices. I'll make a note to update how the errors and warnings are displayed. Getting codecov to check the R folder has been on my list of minor issues. The codecov checks have been mainly to verify that the C++ code is being checked for memory access errors.

It is very good to know that Fedora 40 with clang isn't failing the tests. I've been slowly getting a Fedora 36 virtual machine set up to test with. I'll see if it's possibly something specific to Fedora 36 that was changed in later versions of Fedora.

rsbivand commented 2 weeks ago

There is no certainty that I've got my F40 platform with clang19 set up correctly either, I'm afraid. I only install a compiler/build-train every couple of years, unlike the CRAN team, who do it daily. I doubt that F36 would make any difference. When you set up the expected values in the tests, what was the context?

ericgiunta commented 2 weeks ago

The expected values are based on what the code returns on my main system. There are not supposed to be any random elements in the results, so they should be the same on any system. Most of the expected-value tests run one iteration of a regression, so they exercise the different calculation functions and the function that estimates the better parameter set. The results could therefore change if there is some difference in parallelization, in how column operations are performed, or in how variables are interpreted.
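To illustrate how parallelization alone can change a numerical result, here is a minimal C++ sketch. It is not Colossus code: `sum_sequential`, `sum_chunked`, and `nearly_equal` are hypothetical helpers, and the chunked sum only imitates a reduction split across threads without starting any. The 1.5e-8 tolerance is an assumption in the spirit of R's `all.equal` default.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sum left to right, as a single thread would.
double sum_sequential(const std::vector<double>& x) {
    double total = 0.0;
    for (double v : x) total += v;
    return total;
}

// Sum in `chunks` contiguous blocks and then combine the partial sums.
// This imitates the order of operations of a reduction split across
// threads; no real threads are used.
double sum_chunked(const std::vector<double>& x, std::size_t chunks) {
    if (chunks == 0) chunks = 1;
    const std::size_t n = x.size();
    double total = 0.0;
    for (std::size_t c = 0; c < chunks; ++c) {
        const std::size_t begin = c * n / chunks;
        const std::size_t end = (c + 1) * n / chunks;
        double partial = 0.0;
        for (std::size_t i = begin; i < end; ++i) partial += x[i];
        total += partial;
    }
    return total;
}

// Relative comparison with a tolerance (assumed value, see lead-in).
bool nearly_equal(double actual, double expected, double tol = 1.5e-8) {
    const double diff = std::fabs(actual - expected);
    const double scale = std::fabs(expected);
    return scale > 0.0 ? diff / scale <= tol : diff <= tol;
}
```

With IEEE 754 doubles, summing {0.1, 0.2, 0.3} sequentially and in two chunks typically differs in the last bit, which a tolerance-based comparison absorbs. A gap like 2109 against 1056.299 is far outside any such tolerance, so reordering of floating-point operations alone would not be expected to explain the failures reported above.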

rsbivand commented 2 weeks ago

Could you see whether any such manipulation, such as the number of threads or any other parameter might change the output of the test? The F36 settings I think set the threads to 2, but trying to see how those outputs might occur would throw light on what may be going on.

ericgiunta commented 2 weeks ago

That was my plan. Once I have a system that reproduces the errors, I can start by tracking down which function is having issues. From there I can disable the threading, print out more intermediate results, and figure out which lines are having issues.

When you configured the R installation, how did you point the r-devel configuration to the clang/flang builds? The config.site options in the system details don't seem to be enough for it to locate the new clang/flang installations. I built clang/flang from LLVM and then added clang18/bin to my path. I am able to check versions of clang and flang-new, but when I try to configure R I'm getting an error that clang cannot create executables.

Never mind, I just needed to set CC and FC to the names of the executables.

rsbivand commented 2 weeks ago

See the settings in https://www.stats.ox.ac.uk/pub/bdr/Rconfig/r-devel-linux-x86_64-fedora-clang:

config.site: 
CFLAGS="-O3 -Wall -pedantic -Wp,-D_FORTIFY_SOURCE=3" 
FFLAGS="-O2 -pedantic" 
CXXFLAGS="-O3 -Wall -pedantic -frtti -Wp,-D_FORTIFY_SOURCE=3" 
CPPFLAGS="-isystem /usr/local/clang/include" 
LDFLAGS="-L/usr/local/clang/lib64 -L/usr/local/clang18/lib -L/usr/local/lib64"

and add the library locations in the files read by /sbin/ldconfig, I think under /etc/ld.so.conf.d as superuser, running ldconfig when written. Do check that the library locations match, those given there suit the CRAN check platform.

ericgiunta commented 1 week ago

After setting up a fedora-36 virtual machine and compiling R with clang/flang 18, I think the issue is coming from the OpenMP sections. Running the tests directly from the command line found no issues, but running "R CMD check" caused two tests to fail. Upon investigation, the tests failed when a single section was run in parallel. I was not able to reproduce every test failure, but I suspect the tests that failed during the CRAN checks failed for similar reasons.

I compared the last working version on CRAN to the current version and realized the last working version on CRAN wasn't using OpenMP for linux. At the time I was figuring out how to use the $(SHLIB) options, and had left it out of the makevars file. The last working CRAN version set the makevars file to: 'PKG_LIBS = $(R_HOME)/bin/Rscript -e "Rcpp:::LdFlags()" $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS)\n'

This was updated on the Github version to: 'PKG_LIBS = $(R_HOME)/bin/Rscript -e "Rcpp:::LdFlags()" $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS) $(SHLIB_OPENMP_CXXFLAGS) \nPKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS) \n'

I am continuing to look into how to reproduce all of the CRAN test failures, but as things stand I suspect it's an issue with the parallelization on this system.

llrs commented 1 week ago

There have been a lot of discussions on the r-package-devel mailing list about parallel processes not playing well with each other; a search of the archive might help you.

With OpenMP, it is recommended to test with and check the environment variables OMP_THREAD_LIMIT and OMP_NUM_THREADS; see the last 4-5 paragraphs of the linked version.
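As a rough illustration of honoring those variables, here is a minimal C++ sketch that caps a requested thread count by OMP_THREAD_LIMIT and OMP_NUM_THREADS before any parallel region is started. `parse_positive_int` and `capped_threads` are hypothetical helper names, not part of Colossus or of the OpenMP API, and a real package might prefer to query the OpenMP runtime directly.

```cpp
#include <algorithm>
#include <climits>
#include <cstdlib>

// Parse a positive integer from a string such as "2" or "4,2"
// (OMP_NUM_THREADS may hold a comma-separated list; only the first
// entry is used here). Returns `fallback` if unset or invalid.
int parse_positive_int(const char* raw, int fallback) {
    if (raw == nullptr || *raw == '\0') return fallback;
    char* end = nullptr;
    const long value = std::strtol(raw, &end, 10);
    const bool parsed = (end != raw) && (*end == '\0' || *end == ',');
    if (!parsed || value <= 0 || value > INT_MAX) return fallback;
    return static_cast<int>(value);
}

// Cap the requested thread count by the hardware count and by the
// two environment variables, never returning fewer than one thread.
int capped_threads(int requested, int hardware) {
    int limit = std::max(1, hardware);
    limit = std::min(limit, parse_positive_int(std::getenv("OMP_THREAD_LIMIT"), limit));
    limit = std::min(limit, parse_positive_int(std::getenv("OMP_NUM_THREADS"), limit));
    return std::max(1, std::min(requested, limit));
}
```

Under these assumptions, a request for 124 threads with OMP_THREAD_LIMIT=2 would be reduced to 2, in line with the "Cores Requested: 124 , Cores Available: 2" message in the check log above.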

ericgiunta / Colossus

Colossus Test Failure on CRAN #14