Rdatatable / data.table

R's data.table package extends data.frame:
http://r-datatable.com
Mozilla Public License 2.0

revdep rerun #3581

Closed: mattdowle closed this issue 5 years ago

mattdowle commented 5 years ago

Rerun with latest master to see the impact of #2734. @MarkusBonsch was very careful to make no breaking changes. Then I continued in the PR and did change a few things; e.g. the i column type is now retained when columns of mismatched types are joined.

As of Sep 4th
CRAN:
 ERROR   :   8 : antaresProcessing CoSMoS grattan mlr rbi recorder tcpl trackdf 
 WARNING :   2 : OpenML vosonSML 
 NOTE    : 237 
 OK      : 489 
 TOTAL   : 736 / 736

fail.log

Full rerun Sep 8th
CRAN:
 ERROR   :   7 : batchtools genderizeR genomic.autocorr miceFast musica parallelMap tsbox 
 WARNING :   4 : MultiFit optiSel riskRegression sensobol 
 NOTE    : 238 
 OK      : 488 
 TOTAL   : 737 / 737 

fail.log

Full rerun overnight PT Sep 11th
CRAN:
 ERROR   :   1 : genderizeR 
 WARNING :   2 : optiSel SpaDES.core 
 NOTE    : 238 
 OK      : 497 
 TOTAL   : 738 / 738 
BIOC:
 ERROR   :   8 : CAGEr cellbaseR ENCODExplorerData GENESIS ImmuneSpaceR LowMACA qckitfastq singleCellTK 
 WARNING :  31 : AUCell BASiCS BEARscc BiocParallel CellNOptR CONFESS CytoML ELMER eQTL flowWorkspace geneXtendeR GenoGAM genomation ggcyto HMMcopy iCNV maser methylPipe MinimumDistance MSnID MSstats netSmooth openCyto QUALIFIER RegParallel RiboProfiling S4Vectors SISPA TFutils TitanCNA Ularcirc 
 NOTE    : 110 
 OK      :  35 
 TOTAL   : 184 / 184

37 of the 39 Bioc packages in error/warning status are also in error/warning with v1.12.2, so those are unrelated to data.table. This is the log for those 37: fail.log

That leaves 39 - 37 = 2 which could be related to data.table:

> status()
Installed data.table to be tested against: 1.12.3 2019-09-11 19:16:02 
CRAN:
 ERROR   :   1 : genderizeR 
 WARNING :   2 : optiSel SpaDES.core 
 NOTE    : 238 
 OK      : 497 
 TOTAL   : 738 / 738 

BIOC:
 ERROR   :   7 : CAGEr cellbaseR ENCODExplorerData GENESIS ImmuneSpaceR qckitfastq singleCellTK 
 WARNING :  30 : AUCell BASiCS BEARscc BiocParallel CellNOptR CONFESS CytoML ELMER eQTL flowWorkspace geneXtendeR GenoGAM genomation ggcyto HMMcopy iCNV methylPipe MinimumDistance MSnID MSstats netSmooth openCyto QUALIFIER RegParallel RiboProfiling S4Vectors SISPA TFutils TitanCNA Ularcirc 
 NOTE    : 111 
 OK      :  36 
 TOTAL   : 184 / 184 

TOTAL          : 922 
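The 2 remaining candidates can be read off by comparing the two BIOC listings above. A minimal base R sketch, with the package names copied from those listings; the helper and variable names are illustrative only, and it assumes the difference between the two listings is exactly the "39 - 37 = 2" mentioned above:

```r
# Split a space-separated listing into a character vector of package names.
split_names <- function(x) strsplit(x, " ", fixed = TRUE)[[1]]

# First BIOC listing (8 ERROR + 31 WARNING = 39 packages).
first_listing <- split_names(paste(
  "CAGEr cellbaseR ENCODExplorerData GENESIS ImmuneSpaceR LowMACA qckitfastq singleCellTK",
  "AUCell BASiCS BEARscc BiocParallel CellNOptR CONFESS CytoML ELMER eQTL flowWorkspace",
  "geneXtendeR GenoGAM genomation ggcyto HMMcopy iCNV maser methylPipe MinimumDistance",
  "MSnID MSstats netSmooth openCyto QUALIFIER RegParallel RiboProfiling S4Vectors SISPA",
  "TFutils TitanCNA Ularcirc"
))

# Second BIOC listing from status() (7 ERROR + 30 WARNING = 37 packages).
second_listing <- split_names(paste(
  "CAGEr cellbaseR ENCODExplorerData GENESIS ImmuneSpaceR qckitfastq singleCellTK",
  "AUCell BASiCS BEARscc BiocParallel CellNOptR CONFESS CytoML ELMER eQTL flowWorkspace",
  "geneXtendeR GenoGAM genomation ggcyto HMMcopy iCNV methylPipe MinimumDistance",
  "MSnID MSstats netSmooth openCyto QUALIFIER RegParallel RiboProfiling S4Vectors SISPA",
  "TFutils TitanCNA Ularcirc"
))

# Packages present in the first listing but absent from the second.
candidates <- setdiff(first_listing, second_listing)
print(candidates)

# Overall reverse dependency count: CRAN total plus BIOC total.
total_revdeps <- 738 + 184
print(total_revdeps)
```

The same comparison works for any pair of reruns, which makes it easier to see which packages changed status between runs.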

Dear maintainers,

We're working on releasing data.table 1.12.4 and have run R CMD check on your package with the new version to check the impact. But your package is already showing an error or warning with the current release 1.12.2. I think these are unrelated to data.table but it makes my job harder when these packages are already in error/warning status. In some cases I have been emailing you for several years about this.

The reverse dependency checking process is logged and discussed here: https://github.com/Rdatatable/data.table/issues/3581

Log attached for these 37 packages: fail.log

CAGEr             "Vanja Haberle"                  
cellbaseR         "Mohammed OE Abdallah"           
ENCODExplorerData "Eric Fournier"                  
GENESIS           "Stephanie M. Gogarten"          
ImmuneSpaceR      "ImmuneSpace Package Maintainer" 
qckitfastq        "August Guang"                   
singleCellTK      "David Jenkins"                  
AUCell            "Sara Aibar"                     
BASiCS            "Catalina Vallejos"              
BEARscc           "Benjamin Schuster-Boeckler"     
BiocParallel      "Bioconductor Package Maintainer"
CellNOptR         "A.Gabor"                        
CONFESS           "Diana LOW"                      
CytoML            "Mike Jiang"                     
ELMER             "Tiago Chedraoui Silva"          
eQTL              "Vincent Carey"                  
flowWorkspace     "Greg Finak"                     
geneXtendeR       "Bohdan Khomtchouk"              
GenoGAM           "Georg Stricker"                 
genomation        "Altuna Akalin"                  
ggcyto            "Mike Jiang"                     
HMMcopy           "Daniel Lai"                     
iCNV              "Zilu Zhou"                      
methylPipe        "Kamal Kishore"                  
MinimumDistance   "Robert B Scharpf"               
MSnID             "Vlad Petyuk"                    
MSstats           "Meena Choi"                     
netSmooth         "Jonathan Ronen"                 
openCyto          "Mike Jiang"                     
QUALIFIER         "Mike Jiang"                     
RegParallel       "Kevin Blighe"                   
RiboProfiling     "A. Popa"                        
S4Vectors         "Bioconductor Package Maintainer"
SISPA             "Bhakti Dwivedi"                 
TFutils           "Shweta Gopaulakrishnan"         
TitanCNA          "Gavin Ha"                       
Ularcirc          "David Humphreys"

And the 3 CRAN packages :

Dear 3 maintainers,

I'm working on releasing data.table 1.12.4 to CRAN and checking all packages which use it (922 reverse dependencies including Bioconductor) to check for any impact. Your package is already in warning or error status on CRAN with the last release (1.12.2). So it's harder for me to spot errors or warnings that the data.table update causes when your package is already showing errors or warnings. Please could you fix these and update on CRAN.

https://cran.r-project.org/web/checks/check_results_genderizeR.html
https://cran.r-project.org/web/checks/check_results_optiSel.html
https://cran.r-project.org/web/checks/check_results_SpaDES.core.html

For background info, and your entertainment, the revdep check process is logged here: https://github.com/Rdatatable/data.table/issues/3581

There's no rush, and this isn't holding up release.

Thanks, Matt

MichaelChirico commented 5 years ago

genderizeR gives the same errors on CRAN data.table. I see what looks like a bug in the code; have filed: https://github.com/kalimu/genderizeR/issues/9

MichaelChirico commented 5 years ago

The musica failure appears related to rbindlist combining an IDate column (an underlying integer with a class attribute) and a factor:

library(data.table)
DT1 = data.table(
  prse = structure(5478L, class = c("IDate", "Date"))
)
DT2 = data.table(
  prse = structure(1L, .Label = c("1970-01-01"), class = "factor")
)
rbind(DT1, DT2)
#          prse
# 1:       5478
# 2: 1970-01-01

Although it's the same output as in 1.12.2:

rbind(DT1, DT2)
         prse
1:       5478
2: 1970-01-01

Still, it's something to address.

MichaelChirico commented 5 years ago

optiSel appears spurious: the WARNING comes from its own C++ code and the NOTE is about the package size.

MichaelChirico commented 5 years ago

Oh, actually, musica is due to this:

https://github.com/Rdatatable/data.table/pull/3630#issue-285319167

Apparently musica was relying on untested behavior of cut.IDate to return an IDate; now it returns a factor but mean.IDate still returns IDate, hence the mismatch across groups for rbindlist.

Will restore cut.IDate and add some tests...
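The class mismatch described above can be sketched with base R's Date class, where cut() returns a factor while mean() keeps the date class. This is only an analogue: it assumes Date behaves here the way IDate did after the linked PR, and it does not call data.table at all.

```r
# Base R analogue using Date as a stand-in for IDate (an assumption).
d <- as.Date(c("2019-01-01", "2019-01-20", "2019-02-15"))

by_cut  <- cut(d, "month")  # cut.Date returns a factor of month labels
by_mean <- mean(d)          # mean.Date keeps the Date class

print(class(by_cut))
print(class(by_mean))

# Per-group results of different classes are what end up being combined.
same_class <- identical(class(by_cut), class(by_mean))
print(same_class)
```

When one group's result comes from the cut-style path and another from the mean-style path, the per-group values no longer share a class, which is the kind of mismatch rbindlist then has to reconcile.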

MichaelChirico commented 5 years ago

I don't think parallelMap has anything to do with us. Can reproduce the test error by cloning their GH & running source('run-all.R') from their tests directory, but the stack trace is:

Browse[1]> f
── 1. Error: batchtools mode (@test_batchtools.R#11)  ──────────────────────────
unused argument (V1 = 1)

1: partest1() at testthat/test_batchtools.R:11
2: expect_equal(parallelMap(identity, 1), list(1)) at /Users/michael.chirico/github/parallelMap/tests/testthat/helpers.R:9
3: quasi_label(enquo(object), label, arg = "object")
4: eval_bare(get_expr(quo), get_env(quo))
5: parallelMap(identity, 1)
6: checkResultsAndStopWithErrorsMessages(res)
7: stopWithJobErrorMessages(inds, vcapply(result.list[inds], as.character))
8: stopf("Errors occurred in %i slave jobs, displaying at most 10 of them:\n\n%s\n%s", 
       n, collapse(msgs, sep = "\n"), extra.msg)

(nothing from data.table)

And it's the same with CRAN data.table
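For reference, a message of the "unused argument" kind is what R raises when a function is called with a named argument it does not accept. The sketch below is a generic base R illustration with a hypothetical function f; it says nothing about where the V1 argument comes from in parallelMap or batchtools.

```r
# Hypothetical function that accepts only `x`.
f <- function(x) x

# Calling it with a named argument it does not have triggers the error;
# tryCatch captures the message instead of stopping the script.
msg <- tryCatch(
  do.call(f, list(V1 = 1)),
  error = function(e) conditionMessage(e)
)
print(msg)
```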

mattdowle commented 5 years ago

On parallelMap, the CRAN check page is what to go by, and that looks all OK: https://cran.r-project.org/web/checks/check_results_parallelMap.html. That makes it more likely that it is to do with us. parallelMap suggests batchtools and the output mentions test_batchtools.R, so fixing batchtools (#3854) will probably fix parallelMap too. Great.