Rdatatable / data.table

R's data.table package extends data.frame:
http://r-datatable.com
Mozilla Public License 2.0
3.57k stars 975 forks source link

Encourage Imports not Depends #3076

Open mattdowle opened 5 years ago

mattdowle commented 5 years ago

CRAN + BioC: Depends Imports


I've only recently realized how bad Depends: is, thanks to Jan's importing vignette. I just made its discouragement stronger : https://github.com/Rdatatable/data.table/pull/3069/commits/51590bcfec6e9cccc9271800bcf01749a42c7e80

We could disallow Depends. This would also be beneficial to cedta()'s awkward implementation on its last line where it needs to do the tryCatch just for packages which Depend; that line could be removed.

But before we disallow Depends, we'd need to ask 69 CRAN packages to change from Depends to Imports. (Most revdeps already Import.) The longer we leave it, the greater the potential for new packages using Depends to be added to CRAN and the harder it will be to change.

Check when changed to Imports and published on CRAN too :

List as code for easier maintenance of this list: ``` known_depends = c('Ac3net', 'AF', 'batchtools', 'bea.R', 'behavr', 'BuyseTest', 'cdparcoord', 'cffdrs', 'classifierplots', 'clickstream', 'corpustools', 'cvAUC', 'damr', 'dfmeta', 'drgee', 'easycsv', 'edgeRun', 'EurosarcBayes', 'gbp', 'GenomicTools', 'GenomicTools.fileHandler', 'greport', 'haploReconstruct', 'heatwaveR', 'heims', 'ie2misc', 'iemisc', 'JWileymisc', 'koRpus', 'LabourMarketAreas', 'lookupTable', 'LSPFP', 'miLineage', 'mrMLM', 'mrMLM.GUI', 'multicastR', 'musica', 'networkR', 'NNS', 'openSTARS', 'orgR', 'panelaggregation', 'partools', 'penaltyLearning', 'pkggraph', 'QuantTools', 'Rbitcoin', 'reinsureR', 'riskRegression', 'Rnets', 'robCompositions', 'RSauceLabs', 'RWildbook', 'sentometrics', 'simPop', 'simstudy', 'sitree', 'skm', 'slim', 'sparseFLMM', 'stranger', 'surveyplanning', 'tcpl', 'ttwa', 'twl', 'validateRS', 'vardpoor', 'VIM', 'word.alignment') current_depends = devtools::revdep('data.table', 'Depends') new_depends = setdiff(current_depends, known_depends) fixed_depends = setdiff(known_depends, current_depends) current_depends_w_bioc = devtools::revdep('data.table', 'Depends', bioconductor = TRUE) bioc_depends = setdiff(current_depends_w_bioc, current_depends) ``` UNCONTACTED AS OF 2019-08-13 - [ ] CoSMoS - [ ] EBPRS - [ ] GenoScan - [ ] glmaag - [ ] lori - [x] MGDrivE [as of v1.1.0 2019-08-19](https://cran.r-project.org/web/packages/MGDrivE/index.html) - [ ] phenocamapi [GH](https://github.com/bnasr/phenocamapi/) - [ ] PROSPER - [ ] rblt [GH](https://github.com/sg4r/rblt) - [ ] shinyML - [ ] sitreeE - [ ] somspace - [ ] SVN - [ ] synthACS - [ ] TreeLS [GH](https://github.com/tiagodc/TreeLS) - [ ] Tushare - [ ] WGScan Quick links to CRAN index of all packages: ``` cat(sprintf('[%s](https://cran.r-project.org/web/packages/%s/index.html)', current_depends, current_depends), sep = ' ') ``` [Ac3net](https://cran.r-project.org/web/packages/Ac3net/index.html) [AF](https://cran.r-project.org/web/packages/AF/index.html) [batchtools](https://cran.r-project.org/web/packages/batchtools/index.html) [bea.R](https://cran.r-project.org/web/packages/bea.R/index.html) [behavr](https://cran.r-project.org/web/packages/behavr/index.html) [BuyseTest](https://cran.r-project.org/web/packages/BuyseTest/index.html) [cdparcoord](https://cran.r-project.org/web/packages/cdparcoord/index.html) [classifierplots](https://cran.r-project.org/web/packages/classifierplots/index.html) [clickstream](https://cran.r-project.org/web/packages/clickstream/index.html) [corpustools](https://cran.r-project.org/web/packages/corpustools/index.html) [CoSMoS](https://cran.r-project.org/web/packages/CoSMoS/index.html) [cvAUC](https://cran.r-project.org/web/packages/cvAUC/index.html) [damr](https://cran.r-project.org/web/packages/damr/index.html) [dfmeta](https://cran.r-project.org/web/packages/dfmeta/index.html) [drgee](https://cran.r-project.org/web/packages/drgee/index.html) [easycsv](https://cran.r-project.org/web/packages/easycsv/index.html) [EBPRS](https://cran.r-project.org/web/packages/EBPRS/index.html) [edgeRun](https://cran.r-project.org/web/packages/edgeRun/index.html) [EurosarcBayes](https://cran.r-project.org/web/packages/EurosarcBayes/index.html) [gbp](https://cran.r-project.org/web/packages/gbp/index.html) [GenomicTools](https://cran.r-project.org/web/packages/GenomicTools/index.html) [GenomicTools.fileHandler](https://cran.r-project.org/web/packages/GenomicTools.fileHandler/index.html) [GenoScan](https://cran.r-project.org/web/packages/GenoScan/index.html) [glmaag](https://cran.r-project.org/web/packages/glmaag/index.html) [greport](https://cran.r-project.org/web/packages/greport/index.html) [haploReconstruct](https://cran.r-project.org/web/packages/haploReconstruct/index.html) [heims](https://cran.r-project.org/web/packages/heims/index.html) [ie2misc](https://cran.r-project.org/web/packages/ie2misc/index.html) [iemisc](https://cran.r-project.org/web/packages/iemisc/index.html) [JWileymisc](https://cran.r-project.org/web/packages/JWileymisc/index.html) [LabourMarketAreas](https://cran.r-project.org/web/packages/LabourMarketAreas/index.html) [lookupTable](https://cran.r-project.org/web/packages/lookupTable/index.html) [lori](https://cran.r-project.org/web/packages/lori/index.html) [LSPFP](https://cran.r-project.org/web/packages/LSPFP/index.html) [miLineage](https://cran.r-project.org/web/packages/miLineage/index.html) [mrMLM](https://cran.r-project.org/web/packages/mrMLM/index.html) [mrMLM.GUI](https://cran.r-project.org/web/packages/mrMLM.GUI/index.html) [multicastR](https://cran.r-project.org/web/packages/multicastR/index.html) [musica](https://cran.r-project.org/web/packages/musica/index.html) [openSTARS](https://cran.r-project.org/web/packages/openSTARS/index.html) [orgR](https://cran.r-project.org/web/packages/orgR/index.html) [panelaggregation](https://cran.r-project.org/web/packages/panelaggregation/index.html) [partools](https://cran.r-project.org/web/packages/partools/index.html) [phenocamapi](https://cran.r-project.org/web/packages/phenocamapi/index.html) [pkggraph](https://cran.r-project.org/web/packages/pkggraph/index.html) [PROSPER](https://cran.r-project.org/web/packages/PROSPER/index.html) [QuantTools](https://cran.r-project.org/web/packages/QuantTools/index.html) [Rbitcoin](https://cran.r-project.org/web/packages/Rbitcoin/index.html) [rblt](https://cran.r-project.org/web/packages/rblt/index.html) [reinsureR](https://cran.r-project.org/web/packages/reinsureR/index.html) [riskRegression](https://cran.r-project.org/web/packages/riskRegression/index.html) [Rnets](https://cran.r-project.org/web/packages/Rnets/index.html) [robCompositions](https://cran.r-project.org/web/packages/robCompositions/index.html) [RSauceLabs](https://cran.r-project.org/web/packages/RSauceLabs/index.html) [RWildbook](https://cran.r-project.org/web/packages/RWildbook/index.html) [sentometrics](https://cran.r-project.org/web/packages/sentometrics/index.html) [shinyML](https://cran.r-project.org/web/packages/shinyML/index.html) [simPop](https://cran.r-project.org/web/packages/simPop/index.html) [sitreeE](https://cran.r-project.org/web/packages/sitreeE/index.html) [skm](https://cran.r-project.org/web/packages/skm/index.html) [slim](https://cran.r-project.org/web/packages/slim/index.html) [somspace](https://cran.r-project.org/web/packages/somspace/index.html) [sparseFLMM](https://cran.r-project.org/web/packages/sparseFLMM/index.html) [SVN](https://cran.r-project.org/web/packages/SVN/index.html) [synthACS](https://cran.r-project.org/web/packages/synthACS/index.html) [TreeLS](https://cran.r-project.org/web/packages/TreeLS/index.html) [ttwa](https://cran.r-project.org/web/packages/ttwa/index.html) [Tushare](https://cran.r-project.org/web/packages/Tushare/index.html) [twl](https://cran.r-project.org/web/packages/twl/index.html) [validateRS](https://cran.r-project.org/web/packages/validateRS/index.html) [VIM](https://cran.r-project.org/web/packages/VIM/index.html) [WGScan](https://cran.r-project.org/web/packages/WGScan/index.html) [word.alignment](https://cran.r-project.org/web/packages/word.alignment/index.html)
jangorecki commented 5 years ago

It shouldn't be disallowed because there are valid use cases for Depends, for example my package dtq which is a logging mechanism for [.data.table. I recall other package too: dt.nuggets(?). We should definitely recommend to imports instead of depends, but IMO it should not be disallowed as there are valid use cases for Depends.

mattdowle commented 5 years ago

Fair enough. Why exactly is it that those packages need Depends and can't use Imports?

jangorecki commented 5 years ago

They are extensions of data.table, their existence without data.table is pointless. They not just "use" data.table.

mattdowle commented 5 years ago

I still don't follow. Importing is as strong as depending in that imports are required. No package importing data.table can install without it. What would fail if they changed to Imports? There is also Enhances:. Does dtq mask [.data.table so it need to be attached in front of data.table on the search path?

jangorecki commented 5 years ago

It is as strong, except the fact that imported namespace doesn't need to be attached. It doesn't mask, but it expects data.table to be attached. There are definitely very few use cases like this so general rule should be to import. In case of listed rev deps that depends on data.table we could propose to change that to imports. Change might need to update examples and unit tests (thus all dependent scripts too) to require data.table explicitly together with such rev dep package, which is potentially significant breaking change, still easy to fix. Regarding change of depends to imports of Rbitcoin rev dep, I am not maintaining this package anymore. It might eventually be incompatible with later versions of data.table or exchange market APIs, I have no idea. Latest CRAN release is 4+ years old, dev version is 2+ years old. If someone is willing to take maintenance of it then it will be best, otherwise it will probably gets removed from CRAN once it will start breaking to many newly added rules.

mattdowle commented 5 years ago

75 maintainers emailed today (up from the 69 in Sep 2018) :

deps = tools::package_dependencies("data.table", which="Depends", reverse=TRUE)[[1]]
paste(sapply(deps, maintainer),collapse=";")

Dear maintainer,

Hello! Thanks for using data.table. Your package is one of 75 CRAN packages that Depend on data.table. Another 469 Import data.table. Importing is much preferred.

The data.table importing vignette now contains this text :

" Besides the Imports: field, you can also use Depends: data.table but we strongly discourage this approach (and may disallow it in future) because this loads data.table into your user’s workspace; i.e. it enables data.table functionality in your user’s scripts without them requesting that. Imports: is the proper way to use data.table within your package without inflicting data.table on your user. In fact, we hope the Depends: field is eventually deprecated in R since this is true for all packages. "

https://cran.r-project.org/web/packages/data.table/vignettes/datatable-importing.html

Another motivation is that it would be beneficial to cedta()'s awkward implementation on its last line where it needs to do the tryCatch just for packages which Depend. That line could be removed.

Eventually we'll add a message, then a warning, and after a few years possibly only allow Importing and not Depending on data.table. The next time you update your package on CRAN, are you ok to change from Depends to Imports please?

We're tracking this issue here : https://github.com/Rdatatable/data.table/issues/3076

It's very early days on this issue and this is just a first email to let you know our thoughts and the plan. Feedback and comments are very welcome, ideally directly in the GitHub issue.

Best, Matt

mbannert commented 5 years ago

First of all thanks a ton for reaching out to the package maintainers.

I fully agree. I always use data.table for my backends and have imported on my more recent packages. In my case panelaggregation was a try to put a package on CRAN back when people wore pyjamas and lived life slow. Last time a data.table change caused issues I tried to get the package of CRAN but was discouraged to do so. I'd rather put my time into my newer packages than to fix a package which is more or less abandoned. On the other hand it would probably not be that much of deal to fix it though I haven't looked at the package in years.

Bottom line: from my point of view, disallowing would be a bit too much since it would get my old package of CRAN forcefully.

tdhock commented 5 years ago

hi I just updated my penaltyLearning package, https://github.com/tdhock/penaltyLearning/commit/ef8e7fbdb1e240ad9dd7c8b40520cd9231ae0580. now it imports data.table instead of depends. basically I just had to add a bunch of library(data.table) lines in tests and examples.

bozenne commented 5 years ago

Hi thanks for the email and for data.table I have also changed to imports in the Github version of the BuyseTest package. Will be updated on CRAN at some point.

Best brice

MichaelChirico commented 5 years ago

Adding these from Bioconductor:

- [ ] amplican
- [ ] Chicago
- [ ] chimeraviz
- [ ] GenoGAM
- [ ] GGtools
- [ ] GOTHiC
- [ ] metavizr
- [ ] OUTRIDER
- [ ] PGA
- [ ] QUALIFIER
- [ ] QuartPAC
- [ ] rBiopaxParser
- [ ] RCAS
- [ ] rfPred
- [ ] rTANDEM
- [ ] SNPhood
- [ ] TIN

Also added some code for repeating this & some more new packages from CRAN under <details> in OP

brown-jason commented 5 years ago

tcpl package has moved data.table to imports

mattdowle commented 4 years ago

Reminder email sent today to these 74 :

 [1] "Ac3net"                   "AF"                       "batchtools"              
 [4] "bea.R"                    "behavr"                   "BuyseTest"               
 [7] "cdparcoord"               "classifierplots"          "clickstream"             
[10] "corpustools"              "CoSMoS"                   "cvAUC"                   
[13] "damr"                     "dfmeta"                   "drgee"                   
[16] "easycsv"                  "EBPRS"                    "edgeRun"                 
[19] "EurosarcBayes"            "gbp"                      "GenomicTools"            
[22] "GenomicTools.fileHandler" "GenoScan"                 "glmaag"                  
[25] "greport"                  "haploReconstruct"         "heims"                   
[28] "ie2misc"                  "iemisc"                   "JWileymisc"              
[31] "LabourMarketAreas"        "lookupTable"              "lori"                    
[34] "LSPFP"                    "miLineage"                "mrMLM"                   
[37] "mrMLM.GUI"                "multicastR"               "musica"                  
[40] "networkR"                 "openSTARS"                "orgR"                    
[43] "panelaggregation"         "partools"                 "phenocamapi"             
[46] "pkggraph"                 "PROSPER"                  "QuantTools"              
[49] "Rbitcoin"                 "rblt"                     "reinsureR"               
[52] "riskRegression"           "Rnets"                    "robCompositions"         
[55] "RSauceLabs"               "RWildbook"                "sentometrics"            
[58] "shinyML"                  "simPop"                   "sitreeE"                 
[61] "skm"                      "slim"                     "somspace"                
[64] "sparseFLMM"               "SVN"                      "synthACS"                
[67] "TreeLS"                   "ttwa"                     "Tushare"                 
[70] "twl"                      "validateRS"               "VIM"                     
[73] "WGScan"                   "word.alignment"  

Dear maintainer, Hello! Thanks for using data.table. Your package is one of 74 CRAN packages that Depend on data.table. Another 667 Import data.table. Importing is much preferred. The data.table importing vignette now contains this text :" Besides the Imports: field, you can also use Depends: data.table but we strongly discourage this approach (and may disallow it in future) because this loads data.table into your user’s workspace; i.e. it enables data.table functionality in your user’s scripts without them requesting that. Imports: is the proper way to use data.table within your package without inflicting data.table on your user. In fact, we hope the Depends: field is eventually deprecated in R since this is true for all packages. " https://cran.r-project.org/web/packages/data.table/vignettes/datatable-importing.html Another motivation is that it would be beneficial to cedta()'s awkward implementation on its last line where it needs to do the tryCatch just for packages which Depend. That line could be removed.Eventually we'll add a message, then a warning, and after a few years possibly only allow Importing and not Depending on data.table. The next time you update your package on CRAN, are you ok to change from Depends to Imports please? We're tracking this issue here : https://github.com/Rdatatable/data.table/issues/3076

It has been 6 months since I first emailed. Each email and package status is logged in the tracking issue. Feedback and comments are very welcome, ideally directly in the GitHub issue.

Best, Matt

ekstroem commented 4 years ago

Updated networkR package with Depends -> Imports on its way to CRAN

jangorecki commented 4 years ago

Added badges of Depends/Imports count to first post. They are being refreshed daily.

gyang274 commented 4 years ago

Hi, @mattdowle

Thank you for the notice, and the wonderful data.table package.

I experienced inconsistent solution for communicating with Rcpp, when used imports instead of depends. This could caused by other issues. But at that time, depends is a reasonable choice as the package relied on and built around data.table features.

Let me test on imports and get back to you.

Thank you.

sborms commented 4 years ago

Thanks for the reminder! Should be done for the sentometrics package by next CRAN release.

Sam

bozenne commented 4 years ago

Hello, I have (finally) updated the version of the BuyseTest package on CRAN (1.8) where data.table has been moved to the imports section, as you suggested.

Regards brice

JWiley commented 4 years ago

Hello,

This is fixed in JWileymisc now propagating through CRAN. I took the opportunity to change all depends into imports.

Cheers,

Josh

alexWhitworth commented 4 years ago

synthACS now uses Imports instead of Depends. Push sent to CRAN.

mattdowle commented 1 year ago

Currently 79 packages that Depend rather than Import and they continue to be created or updated.

download.file("http://cloud.r-project.org/web/packages/packages.rds", tmp<-tempfile())  # has more fields than PACKAGES.rds
x = readRDS(tmp)
rownames(x) = x[,"Package"]
deps = tools::package_dependencies("data.table", which="Depends", reverse=TRUE)[[1]]
length(deps)   # 90
y = as.matrix(table(substring(x[deps,"Published"],1,7)))
y = y[sort(rownames(y), decreasing=TRUE),,drop=FALSE]
y
# Number of packages Depending (not Importing) data.table by publish month
2024-03    5
2024-02    4
2024-01    3
2023-12    1
2023-11    3
2023-10    4
2023-09    2
2023-08    6
2023-06    1
2023-04    2
...

I never did get to the bottom of Jan's objections above to deprecating (very gently and slowly over several years) Depend'ing; i.e. what the valid use cases of Depend'ing are: why Import'ing won't work or is not as good as Depend'ing.