ben-laufer / DMRichR

A R package and executable for the preprocessing, statistical analysis, and downstream testing and visualization of differentially methylated regions (DMRs) from CpG count matrices (Bismark cytosine reports)
https://www.benlaufer.com/DMRichR/
MIT License

group_by_() > group_by() #40

Closed bazyliszek closed 4 years ago

bazyliszek commented 4 years ago

group_by_() is deprecated as of dplyr 0.7.0.
The same goes for select_() > select() and filter_() > filter(). I do not see the last two in the code, but it crashes at find_dmrs2.

ben-laufer commented 4 years ago

Hi @bazyliszek, unfortunately, the fix for these isn't straightforward. When I was developing the functions, I tried just changing those parts and it broke the script. So, more would have to be changed, and I haven't yet had a chance to get to it.

Are you getting an error when running the program as is? These should only be warnings and shouldn't affect anything, which is why I left them that way. If you are getting an error can you please share it? It may be from something else.
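On the deprecation itself, here is a minimal sketch of why a plain rename can break a script. The underscore verbs accept column names as strings, while select(), filter() and group_by() evaluate their arguments as expressions, so string-based calls need a tidy-evaluation helper. The data frame and column names below are made up for illustration and are not taken from DMRichR or methylCC; the sketch assumes dplyr 1.0.0 or later for all_of().

```r
library(dplyr)

# Hypothetical sample sheet; names are illustrative only
meta <- data.frame(
  Sample = c("s1", "s2", "s3", "s4"),
  Condition = c("case", "case", "control", "control"),
  cellCount = c(0.2, 0.4, 0.1, 0.3),
  stringsAsFactors = FALSE
)

# Column names held as strings, as code written for the underscore verbs would have them
group_col <- "Condition"
value_col <- "cellCount"
keep_cols <- c("Condition", "cellCount")

# Note: group_by(meta, group_col) would NOT group by Condition; it would
# create a new constant column called "group_col". Hence the helpers below.
result <- meta %>%
  select(all_of(keep_cols)) %>%            # was: select_(.dots = keep_cols)
  filter(.data[[value_col]] > 0.15) %>%    # was: filter_("cellCount > 0.15")
  group_by(!!rlang::sym(group_col)) %>%    # was: group_by_(group_col)
  summarise(mean_count = mean(.data[[value_col]]))

print(result)
```

Whether these exact replacements fit the affected functions depends on how the strings are built there, so treat this as a starting point rather than a drop-in patch.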

bazyliszek commented 4 years ago

Hi @ben-laufer, yes. Everything runs well until cellCounts ... it complained about the version of dplyr:

Selecting model...Done
The model is cellCount ~ Condition + BDg
Loading required package: methylCC
Loading required package: FlowSorted.Blood.450k
Finding hg19 cell type specific DMRs using FlowSorted.Blood.EPIC
snapshotDate(): 2019-10-22
see ?FlowSorted.Blood.EPIC and browseVignettes('FlowSorted.Blood.EPIC') for documentation
loading from cache
[preprocessQuantile] Mapping to genome.
Loading required package: IlluminaHumanMethylationEPICmanifest
[preprocessQuantile] Fixing outliers.
[preprocessQuantile] Quantile normalizing.
Error in normalize.quantiles(mat[Index2, ]) :
ERROR; return code from pthread_create() is 22
Calls: find_dmrs2 ... .qnormStratified -> .qnormStratifiedHelper -> normalize.quantiles
In addition: Warning messages:
1: `select_()` is deprecated as of dplyr 0.7.0.
Please use `select()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.
2: `filter_()` is deprecated as of dplyr 0.7.0.
Please use `filter()` instead.
See vignette('programming') for more help
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.
3: `group_by_()` is deprecated as of dplyr 0.7.0.
Please use `group_by()` instead.
See vignette('programming') for more help
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.
4: All elements of `...` must be named.
Did you want `data = c(Sample, Condition, BDg, Age, Smoking, col, cellCount)`?
5: All elements of `...` must be named.
Did you want `data = c(Sample, Condition, BDg, Age, Smoking, col, cellCount)`?
6: Unknown columns: `BDg`
Execution halted

ben-laufer commented 4 years ago

Thanks for sharing. It looks like the deprecated tidyverse functions only produce warnings, and the error comes from the minfi package:

Error in normalize.quantiles(mat[Index2, ]) :
ERROR; return code from pthread_create() is 22

I did a quick Google search, and it suggests this has to do with the BLAS that your R is using. Can you install or load a different version of R with a different BLAS (or a downgraded version of OpenBLAS)?

Here is some further reading: https://support.bioconductor.org/p/122925/
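To see which BLAS a given R session is actually linked against, a small base-R check along these lines may help. It only reports what R itself exposes; it assumes R 3.4.0 or later, where sessionInfo() carries the BLAS and LAPACK library paths, and on some builds those fields can be empty.

```r
# Collect session details once
info <- sessionInfo()

# Paths to the BLAS and LAPACK shared libraries in use (may be "" on some builds)
blas_path <- info$BLAS
lapack_path <- info$LAPACK

cat("R version:     ", R.version.string, "\n")
cat("BLAS library:  ", blas_path, "\n")
cat("LAPACK library:", lapack_path, "\n")
cat("LAPACK version:", La_version(), "\n")
```

Running this inside and outside the conda environment should show whether the two sessions pick up different BLAS builds.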

bazyliszek commented 4 years ago

Thanks for the info! I see. This is not easy in the conda environment I have here. I am starting to wonder whether one way forward would be to build a Singularity image (or Docker, though Singularity is safer) of the whole environment, including all the libraries it refers to and any database connections, and to distribute that image for download. With human data we need to move to an offline zone, so I foresee more problems.

ben-laufer commented 4 years ago

Conda does make things trickier, and I generally avoid it for anything related to R because of that. However, I did have a look, and it does seem possible to change the BLAS. Interestingly, it doesn't look like you're using the default one, but rather another that I've come across because it really speeds up WGCNA. I tried it out on our cluster and it still works, so I think the problem has to do with your version. You can read a bit more about it here (that isn't the default one either, but it should be a good starting point): https://stackoverflow.com/questions/58834940/conda-install-r-essentials-with-mkl

In terms of creating an isolated environment, you can also look into .libPaths(), which is what we use. I still like the idea of using the latest packages, since sometimes there are bug fixes, and new features.
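As a rough sketch of the .libPaths() approach: create a dedicated library directory and put it first on R's library search path, so installs and loads use it before the system library. The directory name below is hypothetical, and tempdir() is used only so the snippet runs anywhere; in practice you would point at a persistent project directory.

```r
# Hypothetical project library; replace with a persistent path in real use
lib_dir <- file.path(tempdir(), "dmrichr_lib")
dir.create(lib_dir, recursive = TRUE, showWarnings = FALSE)

# Prepend the project library; R keeps its own system library at the end
.libPaths(c(lib_dir, .libPaths()))

# The project library is now searched (and installed into) first
print(.libPaths())

# Packages would then be installed into it, for example (not run here):
# install.packages("BiocManager", lib = lib_dir)
```

Putting the same .libPaths() call in a project-level .Rprofile is one way to make the setting apply to every session started in that directory.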

Finally, there are going to be two big updates to DMRichR soon: one to make the install quicker and easier, which will also get rid of a lot of warnings, and another with some useful new features. So, I'm going to close this issue now, since the error you're running into is triggered by the minfi package not being happy with the BLAS your R was compiled against.