shahrozeabbas opened this issue 8 months ago
Good question -- yes, it is planned to release BPCells through CRAN, but it will probably be a little while (I'd estimate 2-3 months). There are a couple of technical changes needed in the compiled code to meet CRAN's portability requirements, and first-time CRAN submissions also have pretty stringent documentation requirements, such as examples for every public function.
I don't have a plan for a conda release right now, though I could be convinced to do one by someone with experience releasing R packages on conda.
I'll also mention that BPCells has some pre-built packages for Windows, Mac, and Ubuntu Jammy available through R-universe. I don't actively check that these builds are working, but they should automatically track the GitHub main branch and help skip C++ compilation time during install.
Hi, @bnprks and @shahrozeabbas,
I agree that it would be good to submit BPCells to CRAN.
I'm currently building an R package that provides subclasses of SingleCellExperiment whose assays are backed by BPCells matrices for storing single-cell data. In my experience, IterableMatrix objects backed by on-disk files perform better than DelayedArray for IO and most matrix math. I can imagine more excellent features coming to BPCells someday, such as storing project metadata alongside the data and interoperating between SingleCellExperiment, SeuratObject, AnnData, and so on. Therefore, it would be greatly appreciated if BPCells became a generally accessible dependency for the R community.
Just circling back to this. As nice as having BPCells on CRAN would be, I think adding it to Bioconductor would allow it to be picked up by Bioconda automatically. I'm not 100% sure about this; it's just an idea.
Adding to both CRAN and Bioconductor could be nice. Although there is overlap in the process, that's obviously more of an ask.
It's very frustrating: I've spent several days trying to install this package and it still doesn't work on my M1 Mac.
Hi @shahrozeabbas, I hadn't heard of a specific automated connection between bioconda and Bioconductor but it's an interesting thing to consider. Unfortunately the rules of Bioconductor disallow submitting a package that exists on CRAN (and CRAN at least disallows sharing a package name with a Bioconductor package), so I think it has to be one or the other.
There are definitely unique advantages to both CRAN and Bioconductor, though I'm currently leaning towards CRAN as it is the default source when using install.packages and allows a more flexible update schedule. Bioconductor has its merits too, such as more coordinated systems for testing cross-compatibility during version changes, but from where I am right now I think ease of installation might win out.
@bnprks Yeah, I think you're right; I don't believe there is any automated connection for it. I agree, though, that CRAN may be more useful.
If I manage to submit an R package to Anaconda, I'll be sure to reach out.
I was able to get a working conda binary (for Linux only) by running conda build with the following set of files, and I uploaded it to Anaconda.
Hi @rschauner, thanks for setting this up! Hopefully this will make installation faster for conda users.
Just one change I'd suggest to improve the portability of the build: could you set the environment variable BPCELLS_DISABLE_MARCH_NATIVE prior to the R installation in your build scripts?
With that change, I guess just two remaining questions:
Thanks again for figuring out this conda setup!
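A minimal sketch of what that build-script change could look like, assuming a conda-build recipe whose build.sh installs the R package from the unpacked source directory. The recipe files from this thread aren't shown, so the function name build_bpcells is hypothetical. The sketch also assumes the package's build step only checks whether BPCELLS_DISABLE_MARCH_NATIVE is set (so the value 1 is arbitrary), and that conda-build exports $R as in the usual R recipe template. Skipping -march=native matters for a redistributed binary because that flag lets the compiler emit instructions specific to the build machine's CPU, which can fail with an illegal-instruction error on older CPUs.

```shell
#!/usr/bin/env bash
# Sketch of a conda-build build.sh step for BPCells (hypothetical; not the recipe from this thread).
set -euo pipefail

build_bpcells() {
  # Source directory of the R package; defaults to the current directory,
  # which is where conda-build runs build.sh from.
  local src_dir="${1:-.}"

  # Must be exported before R CMD INSTALL so the package's build step sees it.
  # Assumption: any non-empty value disables -march=native.
  export BPCELLS_DISABLE_MARCH_NATIVE=1

  # conda-build usually provides $R for R recipes; fall back to R on the PATH.
  "${R:-R}" CMD INSTALL "$src_dir"
}
```

In build.sh you would then call build_bpcells (or inline its body). Setting R=echo prints the install command instead of running it, which is a cheap way to dry-run the script.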
The way the build is set up, I need to pull in a version from GitHub, so if you can patch that into a v0.1.1 release, I can build without the flag set. The build has to be run manually, but it could probably be automated with GitHub Actions. I haven't figured out a way to build it on my M1 Mac or to use conda build to cross-compile it (maybe it would work without the flag).
It's already public, so if you would like to mention it in the README, I'm perfectly fine with that.
I have created a package that provides a BPCells backend for DelayedArray objects, available at https://github.com/Yunuuuu/BPCellsArray. Now we can combine BPCells with Bioconductor workflows.
Hi, @bnprks. I agree that CRAN is better than Bioconductor for BPCells, given its more flexible update schedule.
CRAN has requirements not only for documentation but also for coding style and API specification (exported vs. unexported functions). In my opinion, it might be time for BPCells to start preparing for a CRAN release, since core features such as data preprocessing and matrix manipulation have become increasingly stable. Would you mind if I spent some effort tidying up the R code and documentation of BPCells so that it passes the CRAN checks?
Hi @ycli1995, thanks for the offer to help. I think the best places to get started would be collecting a clear list of CRAN requirements that are not yet met, and possibly making any small (<5-line) changes that would solve one-off issues. This could include disabling long-running vignettes or tests when building on CRAN, for example.
For any requirements that require larger changes throughout the code, I think we should at least agree on the changes to be made first, and in some cases I might prefer to do things myself or heavily edit something you've written. (The example usage section for function documentation will probably be a case of this)
I am working on some technical changes on the C++ side that I think need to land before a CRAN release too (so that we can compile without the -march=native flag without sacrificing performance).
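One way to start collecting the list of unmet CRAN requirements is to run CRAN's incoming checks locally on a built source tarball. Below is a minimal sketch, assuming it is run from a scratch directory with the package sources at the given path. The function name cran_precheck and the R_CMD override (there only so the commands can be dry-run) are hypothetical. Plain R CMD check does not set NOT_CRAN, so tests guarded with testthat::skip_on_cran() are skipped just as they would be on CRAN. That makes this run a reasonable way to confirm long-running tests are excluded.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a source tarball, then run the extra checks CRAN applies to submissions.
# R_CMD is a hypothetical override (e.g. R_CMD=echo) for dry runs.
cran_precheck() {
  local pkg_dir="${1:-.}"
  local r_bin="${R_CMD:-R}"

  # Writes BPCells_<version>.tar.gz to the current directory
  # (R CMD build also re-runs the vignettes by default).
  "$r_bin" CMD build "$pkg_dir"

  # --as-cran turns on the additional incoming-submission checks;
  # the NOTEs, WARNINGs and ERRORs it reports approximate the to-do list.
  "$r_bin" CMD check --as-cran BPCells_*.tar.gz
}
```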
@bnprks Sorry, are there any updates regarding the BPCells submission to CRAN? I think this would be super useful because 1) compilation takes some time, 2) many of the GitHub issues are related to compilation, and 3) BPCells is updated quite regularly.
I am still amazed by the speed of BPCells and think that lots of users would appreciate a CRAN submission (which would also mean precompiled binaries, including for Linux via Posit Public Package Manager).
Thanks
Hi @mihem, there's not much concrete progress to report, though CRAN submission is still very much on the roadmap. The fix for -march=native is now up in the highway-simd branch, so I think adding examples for every public function and dealing with CRAN's 5-10MB package size limits are probably the two big remaining challenges. That said, there are a couple of options that can help with your issues 1-3 in the meantime.
For Mac/Windows users, I have an R-universe project set up, which provides pre-compiled builds for Mac/Windows that I believe should come with HDF5 statically linked to avoid the most common installation challenge. Those builds automatically track the GitHub main branch and can be installed, for example, like this: install.packages("BPCells", repos = c("https://bnprks.r-universe.dev", "https://cran.r-project.org"))
For Linux users, it is also possible to speed up compilation by editing your ~/.R/Makevars file. The most impactful change is probably adding the line MAKEFLAGS=--jobs=8, where you can adjust the 8 to match how many cores you want to use in parallel for compilation. Other changes that require additional software tools include enabling ccache to speed up recompilation, e.g. with the line CXX=ccache g++, or using the faster mold linker by adding the line LDFLAGS=-fuse-ld=mold.
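The ~/.R/Makevars settings described in the comment above can be applied with a small script. This is a minimal sketch, assuming bash and GCC (the ccache line from the comment hard-codes g++). The function name configure_r_makevars is hypothetical, and the ccache and mold lines are only written when those tools are found on the PATH. Whether CXX is the variable your R setup actually uses for BPCells' C++ standard can vary, so treat that line as a starting point.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Append compile-speed settings to an R Makevars file (hypothetical helper).
# Arguments: [makevars path, default ~/.R/Makevars] [parallel jobs, default 8]
configure_r_makevars() {
  local makevars="${1:-$HOME/.R/Makevars}"
  local jobs="${2:-8}"
  mkdir -p "$(dirname "$makevars")"

  {
    # Let make run up to $jobs compile commands in parallel.
    echo "MAKEFLAGS=--jobs=${jobs}"
    # Cache compiled objects so rebuilding unchanged sources is fast.
    if command -v ccache >/dev/null 2>&1; then
      echo "CXX=ccache g++"
    fi
    # Link with the faster mold linker.
    if command -v mold >/dev/null 2>&1; then
      echo "LDFLAGS=-fuse-ld=mold"
    fi
  } >> "$makevars"
}
```

Because the function appends, run it once (e.g. configure_r_makevars "$HOME/.R/Makevars" 8) or edit the file by hand afterwards to avoid duplicate lines.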
As mentioned above, CRAN has a manual review process with many submission requirements that make it much more complicated than, say, uploading a Python package to PyPI. But it is still very much on the roadmap; it's just that I've personally had more urgent areas to work on.
Thanks for the update and the tips.
I'm on Linux, and sure, speeding up compilation is great, but installing a binary is still many times faster, I guess, and of course avoids compilation errors.
I completely understand; I only wanted to highlight that a CRAN submission would be appreciated (not only by me personally, but also, I think, by packages that depend on BPCells, such as Seurat).
Thanks 🙏
Hi,
I was just curious whether there is a plan to upload to CRAN with the release of Seurat v5. I know these are independent packages, but since Seurat v5 depends on BPCells, it would be nice to install it via CRAN or conda.
Thanks 🙂