Open gaborcsardi opened 1 month ago
It would be even faster if pre-compiled binaries could be used: https://github.com/googlecolab/colabtools/issues/4256 (related issues: #22, #201). I also think that the installation/compilation time is too long from a usability perspective (for an R user).
It is technically much more challenging to use pre-compiled binaries, and they would have to be restricted to a small subset of platforms, maybe x86_64 Linux only.
OTOH parallel compilation is (probably?) easy to switch on and every platform benefits greatly from it. Btw. I just compiled duckdb-r on s390x Linux in qemu, and it took more than 24 hours (!).
Thanks. Shouldn't this be a setting in ~/.R/Makevars? I compile the R package on 8 cores with no problems, it's supported, but why would we hardcode this in the package?
I compile the R package on 8 cores with no problems, it's supported
How?
but why would we hardcode this in the package?
Because people don't know about this. If the user has a setting, then use that, sure. If there is no setting, then I would use two processors on CRAN, and maybe two, maybe more elsewhere.
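The policy described above could be sketched roughly as follows. This is only an illustration, not what duckdb-r or arrow actually do: the function name `pick_jobs` is hypothetical, it only looks at the `MAKEFLAGS` environment variable (a `MAKEFLAGS` line in ~/.R/Makevars would need separate detection), and it assumes `NOT_CRAN=true` is the signal for "not on CRAN".

```shell
#!/bin/sh
# Hypothetical helper: choose a parallel job count for compilation.
# Policy, as proposed in the discussion (an assumption, not existing code):
#   1. if MAKEFLAGS is already set in the environment, do not override it;
#   2. else if NOT_CRAN=true, use all detected cores;
#   3. else fall back to two jobs, the number mentioned for CRAN.
pick_jobs() {
  if [ -n "${MAKEFLAGS:-}" ]; then
    # Empty output means "leave the user's setting alone".
    echo ""
    return 0
  fi
  if [ "${NOT_CRAN:-false}" = "true" ]; then
    # getconf _NPROCESSORS_ONLN is widely available but not guaranteed;
    # fall back to 2 if the query fails.
    cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)
    echo "$cores"
  else
    echo 2
  fi
}
```

How the chosen value is handed to the actual build (for example as a `-jN` flag) is deliberately left open here, since that depends on how the package drives its compilation.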
I have MAKEFLAGS = -j8 in ~/.R/Makevars. Is there prior art of hardcoding the cores, perhaps in Arrow? Would PPM like that setting?
Arrow does build in parallel by default for me, but IDK if it uses MAKEFLAGS.
@thisisnic: Does this ring a bell? How does the arrow R package achieve multicore compilation by default, regardless of user settings?
I think this is where we make that happen, dependent on the NOT_CRAN env var: https://github.com/apache/arrow/blob/5ef7e01053c526389acefddd6f961bf1fd9d274b/r/tools/nixlibs.R#L513
But the nixlibs.R script would only be called by ./configure, no? Or do the environment variables set there affect what happens when building the package?
Hmm, I'm unsure of the exact details, but @jonkeane will know
do the environment variables set there affect what happens when building the package?
Yeah, this is what happens
AFAIR it is allowed to use two processor cores on CRAN when installing a package. Would it be possible to leverage this and compile duckdb in parallel? That would cut the current ~24 minutes compilation time in half.
Additionally, if NOT_CRAN (or some other env var) is set, then we could use even more processors, and would (ideally) cut the compilation time to ~6 minutes on 4 processors, or even ~3 minutes on 8 processors.
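The estimates above follow from assuming ideal linear scaling of the ~24 minute single-core build. A quick sketch of that arithmetic, with a hypothetical helper name `ideal_minutes`; real builds will typically scale worse than this, because some steps (configure, linking) do not parallelize:

```shell
#!/bin/sh
# Hypothetical helper: ideal build time under perfectly linear scaling.
# $1 = single-core build time in minutes, $2 = number of parallel jobs.
ideal_minutes() {
  echo $(( $1 / $2 ))
}

# Print the ideal estimate for the job counts discussed in the thread.
for jobs in 2 4 8; do
  echo "$jobs jobs: ~$(ideal_minutes 24 "$jobs") minutes"
done
```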