duckdb / duckdb-r

The duckdb R package
https://r.duckdb.org/

Parallel compilation? #205

Open · gaborcsardi opened this issue 1 month ago

gaborcsardi commented 1 month ago

AFAIR, CRAN allows a package to use two processor cores during installation. Could we leverage this and compile duckdb in parallel? That would cut the current ~24-minute compilation time in half.

Additionally, if NOT_CRAN (or some other env var) is set, we could use even more processors and (ideally) cut the compilation time to ~6 minutes on 4 processors, or even ~3 minutes on 8.
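The timings above assume ideal linear scaling. As a rough back-of-the-envelope check (the 24-minute baseline is from this comment; the second formula is the standard Amdahl's law bound, added here only to show why "ideally" matters):

$$
T_N \approx \frac{T_1}{N}\colon\qquad \frac{24}{2} = 12,\qquad \frac{24}{4} = 6,\qquad \frac{24}{8} = 3\ \text{minutes}
$$

$$
T_N = T_1\left[(1 - p) + \frac{p}{N}\right]
$$

Here p is the fraction of the build that can run in parallel. The first line is the special case p = 1; any serial step, such as linking the final shared library, keeps p below 1 and the real speedup below N.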

melhindi commented 1 month ago

It would be even faster if pre-compiled binaries could be used: https://github.com/googlecolab/colabtools/issues/4256. Related issues are #22 and #201. I also think the installation/compilation time is too long from an R user's usability perspective.

gaborcsardi commented 1 month ago

It is technically much more challenging to use pre-compiled binaries, and they would have to be restricted to a small subset of platforms, maybe x86_64 Linux only.

OTOH, parallel compilation is (probably?) easy to switch on, and every platform would benefit greatly from it. Btw, I just compiled duckdb-r on s390x Linux in QEMU, and it took more than 24 hours (!).

krlmlr commented 4 weeks ago

Thanks. Shouldn't this be a setting in ~/.R/Makevars? I compile the R package on 8 cores with no problems; it's supported. But why would we hardcode this in the package?

gaborcsardi commented 3 weeks ago

> I compile the R package on 8 cores with no problems, it's supported

How?

> but why would we hardcode this in the package?

Because people don't know about this. If the user has a setting, then use that, sure. If there is no setting, I would use two processors on CRAN, and two or maybe more elsewhere.
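A minimal sketch of the policy described in this comment, written as a POSIX shell function that a configure script or install wrapper could call. The function name choose_makeflags and its exact rules are hypothetical, not part of duckdb-r; detecting cores with getconf _NPROCESSORS_ONLN is an assumption that holds on Linux and macOS.

```shell
# Hypothetical helper (not part of duckdb-r): pick MAKEFLAGS for compilation.
# Policy sketched from the discussion:
#   1. If MAKEFLAGS already requests parallel jobs (-j/--jobs), respect it.
#   2. Otherwise default to 2 jobs, the CRAN limit.
#   3. If NOT_CRAN=true, use every online core instead.
choose_makeflags() {
  # Pad with spaces so "-j" is only matched at the start of a word.
  case " ${MAKEFLAGS:-} " in
    *" -j"* | *" --jobs"*)
      printf '%s\n' "$MAKEFLAGS"
      return 0
      ;;
  esac

  _jobs=2
  if [ "${NOT_CRAN:-}" = "true" ]; then
    # getconf _NPROCESSORS_ONLN is available on Linux and macOS.
    _cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null)
    case "$_cores" in
      '' | *[!0-9]*) ;;          # detection failed: keep the default of 2
      *) _jobs=$_cores ;;
    esac
  fi

  # Keep any other flags the user set (e.g. -k) and append the job count.
  printf '%s\n' "${MAKEFLAGS:+$MAKEFLAGS }-j$_jobs"
}
```

For example, MAKEFLAGS="$(choose_makeflags)" R CMD INSTALL <tarball> would hand the chosen flags to the make run that R CMD INSTALL starts, since make reads MAKEFLAGS from its environment. Checking for an existing -j first means users who already tuned their setup are unaffected.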

krlmlr commented 3 weeks ago

I have MAKEFLAGS = -j8 in ~/.R/Makevars. Is there prior art for hardcoding the number of cores, perhaps in Arrow? Would PPM (Posit Package Manager) like that setting?
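The per-user setting mentioned here can be written with a small shell snippet. This is a sketch: the function name add_makeflags is hypothetical, and it assumes a Unix-alike where R reads the user Makevars from R_MAKEVARS_USER if that is set, otherwise from ~/.R/Makevars. R also checks platform-specific variants, which this ignores.

```shell
# Sketch: persist "MAKEFLAGS = -j8" (krlmlr's setting) in the user Makevars.
# Assumes a Unix-alike: R reads $R_MAKEVARS_USER if set, else ~/.R/Makevars.
# Platform-specific files such as ~/.R/Makevars-$R_PLATFORM are ignored here.
add_makeflags() {
  _jobs=${1:-8}
  _file=${R_MAKEVARS_USER:-$HOME/.R/Makevars}
  mkdir -p "$(dirname "$_file")"
  # Append only if no MAKEFLAGS line exists yet, so re-running is harmless.
  if ! grep -q '^[[:space:]]*MAKEFLAGS' "$_file" 2>/dev/null; then
    printf 'MAKEFLAGS = -j%s\n' "$_jobs" >> "$_file"
  fi
}
```

Running add_makeflags 8 writes exactly the MAKEFLAGS = -j8 line quoted above; running it again leaves the file unchanged.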

gaborcsardi commented 3 weeks ago

Arrow does build in parallel by default for me, but IDK if it uses MAKEFLAGS.

krlmlr commented 3 weeks ago

@thisisnic: Does this ring a bell? How does the arrow R package achieve multicore compilation by default, regardless of user settings?

thisisnic commented 3 weeks ago

I think this is where we make that happen, dependent on the NOT_CRAN env var: https://github.com/apache/arrow/blob/5ef7e01053c526389acefddd6f961bf1fd9d274b/r/tools/nixlibs.R#L513

krlmlr commented 3 weeks ago

But the nixlibs.R script would only be called by ./configure, no? Or do the environment variables set there affect what happens when building the package?

thisisnic commented 3 weeks ago

Hmm, I'm unsure of the exact details, but @jonkeane will know.

jonkeane commented 3 weeks ago

> do the environment variables set there affect what happens when building the package?

Yeah, this is what happens.
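The configure question above turns on which process sets the variable. The following sketch illustrates only the general Unix rule, not how arrow's tooling is actually wired: exported variables flow from a parent process down to its children, never back up.

```shell
# General Unix behaviour, not specific to arrow or duckdb-r.
export MAKEFLAGS=-j2

# A child process inherits the exported variable.
child_sees=$(sh -c 'printf "%s" "$MAKEFLAGS"')   # -j2

# A change made inside a child process dies with that child.
sh -c 'export MAKEFLAGS=-j8'
parent_sees=$MAKEFLAGS                           # still -j2

printf 'child sees: %s, parent still has: %s\n' "$child_sees" "$parent_sees"
```

So a variable set inside a script such as nixlibs.R affects the commands that script launches itself; a separate, later build step sees it only if a common parent exported it or it was saved somewhere that step reads.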