extendr / rextendr

An R package that helps scaffolding extendr-enabled packages or compiling Rust code dynamically
https://extendr.github.io/rextendr/
Other
181 stars 27 forks source link

Add LTO option and opt-level to reduce the package size #227

Open alexhroom opened 1 year ago

alexhroom commented 1 year ago

Currently when running R CMD CHECK on a repository using extendr, it flags up a NOTE on package size:

 * checking installed package size ... NOTE
  installed size is  6.6Mb
  sub-directories of 1Mb or more:
    libs   6.2Mb

This libs directory refers to the directory where the compiled Rust code is stored as a .dll or .so file. Rust by default generates huge executables, and the size can be reduced to no longer create the issue by adding some compiler options (here added to Cargo.toml, as in PR #226 ):

[profile.release]
opt-level = 'z' 
lto = true

While this isn't generally an issue for extendr use, it is an issue for users who prefer/require their packages to be available on CRAN - R CMD CHECK is expected to return no errors, warnings or notes whatsoever. There are a couple choices I'd consider:

yutannihilation commented 1 year ago

Sorry, I really don't get your point. As I showed on Discord, this NOTE is definitely not a problem. Why do you need to reduce the size? What do you think is the problem?

Ilia-Kosenkov commented 1 year ago

If I understand correctly, the problem is -- Rust by default builds a library optimized for development and execution time by including a ton of metadata so that a simple Hello World is 3MB worth of binary. That's insane for an AOT language without huge runtime.

This in turn triggers CRAN check, which says that hey you have humongous binary in your lib folder, which is strange.

The proposed solution is to set standard optimization flags so we produce smaller binaries without extra symbol metadata. This will be triggered when the package is built by the consumer.

alexhroom commented 1 year ago

@yutannihilation a mix of things:

yutannihilation commented 1 year ago

Again, the NOTE is not necessarily a problem. The automatic tests won't fail. It's just a NOTE. It's true it's possible that CRAN maintainers complain about the size, but I think we should know how large is too large first, because otherwise we wouldn't know whether the reduction really solve the problem or not. Actually, the stringi package shows the same NOTE with 32MB, so I think 36MB is not terribly large.

https://cran.r-project.org/web/checks/check_results_stringi.html

That said, I agree 36MB feels a bit large, so I'm not against the idea of enabling LTO itself.

yutannihilation commented 1 year ago

You might need to use UseLTO field if you enable LTO, but I'm not sure (I've never seen this field is used).

https://cran.r-project.org/doc/manuals/r-release/R-exts.html#The-DESCRIPTION-file https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Using-Link_002dtime-Optimization

Ilia-Kosenkov commented 1 year ago

Just to clarify, we are arguing about the default config here. Anyone can scaffold a package using {rextendr} and then just modify Cargo to fit their needs. We can also just mention it in the docs. Or, alternatively, we can indeed optimize by default but then write in docs that hey, you can disable this optimization if you want. I personally prefer shipping config for optimized version

alexhroom commented 1 year ago

yeah sensibly it is 'just a NOTE' but in my experience when uploading a new package for the first time, the maintainers will either require that the tests pass as OK completely, or you have to have a big email conversation with the maintainers where you give a good reason why they're not OK - as I said, a pointless sink on both mine and their time.

as for the UseLTO field, I also have never seen it. I'll be uploading a new version of my package rshift to CRAN when submissions reopen so I'll update on if they say anything about it.

Ilia-Kosenkov commented 1 year ago

@alexhroom, how about we wait for your next submission (with optimized configuration), and if it creates no issues (and removes that NOTE), then we implement it as the default option for packages.

yutannihilation commented 1 year ago

Yeah, in general, they complain all sorts of things, but I didn't get such an email when I first submit my package (string2path) with the package size NOTE. Anyway, I don't have strong reasons to oppose. If you want, I can try submitting my package with the LTO setting.

alexhroom commented 1 year ago

Package was submitted successfully, no NOTE or complaints from the CRAN folks.

Ilia-Kosenkov commented 1 year ago

I propose to re-open https://github.com/extendr/rextendr/pull/226. If there will be no strong objections to this, I suggest we add this feature. While it is important because it will be the default setting going forward (and people rarely modify default configs), this option can still be easily disabled/changed if our consumer decides it is harming their package.

CGMossa commented 1 year ago

I feel like Rust compile times & development time is going to be a problem, and it is going to be the first problem people encounter starting out with rust / extendr / rextendr. Submitting to CRAN is such an advanced problem, in the sense that it comes much later, that I feel a vignette describing the building / packaging / deployment of a rust-powered r-package is perhaps a better solution, than burdening newcomers with a heavy default setting.

But lately, my opinions are routinely wrong, so take this with a grain of salt.

Besides there are already warnings for CRAN checks, that we can copy & insert into our vignette, to ensure that people that face that NOTE, may land on the right article on rextendr package pkgsite.

alexhroom commented 1 year ago

if compile times are made significantly longer i'm happy to instead submit a PR to r-rust/faq detailing compiler settings for optimisiation - this would especially be useful for users who are a lot more proficient in R than they are in Rust

Ilia-Kosenkov commented 1 year ago

Well, we can perhaps benchmark package build times? This should not be that difficult.

CGMossa commented 1 year ago

I'm very much opposed to have default configuration to "optimised". There is nothing optimised about slowing down the day-to-day compiling, but a build that will happen less frequently, then when authoring code. I do not believe enabling these flags will have zero impact on compile time. Maybe not for these uses we have here where type-augmentation-thing happens (I fail to recall what it is called).

JosiahParry commented 2 months ago

The following should be added as an optional arguments when creating an extendr package:

[profile.release]
lto = true
codegen-units= 1

These settings have helped me decrease the size of my binaries significantly enough that there are no notes from CRAN. Additionally, a NOTE is just a NOTE. We can make a function similar to usethis::use_cran_comments() that augments it to use standard language to indicate that a large file size is due to vendored deps. This has been confirmed by CRAN to not be a problem so long as it is noted in the cran-comments.md