fstpackage / fst

Lightning Fast Serialization of Data Frames for R
http://www.fstpackage.org/fst/
GNU Affero General Public License v3.0

Does it make sense to link to existing zstd/lz4 libraries when building R4 from source on Windows? #245

Closed aadler closed 2 years ago

aadler commented 4 years ago

The new package manager system for building R on Windows has the ZSTD and LZ4 libraries available. Would it make sense to link to those instead of building them from scratch when building from source? Thanks!

MarcusKlik commented 4 years ago

Hi @aadler, thanks for your suggestion!

It's great that ZSTD (nice work :-)) and LZ4 are now part of the rtools package manager and it would definitely be much simpler to have fst get the compressors from pacman.

As a side note, on Linux we could similarly use the distribution's packages instead of compiling the code ourselves. That would definitely make it a lot easier to pass the CRAN prechecks, which regularly choke on the LZ4 and ZSTD code due to the sanitizer checks.

But users might use R with a version lower than 4.0, and then it wouldn't work, right? (And the CRAN builds on previous Windows R releases would also fail, I guess?)

aadler commented 4 years ago

Hello @MarcusKlik. You are correct. I know that the great Duncan Murdoch (and I don't use that term lightly) is working on a way to detect R 3 vs R 4 at build time so that the rgl package can use the native png/jpeg/freetype libraries when building under R 4. I'm not sure exactly how he did it, though. While R-forge is harder to navigate than GitHub, I figure that manually looking through the rgl repository for changes labeled "Update to work with Rtools40" would likely be fruitful. I'm hoping other developers like @s-u can use a similar method for the png and jpeg packages and the like.

aadler commented 4 years ago

It seems that the specific revision is just an update for 4.0, and that he hasn't committed anything with build-time logic yet.

s-u commented 4 years ago

One problem is that packages still have to work with Rtools40 and without. It would be nice to detect the presence of the msys2 libraries and only use the fall-back if they are not present, but I have yet to get an answer about the official way of detecting it. I'll try to look into it this week and can report back.
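A minimal sketch of the detect-then-fall-back idea described above, written in Python for brevity (a real R package would normally do this in its configure script). It assumes `pkg-config` is available in the build environment and that the libraries are registered under the module names `libzstd` and `liblz4`; the thread does not confirm either for Rtools40, and the helper names `has_system_lib` and `choose_sources` are hypothetical.

```python
import shutil
import subprocess


def has_system_lib(name, pkg_config="pkg-config"):
    """Return True if pkg-config is on PATH and reports `name` as installed.

    Assumption: the toolchain ships pkg-config metadata for the library.
    """
    exe = shutil.which(pkg_config)
    if exe is None:
        # No pkg-config at all: treat every library as missing.
        return False
    # `pkg-config --exists NAME` exits with status 0 when the module is found.
    result = subprocess.run([exe, "--exists", name], check=False)
    return result.returncode == 0


def choose_sources(libs=("libzstd", "liblz4"), probe=has_system_lib):
    """Map each library to 'system' if detected, otherwise 'bundled'."""
    return {lib: ("system" if probe(lib) else "bundled") for lib in libs}


if __name__ == "__main__":
    for lib, source in choose_sources().items():
        print(f"{lib}: {source}")
```

Passing the probe as a parameter keeps the fall-back decision separate from the detection mechanism, so a different detection method could be swapped in once an official one exists.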

MarcusKlik commented 4 years ago

Thanks @aadler and @s-u, yes, if we can detect msys2, that would be extremely useful!

Another thing is that we must be 100 percent sure that the library APIs don't change and break the package build. I guess that will be true for ZSTD and LZ4, at least for the basic entry points. Even better would be to set up a system where (R) packages can be tested against dev versions of the msys2 libraries, to avoid unexpected problems on live systems after an msys2 package update...
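One way to guard against the API-drift concern above is to accept a system library only when its reported version falls inside a range the package has been tested against, and to use the bundled sources otherwise. The sketch below is in Python and rests on assumptions: the version string would come from something like `pkg-config --modversion`, it is purely numeric and dot-separated, and the bounds shown in the usage lines are hypothetical placeholders, not tested limits for fst.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '1.4.5' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_supported(version, minimum, below):
    """Return True if minimum <= version < below.

    Comparing integer tuples avoids the string-comparison trap
    where '1.10.0' would sort before '1.9.0'.
    """
    return parse_version(minimum) <= parse_version(version) < parse_version(below)


if __name__ == "__main__":
    # Hypothetical tested range: at least 1.4.0 and below 2.0.0.
    for candidate in ("1.3.8", "1.4.5", "1.10.0", "2.0.0"):
        verdict = "system" if is_supported(candidate, "1.4.0", "2.0.0") else "bundled"
        print(f"{candidate}: {verdict}")
```

The upper bound is exclusive so that a new major release is rejected until the package has been tested against it.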

s-u commented 4 years ago

That is really a question for @jeroen. Previously, the CRAN setup tended to be more "frozen", so you wouldn't expect changes. However, the roll-out of the Rtools40 toolchain on CRAN is fairly new, so I don't think there has been any official definition of the process for upgrading libraries yet. Jeroen (or Uwe?) is the best person to clarify.

jeroen commented 4 years ago

As Simon said, the roll-out of rtools40 is very fresh. During the first year of a new toolchain we are in a transition period while CRAN is building binaries with different toolchains for R 3.6 and R 4.0. So you shouldn't assume new rtools40 features like C++17 or availability of system libs.

I expect we can really start taking advantage of the rtools40 package manager after the release of R 4.1 in April 2021. At that point CRAN can use rtools40 for both R-release and R-oldrel, so Uwe will probably uninstall the old toolchain and clean up legacy libs on the server.

He also plans to upgrade the hardware by that time, because the current winbuilder still runs Windows 2008 which is a problem for some system libs that require more recent Windows.

But the most important thing, as @s-u already mentions, is that we need to establish a process for how to manage and upgrade these system libs. We can now provide packages via rtools40, but that doesn't mean the winbuilder sysadmin will constantly install and update all these packages on the server. I'm guessing that while we are in the dual toolchain transition, Uwe is very cautious when installing extra pacman packages on the server.

Once we do reach a consensus on how to install/update rtools40 packages on the winbuilder, we need to decide if we want to freeze these libs at the current versions, or use more of a rolling model, where things are constantly moving forward, like CRAN does on Debian / Fedora.

MarcusKlik commented 4 years ago

Hi @jeroen, thanks for clarifying that!

So for now I guess it's best to wait until April 2021 and see how the CRAN toolchain is set up at that time.

Thanks @jeroen, also for your incredible work on the rtools40 toolchain and Windows releases :-)

aadler commented 4 years ago

@jeroen, is there a central place for conversations about updating packages for rtools40 on Windows, specifically the discussions you imply around how Uwe and CRAN will handle external package updates? Would that be R-sig-windows or, eventually, R-devel?

aadler commented 2 years ago

Rtools changed again; this is moot.