anothersmith / node-duckdb

DuckDB NodeJS bindings
MIT License
48 stars 12 forks

Provide pre-built binaries for alpine/musl #136

Open rgoupil opened 2 years ago

rgoupil commented 2 years ago

It'd be great if node-duckdb provided pre-built binaries for Alpine, which relies on musl rather than glibc.

Alpine is a very common choice of Docker base image for CI/CD and production systems. As no pre-built binaries of node-duckdb are available for musl, all these users currently have to rebuild the package from scratch, which in our case adds an extra 15 to 25 minutes to the GitHub pipeline. To counter this, we are forced either to implement a caching mechanism in those pipelines or to switch to a Debian Docker base image, which further increases the cost and complexity of integrating node-duckdb into a project.

Building binaries for musl would solve the issue for all users relying on alpine variants or other musl-based flavors without forcing them to invest more time and effort into adapting their existing system to fit node-duckdb.

I am willing to take care of the PR for that. From my understanding it involves adding an Alpine build image to the docker-compose.yml file, adding a prebuild:alpine rule to the package.json, modifying the GitHub pipeline, and then updating the dev doc to reflect the changes for manual builds. If that's correct, I would like some hints on modifying the GitHub pipelines, as I don't see where the different OS builds are spawned.

Many thanks and I wish you the best for 2022!

rprovodenko commented 2 years ago

Hey, thank you! Have an amazing 2022 too!

This is an interesting question actually, but it seems it's more of a question for the makers of https://www.npmjs.com/package/prebuild, which is what we use to generate the builds. You see, the resulting binary packages are named e.g. node-duckdb-v0.0.79-napi-v6-linux-x64.tar.gz, so unless prebuild has a way to specify a prefix/suffix in the name, one of the Linux builds will overwrite the other. Would you be able to take a look at prebuild and check that I've not missed anything? I would also think that it's much more difficult to determine the Linux distribution than to determine whether the OS is Linux/Windows/etc., so the actual distribution would need to be specified by the client explicitly.
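To make the naming collision concrete, here is a minimal sketch of how a libc marker in the file name would keep two Linux builds apart, and how a client could guess its own libc family. The helper names `detectLibc` and `tarballName` and the `linuxmusl` suffix are assumptions made for illustration, not prebuild's actual implementation or naming scheme. The detection relies on the Node.js diagnostic report, which exposes `header.glibcVersionRuntime` on glibc systems and lists loaded shared objects.

```javascript
// Sketch only: helper names and the naming scheme are assumptions,
// not prebuild's actual implementation.

// Guess the libc family from a Node.js diagnostic report object.
// On a real system, pass process.report.getReport().
function detectLibc(report) {
  const header = (report && report.header) || {};
  if (header.glibcVersionRuntime) return "glibc";
  const libs = (report && report.sharedObjects) || [];
  const isMusl = libs.some(
    (file) => file.includes("libc.musl-") || file.includes("ld-musl-")
  );
  return isMusl ? "musl" : "unknown";
}

// Build a tarball name that carries the libc family for non-glibc Linux
// builds, so a musl build does not overwrite the glibc one.
function tarballName({ name, version, napi, platform, arch, libc }) {
  const suffix = platform === "linux" && libc && libc !== "glibc" ? libc : "";
  return `${name}-v${version}-napi-v${napi}-${platform}${suffix}-${arch}.tar.gz`;
}
```

With this scheme the glibc build keeps the existing name, so already-published binaries would not need to be renamed; only the musl build gets a distinct file name.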

Maybe there is a way to link to the two libraries dynamically in such a way that one is loaded if the other is not present - this is a question a C/C++ programmer would be able to answer.
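The "load one if the other is not present" idea can also be approximated at the JavaScript level instead of in the C/C++ linker. The following is a rough sketch under that assumption: `loadFirstAvailable` is a hypothetical helper, not something node-duckdb ships, and it simply tries each candidate binding in order.

```javascript
// Sketch only: this helper is hypothetical and not part of node-duckdb.
// Tries each candidate module path in order and returns the first one
// that loads; collects the failures for the final error message.
function loadFirstAvailable(candidates, load = require) {
  const errors = [];
  for (const candidate of candidates) {
    try {
      return load(candidate);
    } catch (err) {
      errors.push(`${candidate}: ${err.message}`);
    }
  }
  throw new Error(`No native binding could be loaded:\n${errors.join("\n")}`);
}
```

A call might look like `loadFirstAvailable(["./build/glibc/binding.node", "./build/musl/binding.node"])`, where both paths are made up for illustration. A binding built against the wrong libc normally fails to load with an error, which is what makes the fallback possible.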

Yeah, anyway, the first step would be consulting with prebuild. Once prebuild is able to generate binaries for multiple Linux flavours, it would be trivial to add this to node-duckdb (it would involve adding an AppVeyor job that looks exactly like the one for Mac, except with a different image specified), and you are welcome to do so.

Alternatively I guess we could create separate releases for separate platforms, but that kind of complicates things and also would require the build/release system to be changed. Yeah, interesting question definitely.

Maybe the easier solution would be to install glibc?

rprovodenko commented 2 years ago

Update: The prebuild guys have answered: https://github.com/prebuild/prebuild/issues/290#issuecomment-1043258660 So it's just a matter of adding an appveyor job. You're welcome to create a PR for it and I'll test it out

rgoupil commented 2 years ago

Hey @rprovodenko, thanks for the preliminary work! I'll have a look at this next week 😃

rgoupil commented 1 year ago

Oh hey, more than a year since I opened that issue. We found a way to work around it and, as with everything that doesn't hurt enough in software engineering, it kind of fell out of priority. @rprovodenko do you want to keep that issue open or should I close it?

csgui commented 1 year ago

"We found a way to work around it"

Hi @rgoupil ! I would love to hear more about your workaround to that issue. I am facing the same problem since I have an image that is based on alpine/musl.

Thanks!

rgoupil commented 1 year ago

We cache the node_modules using the yarn.lock hash in ECR. The Dockerfile uses multiple stages, where one stage runs yarn/npm install before another stage actually builds the project. This allows us to use Docker multi-stage caching to our advantage and reuse the previous yarn/npm install stage as long as no modifications have been made to the package.json (https://docs.docker.com/build/cache/#use-multi-stage-builds). However, any modification to the package.json adds 35+ minutes to our pipeline, a large part of which is spent building DuckDB on Alpine.

While the workaround is imperfect, investing more time into this issue is not worth it for us, as we rarely notice the problem anymore. I hope this will be of assistance to you!
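For anyone wanting to reproduce the lockfile-keyed part, a minimal sketch of deriving a cache tag from the yarn.lock contents could look like this. The function name `dependencyCacheTag` and the `deps-<hash>` tag format are assumptions for illustration, not the exact scheme we use.

```javascript
const { createHash } = require("node:crypto");

// Sketch only: the tag format is an assumption made for illustration.
// Derives a short, deterministic tag from the lockfile contents, so the
// cached dependency image is reused until the lockfile changes.
function dependencyCacheTag(lockfileContents, prefix = "deps") {
  const digest = createHash("sha256").update(lockfileContents).digest("hex");
  return `${prefix}-${digest.slice(0, 16)}`;
}
```

In a pipeline this would be called with something like `fs.readFileSync("yarn.lock")`, and the resulting tag used to look up a previously pushed dependency image before falling back to a full install.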

csgui commented 1 year ago

Thanks for the info @rgoupil!