duckdb / community-extensions

https://duckdb.org/community_extensions

Fix missing extension on v1.1.1 #127

Closed carlopi closed 1 month ago

carlopi commented 1 month ago

On Tuesday we released v1.1.1, and most extensions have already been built, tested, and deployed.

I am aware of 3 cases that are not working out of the box after the DuckDB version change: shellfs, duckpgq, and bigquery.

Those look to all be on our side; I will update here as they are fixed.

carlopi commented 1 month ago

shellfs has been published. 2 to go.

duckpgq needs minor changes to the code to be compatible; see the comment here: https://github.com/duckdb/community-extensions/pull/126#issuecomment-2387828556

bigquery is the most complex one, since we are running into some problems with disk space within the container that builds the extension. I am not sure what can be done on our side; I think Docker should already be using all available disk space, but I am not sure. As a short-term solution, it would be viable to skip building for linux_arm_gcc4 and, in parallel, iterate on the extension side?
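For reference, a minimal sketch of the kind of diagnostic step one could add to a workflow to see where the space goes. The exact paths and numbers vary by runner image; this is an illustration, not part of the actual community-extensions workflow:

```sh
# Hypothetical diagnostic step for a GitHub-hosted runner (illustrative only).
# Prints overall disk usage, Docker's own space accounting, and the largest
# pre-installed tool caches.
df -h /                       # total vs. used space on the root filesystem
docker system df              # space used by images, containers, and volumes
sudo du -sh /opt/hostedtoolcache/* 2>/dev/null | sort -rh | head   # biggest caches
```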

hafenkran commented 1 month ago

Hey @carlopi

I took some time today to look into your workflow and the issue with building the BigQuery extension. I noticed that the available disk space on the worker nodes seems to be the main bottleneck: it usually hovers around 21 GB per machine, which fills up pretty quickly during the build.

From my perspective, a lot of unnecessary stuff comes pre-installed on these worker nodes. Since you're now shifting to a Docker-based build, I believe there's some potential for optimization on your side. Here are some suggestions:

  1. Optimizing the Dockerfile: I noticed the Dockerfile you're using can be slightly optimized. I spent a bit of time working on it and managed to cut down the size of the gcc4 image significantly; for the other images, I was able to reduce image sizes by 25-30%. I'd be happy to share my version if you're interested. (screenshot: mega-optimized)

  2. Clearing some pre-installed tools: There's quite a bit of bloatware on the worker nodes (over 50 GB already used out of the 73 GB available). For example, I found some pre-installed Docker images taking up quite some space, and various tools (like node, go, etc.) in the /opt/hostedtoolcache folder that sum up to around 12-13 GB. I don't think your Docker build needs any of these. I just added a step to my workflow that deletes these unnecessary Docker images and files at the start of the build process (see here); a sketch of such a step appears after this list.

(screenshot: pre-installed Docker images)

(screenshot: disk usage before the cleanup)

(screenshot: disk usage after the cleanup)

  3. Using existing GitHub Actions for space management: As an alternative to option 2, there are also a few GitHub Actions that let you configure and reduce disk usage even further (e.g., here). While the option above is straightforward, you could explore these actions for more advanced disk management if needed.
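As referenced in option 2 above, here is a minimal sketch of such a cleanup step. The specific tool caches and images deleted are assumptions; what is actually safe to remove depends on the runner image and on what the build needs:

```sh
# Hypothetical cleanup step, run before the Docker build; the paths and
# targets below are illustrative, not the exact ones from my workflow.
sudo rm -rf /opt/hostedtoolcache/node \
            /opt/hostedtoolcache/go \
            /opt/hostedtoolcache/CodeQL   # unused pre-installed tool caches
docker image prune --all --force          # drop all pre-pulled Docker images
df -h /                                   # verify how much space was reclaimed
```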

Let me know what you think!

Dtenwolde commented 1 month ago

DuckPGQ has now also been updated: https://github.com/duckdb/community-extensions/pull/135

carlopi commented 1 month ago

And https://github.com/duckdb/community-extensions/pull/136 should solve bigquery, after which we should be good to close this!