carlopi closed this 1 month ago
shellfs has been published. 2 to go.
duckpgq needs minor changes to the code to be compatible, see comment here: https://github.com/duckdb/community-extensions/pull/126#issuecomment-2387828556
bigquery is the most complex one, since we are running into some problems with disk space within the container that builds the extension. I am not sure what can be done on our side; I think Docker should already be using all available memory, but I'm not sure. As a short-term solution, would it be viable to skip building for linux_arm_gcc4, and in parallel iterate on the extension side?
Hey @carlopi
I took some time today to look into your workflow and the issue with building the BigQuery extension. I noticed that disk space on the worker nodes seems to be the main bottleneck: free space usually hovers around 21GB per machine, which fills up pretty quickly during the build.
From my perspective, a lot of unnecessary stuff comes pre-installed on these worker nodes. Since you're now shifting to a Docker-based build, I believe there's some potential for optimization on your side. Here are some suggestions:
Optimizing the Dockerfile: I noticed the Dockerfile you're using can be optimized somewhat. I spent a bit of time working on it and managed to cut down the size of the gcc4 image significantly; for the other images, I was able to reduce image sizes by 25-30%. I'd be happy to share my version if you're interested.
Clearing some pre-installed tools: There's quite a bit of bloatware on the worker nodes (over 50GB already used out of the 73GB available). For example, I found some pre-installed Docker images taking up quite a lot of space, and various tools (like node, go, etc.) in the /opt/hostedtoolcache folder that add up to around 12-13GB. I don't think your Docker build needs these. I just added a step to my workflow that deletes these unnecessary Docker images and files at the start of the build process (see here).
[Screenshots: pre-installed Docker images; before the delete; after the delete]
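For illustration, here is a minimal Python sketch of the kind of cleanup step described above, assuming a Linux runner where /opt/hostedtoolcache holds tools the Docker build does not need. The helper names dir_size and free_disk_space are hypothetical and not taken from the linked workflow. Deleting from /opt usually requires root, and the optional Docker step removes every image not used by an existing container.

```python
import os
import shutil
import subprocess


def dir_size(path):
    """Total size in bytes of regular files under `path` (symlinks are skipped)."""
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total


def free_disk_space(paths, dry_run=True, prune_docker=False):
    """Hypothetical helper: delete directories assumed unnecessary for a
    Docker-based build and return the bytes they occupied.

    With dry_run=True nothing is deleted; sizes are only reported.
    Directories that could not be fully removed are not counted.
    """
    freed = 0
    for path in paths:
        if not os.path.isdir(path):
            continue  # missing or not a directory: nothing to remove
        size = dir_size(path)
        action = "would remove" if dry_run else "removing"
        print(f"{action} {path} ({size / 1e9:.2f} GB)")
        if not dry_run:
            shutil.rmtree(path, ignore_errors=True)
            if os.path.exists(path):
                print(f"could not fully remove {path} (missing permissions?)")
                continue
        freed += size
    if prune_docker and not dry_run and shutil.which("docker"):
        # Remove all images not referenced by any container.
        subprocess.run(["docker", "image", "prune", "--all", "--force"], check=False)
    return freed
```

Running it with dry_run=True first shows how much space each directory would free before anything is deleted; shutil.disk_usage("/").free can be compared before and after the real run.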
Let me know what you think!
DuckPGQ has now also been updated https://github.com/duckdb/community-extensions/pull/135
And https://github.com/duckdb/community-extensions/pull/136 should solve bigquery, and we should be good to close this!
On Tuesday we released v1.1.1, and most extensions have already been built, tested, and deployed. I am aware of 3 cases that are not working out of the box when changing the DuckDB version:
- python being available when tests are run (see https://github.com/duckdb/community-extensions/actions/runs/10973205947/job/30470361697). This requires a minor modification on our side.
These all look to be on our side; I will update as they are fixed.