Closed Mmoncadaisla closed 7 months ago
Ok, the issue seems to be with writing to the /var/lib/postgresql/
directory, DuckDB attempts to create a hidden directory named .duckdb
within it, where it attempts to create the managed_secrets
and extensions
subfolders, without success when it runs within CNPG
Resolved by setting the following options (note that these are not directly allowed at the CREATE SERVER <> FOREIGN DATA WRAPPER <> duckdb_fdw OPTIONS
syntax) thanks to the hint from the CNPG community.
SELECT duckdb_execute('duckdb_svr', 'SET extension_directory = ''/var/lib/postgresql/data/pgdata/.duckdb/extensions'';');
SELECT duckdb_execute('duckdb_svr', 'SET secret_directory = ''/var/lib/postgresql/data/pgdata/.duckdb/stored_secrets'';');
Context
Hi! we're attempting to run DuckDB via
duckdb_fdw
extension inside a PostgreSQL database that is running in our Kubernetes cluster via CNPG operator.Our ultimate goal is to be able to leverage duckdb to manage data that is currently stored in (geo)parquet files in our Google Cloud Storage buckets.
The database pods already have the
GOOGLE_APPLICATION_CREDENTIALS
set and we've verified that accessing data in a GCS bucket is possible (we did this viaraster2pgpsql
, using out-db COG rasters that are stored in GCS)We strongly prefer to have all the duckdb storage remotely (either having the database itself stored in GCS or working in-memory only and persisting data to .parquet files)
Problem description
We're able to load the extension and create a foreign server without issues, see below:
A) Creating the server using the in-memory option:
B) Creating the server using local storage
I believe that the 'Failed to create directory' error could be related to permissions to create the directory, perhaps related to CNPG internals.
C) Creating the server using remote storage
Questions
What is the recommended approach for leveraging a remote storage such as Google Cloud Storage as in this case, to store the database information instead of using a local postgres route?
Is it possible to load remote extensions using the
':memory:'
option?Additional information
Versions
1.22.2
14.11
0.10.1
Dockerfile
Expand Dockerfile
```dockerfile # Specify the exact base image version for consistency FROM postgis/postgis:14-3.4 # Set environment variables ENV POSTGRES_VERSION 14 ENV POSTGIS_VERSION 3 # Do not split the description, otherwise we will see a blank space in the labels LABEL name="PostgreSQL + PostGIS Container Images" \ vendor="The CloudNativePG Contributors" \ version="${PG_VERSION}" \ release="42" \ summary="PostgreSQL + PostGIS Container images." \ description="This Docker image contains PostgreSQL, PostGIS and Barman Cloud based on Postgres 14-3.4." # Copy requirements files COPY requirements.txt / COPY requirements-udf.txt / # Install packages excluding cmake RUN apt-get update && \ apt-get install -y --no-install-recommends \ python3-pip python3-psycopg2 python3-setuptools wget make gcc \ libtool unzip pgxnclient libpq-dev postgresql-server-dev-${POSTGRES_VERSION} \ lsb-release \ ca-certificates build-essential autoconf automake zlib1g-dev \ libxml2-dev postgresql-server-dev-all git \ libboost-atomic1.74.0 libboost-chrono1.74.0 libboost-graph1.74.0 \ libboost-date-time1.74.0 libboost-program-options1.74.0 libboost-system1.74.0 \ libboost-thread1.74.0 libboost-dev && \ apt-get install -y python3 postgresql-plpython3-${POSTGRES_VERSION} && \ pip3 install --upgrade pip && \ pip3 install -r requirements-udf.txt && \ pip3 install --no-deps -r requirements.txt # Install additional extensions RUN set -xe; \ apt-get update; \ apt-get install -y --no-install-recommends \ "postgresql-${POSTGRES_VERSION}-pgaudit" \ "postgresql-${POSTGRES_VERSION}-pg-failover-slots" \ "postgresql-${POSTGRES_VERSION}-pgrouting" \ ; \ rm -fr /tmp/* ; \ rm -rf /var/lib/apt/lists/*; # Install barman-cloud RUN set -xe; \ apt-get update; \ apt-get install -y --no-install-recommends \ python3-pip \ python3-psycopg2 \ python3-setuptools \ ; \ pip3 install --upgrade pip; \ # TODO: Remove --no-deps once https://github.com/pypa/pip/issues/9644 is solved pip3 install --no-deps -r requirements.txt; \ rm -rf /var/lib/apt/lists/*; # Install TimescaleDB RUN echo "deb https://packagecloud.io/timescale/timescaledb/debian/ $(lsb_release -c -s) main" | tee /etc/apt/sources.list.d/timescaledb.list && \ wget --quiet -O - https://packagecloud.io/timescale/timescaledb/gpgkey | apt-key add - && \ apt-get update && \ apt-get install -y "timescaledb-2-postgresql-${POSTGRES_VERSION}" # Manually install a compatible version of cmake RUN apt-get remove -y cmake && \ wget https://github.com/Kitware/CMake/releases/download/v3.24.1/cmake-3.24.1-Linux-x86_64.sh -O /tmp/cmake-install.sh && \ chmod +x /tmp/cmake-install.sh && \ /tmp/cmake-install.sh --skip-license -- prefix=/usr/local && \ rm /tmp/cmake-install.sh && \ ln -sf /usr/local/bin/cmake /usr/bin/cmake # Clone h3-pg repository RUN git clone https://github.com/zachasme/h3-pg.git # Replace the FindPostgreSQL.cmake with the custom version COPY FindPostgreSQL.cmake /h3-pg/cmake/FindPostgreSQL.cmake # Build h3-pg RUN cd h3-pg && \ cmake -B build -DCMAKE_BUILD_TYPE=Release -DPostgreSQL_FIND_VERSION=14 && \ cmake --build build && \ cmake --install build --component h3-pg # Clone duckdb_fdw repository RUN git clone https://github.com/alitrack/duckdb_fdw # Adjust the path to pg_config as necessary ENV PG_CONFIG=/usr/lib/postgresql/14/bin/pg_config # duckdb foreign data wrapper installation RUN cd duckdb_fdw && \ wget -c https://github.com/duckdb/duckdb/releases/download/v0.10.1/libduckdb-linux-amd64.zip && \ unzip -o libduckdb-linux-amd64.zip && \ cp libduckdb.so $($PG_CONFIG --libdir) && \ make USE_PGXS=1 PG_CONFIG=$PG_CONFIG && \ make install USE_PGXS=1 PG_CONFIG=$PG_CONFIG # Install pointcloud extension RUN apt-get update \ && apt-get install -y --no-install-recommends \ git \ ca-certificates \ build-essential \ autoconf \ automake \ cmake \ zlib1g-dev \ postgresql-server-dev-all \ libxml2-dev \ && rm -rf /var/lib/apt/lists/* \ && git clone https://github.com/verma/laz-perf.git \ && cd laz-perf \ && cmake . \ && make \ && make install \ && cd .. \ && rm -r laz-perf \ && git clone https://github.com/pgpointcloud/pointcloud \ && cd pointcloud \ && ./autogen.sh \ && ./configure --with-lazperf=/usr/local --with-pgconfig=/usr/lib/postgresql/${POSTGRES_VERSION}/bin/pg_config CFLAGS="-Wall -Werror -O2 -g" \ && make \ && make install \ && apt-get purge -y --auto-remove \ git \ ca-certificates \ build-essential \ autoconf \ automake \ cmake \ zlib1g-dev \ postgresql-server-dev-all \ libxml2-dev # Final Cleanup RUN apt-get purge -y --auto-remove wget lsb-release && \ apt-get clean && \ rm -rf /var/lib/apt/lists/* # Change the uid of postgres to 26 RUN usermod -u 26 postgres # Set user to postgres USER 26 ```