alitrack / duckdb_fdw

DuckDB Foreign Data Wrapper for PostgreSQL
MIT License
337 stars 21 forks source link

"failed to open SQLite DB" when it's a DuckDB file (file permissions? USER MAPPING?) #55

Open houstonhaynes opened 2 months ago

houstonhaynes commented 2 months ago

I have two users on a database that's essentially just a set of foreign tables into a duckdb DB file. When I run a query against the postgres endpoint I get the following error in the log

cmsadmin@medptd ERROR: failed to open SQLite DB. rc=1 path=/home/postgres/medptd.db

The 'cmsadmin' is salient here because they are the user that "owns" this database. When I go into a psql prompt as the cmsadmin user on the DB I get the same error.

cmsadmin@cmspsql:~$ psql -d medptd
psql (16.4 (Debian 16.4-1.pgdg120+1))
Type "help" for help.

medptd=# SELECT COUNT(*) FROM cms_view;
ERROR:  failed to open SQLite DB. rc=1 path=/home/postgres/medptd.db
medptd=#

So when I look at this file I see that the cmsadmin group is the "owner" of the file

cmsadmin@cmspsql:~$ ls -lh /home/postgres
total 27G
-rwxr-xr-x 1 cmsadmin cmsadmin 27G Sep  7 09:28 medptd.db

And when I run duckdb as either postgres or cmsadmin I have no issue. (added both user to the "cmsadmin" group which owns the folder and file)

image

So I'm down to a few potential issues I wanted to ask here. When setting up the fdw there's no mention of user mapping. I'm wondering if that matters here.

CREATE USER MAPPING FOR cmsadmin SERVER duckdb_server;

Does this register as meaningful to anyone here? Is there anything else that should be considered when setting up duckdb_server in the postgres instance? I feel like there's some detail I'm missing in understanding how the extension accesses the .db file.

Thanks in advance to anyone that can shed light on this issue.

houstonhaynes commented 2 months ago

Well I just tried rebuilding the foreign table set after mapping both users to the server.

I got some interesting results - including a return of data in JetBrains Rider that looked like the data was coming back but then the postgres instance gave an error back a few seconds after the data was returned in the SQL Console view.

[2024-09-09 12:13:08] Connected
medptd.public> set search_path = "public"
[2024-09-09 12:13:09] completed in 72 ms
medptd.public> SELECT
                   EXTRACT(Year FROM Year) as Year,
                   ROUND(AVG(Total_Drug_Cost_Per_Day), 2) as Avg_Total_Drug_Cost_Per_Day,
                   ROUND(AVG(Over_65_Total_Drug_Cost_Per_Day), 2) as Over_65_Avg_Total_Drug_Cost_Per_Day
               FROM
                   cms_view
               WHERE
                   EXTRACT(Year FROM Year) = 2017
               GROUP BY
                   YEAR,
                   Total_Drug_Cost_Per_Day,
                   Over_65_Total_Drug_Cost_Per_Day
               ORDER BY AVG_Total_Drug_Cost_Per_Day DESC
[2024-09-09 12:14:13] An I/O error occurred while sending to the backend.
[2024-09-09 12:14:13] Unrecoverable error received, the connection is spoiled

image

Here's what the tail on the log has to say

postgres@cmspsql:~$ sudo tail -f /var/log/postgresql/postgresql-16-main.log
2024-09-09 16:14:15.665 UTC [5212] LOG:  terminating any other active server processes
2024-09-09 16:14:15.716 UTC [5212] LOG:  all server processes terminated; reinitializing
2024-09-09 16:14:15.730 UTC [5317] LOG:  database system was interrupted; last known up at 2024-09-09 16:12:47 UTC
2024-09-09 16:14:15.764 UTC [5317] LOG:  database system was not properly shut down; automatic recovery in progress
2024-09-09 16:14:15.767 UTC [5317] LOG:  redo starts at 0/1F071F0
2024-09-09 16:14:15.767 UTC [5317] LOG:  invalid record length at 0/1F07250: expected at least 24, got 0
2024-09-09 16:14:15.767 UTC [5317] LOG:  redo done at 0/1F07218 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2024-09-09 16:14:15.770 UTC [5318] LOG:  checkpoint starting: end-of-recovery immediate wait
2024-09-09 16:14:15.779 UTC [5318] LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.003 s, sync=0.001 s, total=0.011 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB; lsn=0/1F07250, redo lsn=0/1F07250
2024-09-09 16:14:15.783 UTC [5212] LOG:  database system is ready to accept connections
2024-09-09 16:19:10.204 UTC [5347] postgres@medptd ERROR:  failed to open SQLite DB. rc=1 path=/home/postgres/medptd.db
2024-09-09 16:19:10.204 UTC [5347] postgres@medptd STATEMENT:  SELECT
            EXTRACT(Year FROM Year) as Year,
            ROUND(AVG(Total_Drug_Cost_Per_Day), 2) as Avg_Total_Drug_Cost_Per_Day,
            ROUND(AVG(Over_65_Total_Drug_Cost_Per_Day), 2) as Over_65_Avg_Total_Drug_Cost_Per_Day
        FROM
            cms_view
        WHERE
            EXTRACT(Year FROM Year) = 2017
        GROUP BY
            YEAR,
            Total_Drug_Cost_Per_Day,
            Over_65_Total_Drug_Cost_Per_Day
        ORDER BY AVG_Total_Drug_Cost_Per_Day DESC

[tail still active and not further info when sending other queries]

The good news is that when I restart the server and query for row counts on the main view from either user the queries return a good value,.

postgres@cmspsql:~$ psql -d medptd;
psql (16.4 (Debian 16.4-1.pgdg120+1))
Type "help" for help.

medptd=# SELECT COUNT(*) from cms_view;
   count
-----------
 249489396
(1 row)

medptd=# exit
postgres@cmspsql:~$ su - cmsadmin
Password:
cmsadmin@cmspsql:~$ psql -d medptd
psql (16.4 (Debian 16.4-1.pgdg120+1))
Type "help" for help.

medptd=# SELECT COUNT(*) FROM cms_view;
   count
-----------
 249489396
(1 row)

image

HOWEVER when running a query that has some calculations in it postgres hangs.

medptd=# SELECT
    EXTRACT(Year FROM Year) as Year,
    ROUND(AVG(Total_Drug_Cost_Per_Day), 2) as Avg_Total_Drug_Cost_Per_Day,
    ROUND(AVG(Over_65_Total_Drug_Cost_Per_Day), 2) as Over_65_Avg_Total_Drug_Cost_Per_Day
FROM
    cms_view
WHERE
    EXTRACT(Year FROM Year) = 2017
GROUP BY
    YEAR,
    Total_Drug_Cost_Per_Day,
    Over_65_Total_Drug_Cost_Per_Day
ORDER BY AVG_Total_Drug_Cost_Per_Day DESC;
ERROR:  failed to open SQLite DB. rc=1 path=/home/postgres/medptd.db
houstonhaynes commented 2 months ago

And just to round out the picture - the same "single view" query with aggregates runs without issue against the view in the DuckDB CLI.

Screenshot 2024-09-09 124521

Screenshot 2024-09-09 124408