Segfault-Inc / Multicorn

Data Access Library
https://multicorn.org/
PostgreSQL License

the return of 'Plpy is not a built-in module' #247

Closed Remi-C closed 4 years ago

Remi-C commented 4 years ago

Hello dear Multicorn team and @rdunklau, it appears that the latest Multicorn (compiled as of yesterday) with the latest Postgres (pg12) and Python 3.x again produces an error in some situations. This seems to be similar to #136.

As I know you are busy, I prepared a fully self-contained test to reproduce the problem:

CREATE SCHEMA bug_multicorn_within_python ; 

-- a simple python function
DROP FUNCTION IF EXISTS bug_multicorn_within_python.basic_python_function(useless_input text) ; 
CREATE OR REPLACE FUNCTION bug_multicorn_within_python.basic_python_function(useless_input text) 
RETURNS TABLE (ordinality int, some_text TEXT , some_json json[] ) LANGUAGE plpython3u STABLE PARALLEL UNSAFE ROWS 3 AS $$
import json
ret = [
    (1,'bla', None)
    , (2,useless_input, None)
]
return (ret)
$$;
    -- this function works fine
    SELECT *
    FROM bug_multicorn_within_python.basic_python_function('test input') ; 

-- now creating basic multicorn fdw to read postgres logs
    CREATE SERVER IF NOT EXISTS filesystem_multicorn_fdw foreign data wrapper multicorn options ( wrapper 'multicorn.fsfdw.FilesystemFdw' );   
    DROP FOREIGN TABLE IF EXISTS bug_multicorn_within_python.basic_multicorn_fdw;
    CREATE FOREIGN TABLE IF NOT EXISTS bug_multicorn_within_python.basic_multicorn_fdw ( 
        pg_version TEXT,
        log_nb text,
        file_content bytea,
        filename TEXT
    ) server filesystem_multicorn_fdw options(
        root_dir    '/var/log/postgresql',
        pattern     'postgresql-{pg_version}-main.log.{log_nb}',
        content_column 'file_content',
        filename_column 'filename') ;
-- OK: we can see the logs and we can filter them, fantastic
SELECT *
   FROM bug_multicorn_within_python.basic_multicorn_fdw 
    WHERE log_nb = '1' and pg_version = '12';

-- we can also combine both the plpython function and multicorn in one query, no errors.
   SELECT *
   FROM bug_multicorn_within_python.basic_multicorn_fdw  as multi
    , encode(file_content, 'escape') as log_file_content_in_text
    , bug_multicorn_within_python.basic_python_function(log_file_content_in_text) as py
    WHERE log_nb = '1' and pg_version = '12';

-- Now disconnect and reconnect, and do not run any plpython function on its own!
-- then use both multicorn and plpy in the same query (their order within the query does not matter)
   SELECT *
   FROM bug_multicorn_within_python.basic_multicorn_fdw  as multi
    , encode(file_content, 'escape') as log_file_content_in_text
    , bug_multicorn_within_python.basic_python_function(log_file_content_in_text) as py
    WHERE log_nb = '1' and pg_version = '12';
 /*
  * SQL Error [38000]: ERROR: could not import "__main__" module
  Detail: ImportError: 'plpy' is not a built-in module
  */

I don't understand how multicorn's python interacts with postgres's python, so it's hard to offer ideas about solving this issue.

rdunklau commented 4 years ago

Hello,

I'm currently setting up a CI system for Multicorn, and I encountered the same bug. If you use Multicorn first, the Python interpreter is already initialized by the time plpython needs it, but without the plpy module having been registered in builtins. If you use plpython first, Multicorn can't be imported. I think this happens when plpython and Multicorn use different Python installations.

I'm trying to understand what happens, and whether something has changed between versions.

I'll let you know.
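
One way to probe the hypothesis above is to ask the running interpreter whether it knows plpy as a built-in module. The following is a minimal sketch, not part of Multicorn or PL/Python; `is_registered_builtin` and `describe_builtin_status` are hypothetical helper names. It assumes the error text comes from Python's built-in importer, which only knows modules that were registered before the interpreter was initialized.

```python
import sys


def is_registered_builtin(module_name):
    """Return True if module_name is listed among the interpreter's built-in modules."""
    return module_name in sys.builtin_module_names


def describe_builtin_status(module_name="plpy"):
    """Build a one-line, human-readable report for module_name."""
    state = "registered" if is_registered_builtin(module_name) else "NOT registered"
    return "{!r} is {} as a built-in module".format(module_name, state)
```

Calling `describe_builtin_status()` from code that runs inside the backend, in a fresh session where Multicorn has already been used, should show whether plpy was registered in that interpreter. In a standalone Python shell, plpy is never registered, so the helper is only informative when run inside the PostgreSQL process.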

Remi-C commented 4 years ago

Thanks @rdunklau! Here is the code to find which Python plpython is using:

DROP FUNCTION IF EXISTS bug_multicorn_within_python.get_python_interpreter_version() ; 
CREATE OR REPLACE FUNCTION bug_multicorn_within_python.get_python_interpreter_version( ) 
RETURNS text LANGUAGE plpython3u STABLE PARALLEL UNSAFE AS $$
import os, sys
return os.popen('type python').read() + sys.version
$$; 
SELECT bug_multicorn_within_python.get_python_interpreter_version() ;
-- python is /usr/bin/python
-- 3.6.9 (default, Nov  7 2019, 10:44:02) 
-- [GCC 8.3.0]

I don't know how to check which Python version Multicorn uses. Maybe it is an option set when I compile it?
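
Note that `type python` reports the `python` found on the shell's PATH, which is not necessarily the interpreter embedded in the backend; only `sys.version` describes the embedded one. A minimal sketch that relies solely on the running interpreter follows. `interpreter_info` and `format_interpreter_info` are hypothetical helper names, and the idea of running the same code from inside a Multicorn wrapper class to compare both sides is an assumption, not something tested here.

```python
import sys
import sysconfig


def interpreter_info():
    """Collect details about the interpreter that is executing this code."""
    return {
        "version": sys.version,
        "prefix": sys.prefix,
        # Header directory of this Python build, e.g. .../include/python3.6m
        "include_dir": sysconfig.get_path("include"),
        # File name of the libpython this build refers to (may be None)
        "libpython": sysconfig.get_config_var("LDLIBRARY"),
    }


def format_interpreter_info(info):
    """Render the collected details as 'key: value' lines, sorted by key."""
    return "\n".join("{}: {}".format(key, info[key]) for key in sorted(info))
```

Returning `format_interpreter_info(interpreter_info())` from a plpython3u function gives the plpython side; if the include directory differs from the one in the Multicorn build command, the two were likely built against different Python installations.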

(update) When building Multicorn, it uses /usr/include/python3.6m:

/usr/bin/clang-6.0 -Wno-ignored-attributes -fno-strict-aliasing -fwrapv -O2 -I. -I./ -I/usr/include/python3.6m -I. -I./ -I/usr/include/postgresql/12/server -I/usr/include/postgresql/internal -Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/mit-krb5 -flto=thin -emit-llvm -c -o src/errors.bc src/errors.c

In /usr/include, I have three Python directories: python2.7/, python3.6/ and python3.6m/

rdunklau commented 4 years ago

Hello.

I just merged a branch into master that fixes this, among other things. Could you check that everything is in order?

Remi-C commented 4 years ago

Hey @rdunklau, I have good news for you! I pulled and compiled the latest Multicorn from master. I ran the test code above several times, and the bug seems to be fixed. Thank you very much for your quick fix, and well done!