eulerto / pg_similarity

set of functions and operators for executing similarity queries
BSD 3-Clause "New" or "Revised" License

Large number of errors when compiling on CentOS #8

Closed ronert closed 10 years ago

ronert commented 10 years ago

Hi eulerto,

thanks first of all for the cool library. I am having trouble compiling it on CentOS 6.4. Please see the following output:

[gpadmin@gpdbsne pg_similarity]$ USE_PGXS=1 make
sed 's,MODULE_PATHNAME,$libdir/pg_similarity,g' pg_similarity.sql.in >pg_similarity.sql
gcc -m64 -O3 -funroll-loops -fargument-noalias-global -fno-omit-frame-pointer -g -finline-limit=1800 -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wendif-labels -Wformat-security -fno-strict-aliasing -fwrapv -I/data/espine1/dev/tools/curl/7.21.7/dist/rhel5_x86_64/include -Werror -fpic -I. -I/usr/local/greenplum-db-4.2.5.1/include/postgresql/server -I/usr/local/greenplum-db-4.2.5.1/include/postgresql/internal -D_GNU_SOURCE -I/data/home/build/builds/greenplum-db/Release-4_2_5_1-build-1_rc_042513-1306/Release-4_2_5_1-build-1_src/ext/rhel5_x86_64/include -I/data/home/build/builds/greenplum-db/Release-4_2_5_1-build-1_rc_042513-1306/Release-4_2_5_1-build-1_src/ext/rhel5_x86_64/include/libxml2 -c -o tokenizer.o tokenizer.c
gcc -m64 -O3 -funroll-loops -fargument-noalias-global -fno-omit-frame-pointer -g -finline-limit=1800 -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wendif-labels -Wformat-security -fno-strict-aliasing -fwrapv -I/data/espine1/dev/tools/curl/7.21.7/dist/rhel5_x86_64/include -Werror -fpic -I. -I/usr/local/greenplum-db-4.2.5.1/include/postgresql/server -I/usr/local/greenplum-db-4.2.5.1/include/postgresql/internal -D_GNU_SOURCE -I/data/home/build/builds/greenplum-db/Release-4_2_5_1-build-1_rc_042513-1306/Release-4_2_5_1-build-1_src/ext/rhel5_x86_64/include -I/data/home/build/builds/greenplum-db/Release-4_2_5_1-build-1_rc_042513-1306/Release-4_2_5_1-build-1_src/ext/rhel5_x86_64/include/libxml2 -c -o similarity.o similarity.c
similarity.c: In function ‘_PG_init’:
similarity.c:135: error: array type has incomplete element type
similarity.c:142: error: array type has incomplete element type
cc1: warnings being treated as errors
similarity.c:148: error: implicit declaration of function ‘DefineCustomEnumVariable’
similarity.c:174: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:174: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:186: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:186: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:215: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:215: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:227: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:227: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:256: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:256: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:268: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:268: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:297: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:297: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:309: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:309: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:325: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:325: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:337: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:337: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:366: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:366: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:378: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:378: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:394: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:394: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:406: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:406: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:422: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:422: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:434: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:434: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:450: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:450: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:462: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:462: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:491: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:491: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:503: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:503: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:532: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:532: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:544: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:544: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:560: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:560: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:572: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:572: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:586: error: large integer implicitly truncated to unsigned type
similarity.c:586: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:586: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:615: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:615: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:627: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:627: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:656: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:656: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:668: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:668: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:684: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:684: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:696: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:696: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:712: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:712: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:724: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:724: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:142: error: unused variable ‘pgs_gram_options’
similarity.c:135: error: unused variable ‘pgs_tokenizer_options’
make: *** [similarity.o] Error 1

Any help would be greatly appreciated!

Thanks and best regards Ronert

eulerto commented 10 years ago

Ronert, we don't support Greenplum. It would compile if Greenplum were based on PostgreSQL 8.4 (the last PostgreSQL version we support), but it is based on 8.2. :(

If you want to use this extension, you need to hack the code to replace the DefineCustomXXXVariable functions.

ronert commented 10 years ago

Hi Euler,

thanks for your quick reply! I will try and hack the DefineCustom functions, but I don’t know if my skills are up to it yet. Shame that the versions don’t match up – I was really excited when I found this library :(

Best Ronert

0x0FFF commented 10 years ago

Hi, Ronert! For Greenplum you can:

a. Download the 0.0.19 version from here: http://pgfoundry.org/projects/pgsimilarity/

b. Build it with Greenplum using this makefile:

# $PostgreSQL $

MODULE_big = pg_similarity
OBJS = tokenizer.o similarity.o \
       block.o cosine.o dice.o euclidean.o hamming.o jaccard.o \
       jaro.o levenshtein.o matching.o mongeelkan.o needlemanwunsch.o \
       overlap.o qgram.o smithwaterman.o smithwatermangotoh.o
INCLUDEDIRS := -I.
INCLUDEDIRS += -I$(shell pg_config --includedir-server)
INCLUDEDIRS += -I$(shell pg_config --includedir)
INCLUDEDIRS += -I /usr/local/greenplum-db/include/postgresql/internal
INCLUDEDIRS += -I /usr/local/greenplum-db/include/postgresql/informix
INCLUDEDIRS += -I /usr/local/greenplum-db/include/postgresql/server/utils
LIBDIR = -L$(shell pg_config --libdir)
PGXS = $(shell pg_config --pgxs)
include $(PGXS) 

%.o: %.c %.h
	cc -fpic -o $@ -c $< $(INCLUDEDIRS)

pg_similarity : $(OBJS)
	cc -shared -o pg_similarity $(OBJS) $(LIBDIR) -lpq -lm

c. Copy the built object "pg_similarity" to the libdir on every segment (the location is reported by the pg_config --libdir command). I had to put it in the postgresql subfolder (for me it was /usr/local/greenplum-db/lib/postgresql).

d. Run psql -d <your_database> -f pg_similarity.sql to install it into your database.

ronert commented 10 years ago

Hi,

that seems to do the trick! Thanks so much for your help – greatly appreciated. Maybe Señor Euler can include this in the README?

Best Ronert

ronert commented 10 years ago

It looked like it worked, but now I am getting the following error when using pg_similarity functions:

NOTICE: 00000: Releasing segworker groups to finish aborting the transaction.
LOCATION: rollbackDtxTransaction, cdbtm.c:1207
ERROR: 58M01: Error on receive from seg0 localhost.localdomain:40000 pid=17244: server closed the connection unexpectedly
DETAIL:
This probably means the server terminated abnormally before or while processing the request.
LOCATION: cdbdisp_finishCommand, cdbdisp.c:1476
ERROR: XX000: could not temporarily connect to one or more segments (cdbgang.c:1626)
LOCATION: allocateWriterGang, cdbgang.c:1626

Any idea?

0x0FFF commented 10 years ago

Hi! What kind of query did you issue? What are your GP version and configuration? Did the shared object compile without errors? (For me, its size is 280 KB and its md5 is cecc7b133724a669559f3dbf7061d492.)
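For anyone wanting to compare their build against the size and md5 quoted above, here is a minimal sketch in Python. The function name `file_fingerprint` and the example path in the comment are illustrative assumptions, not part of pg_similarity or Greenplum. Note that a differing checksum does not by itself prove a broken build: a different compiler, flags, or source revision would also change it.

```python
import hashlib
import os
import sys


def file_fingerprint(path, chunk_size=65536):
    """Return (size_in_bytes, md5_hex) for the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so the whole file is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


if __name__ == "__main__":
    # Hypothetical usage (the path is an assumption based on the thread):
    #   python fingerprint.py /usr/local/greenplum-db/lib/postgresql/pg_similarity
    for path in sys.argv[1:]:
        size, md5_hex = file_fingerprint(path)
        print(f"{path}: {size} bytes, md5 {md5_hex}")
```

Running it against the copy on each segment host would also show whether every segment has the same file.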

Here's what I have on my VM:

test=# create table test (a varchar, b varchar);
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
test=# insert into test (a,b) values ('foo 123', 'bar 123'), ('a b c', 'a b c d'), ('1 2 3', null);
INSERT 0 3
test=# select a,b,cosine(a,b) from test;
    a    |    b    |  cosine  
---------+---------+----------
 foo 123 | bar 123 |      0.5
 a b c   | a b c d | 0.866025
 1 2 3   |         |         
(3 rows)

test=# select a,b,lev(a,b) from test;
    a    |    b    |   lev    
---------+---------+----------
 foo 123 | bar 123 | 0.571429
 a b c   | a b c d | 0.714286
 1 2 3   |         |         
(3 rows)
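As a sanity check on the numbers above, here is a small Python sketch that reproduces them outside the database. It rests on two assumptions that merely happen to match this output and are not confirmed as pg_similarity's actual implementation: cosine is computed over sets of whitespace-separated tokens, and lev is one minus the edit distance divided by the length of the longer string. All function names are illustrative.

```python
import math


def cosine_tokens(a, b):
    """Cosine similarity over sets of whitespace-separated tokens (assumed tokenizer)."""
    if a is None or b is None:
        return None  # mirrors the NULL result in the third row
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a or not tokens_b:
        return 0.0
    shared = len(tokens_a & tokens_b)
    return shared / math.sqrt(len(tokens_a) * len(tokens_b))


def levenshtein(a, b):
    """Classic dynamic-programming edit distance with unit costs."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def lev_similarity(a, b):
    """Edit distance normalized by the longer string's length (assumed normalization)."""
    if a is None or b is None:
        return None
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


rows = [("foo 123", "bar 123"), ("a b c", "a b c d"), ("1 2 3", None)]
for a, b in rows:
    print(a, "|", b, "|", cosine_tokens(a, b), "|", lev_similarity(a, b))
```

For the first row this gives 1 shared token out of 2 and 2, so 1/sqrt(4) = 0.5, and an edit distance of 3 over length 7, so 1 - 3/7 ≈ 0.571429, in line with the table.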

ronert commented 10 years ago

The example compiled without errors after I turned off the compiler error flags as described in https://gopivotal-com.socialcast.com/search?utf8=%E2%9C%93&q=pg_similarity.

I am running GPDB 4.2.6.3 on a DCA.

Your minimal example works, but whenever I run it on my dataset it throws errors on the segments. Is it possible that the functions can’t handle UTF-8 strings and just crash?
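To illustrate why UTF-8 input is a plausible suspect, here is a short Python sketch of the gap between character count and byte count. It is only an illustration of the encoding itself: nothing in this thread establishes whether pg_similarity works on bytes or characters, or whether UTF-8 is the actual cause of the crash. The helper names are hypothetical.

```python
def describe(text):
    """Return (character_count, utf8_byte_count) for a string."""
    return len(text), len(text.encode("utf-8"))


def cut_splits_character(text, byte_count):
    """True if keeping only the first `byte_count` UTF-8 bytes cuts a character in half."""
    prefix = text.encode("utf-8")[:byte_count]
    try:
        prefix.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


for sample in ["Senor", "Señor"]:
    chars, raw_bytes = describe(sample)
    print(f"{sample!r}: {chars} characters, {raw_bytes} bytes")

# 'ñ' occupies two bytes, so a 3-byte prefix of "Señor" ends mid-character.
print(cut_splits_character("Señor", 3))
```

Code that sizes buffers by one count and indexes by the other can read or write out of bounds on such input, which is one way a backend process could terminate abnormally. A useful test would be to run the failing query on a handful of rows that contain non-ASCII characters.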

I am going to contact you on Pivotal internal channels, maybe you can have a look at the DCA if you have some time.

Thanks a lot.

0x0FFF commented 10 years ago

ok, let's continue there