Closed: ronert closed this issue 10 years ago.
Ronert, we don't support Greenplum. It would compile if Greenplum were based on 8.4 (the oldest PostgreSQL version we support), but it is based on 8.2. :(
If you want to use this extension, you need to hack the code to replace the DefineCustomXXXVariable functions.
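Here is a minimal standalone sketch of what such a hack could look like. It assumes the common approach of a version-switched wrapper macro. The functions `define_real_v82` and `define_real_v84` are hypothetical stand-ins, not the server API. They only mimic the argument order of `DefineCustomRealVariable`. The 8.2-era call has no boot value and no flags argument, while the 8.4-era call adds both. That difference is consistent with the "passing argument 8 ... expected 'GucRealAssignHook'" and "too many arguments" errors that Greenplum's guc.h produces later in this thread. `TARGET_PG_VERSION` and `DEFINE_REAL_GUC` are also made-up names. A real patch would test `PG_VERSION_NUM` and call the actual functions declared in `utils/guc.h`, after checking them against your own headers.

```c
#include <stddef.h>

/* Assumed build target: an 8.2-based server such as Greenplum 4.2.
 * A real patch would test PG_VERSION_NUM from the server headers instead. */
#ifndef TARGET_PG_VERSION
#define TARGET_PG_VERSION 80200
#endif

typedef int (*real_assign_hook)(double newval);

/* Records what the stand-ins received, so the behaviour can be checked. */
static double seen_value, seen_min, seen_max;
static int seen_api;

/* Hypothetical stand-in shaped like the 8.2-era DefineCustomRealVariable:
 * no boot value and no flags, so the assign hook is argument 8. */
void define_real_v82(const char *name, const char *short_desc,
                     const char *long_desc, double *value_addr,
                     double min_value, double max_value, int context,
                     real_assign_hook assign_hook, void *show_hook)
{
    (void) name; (void) short_desc; (void) long_desc;
    (void) context; (void) assign_hook; (void) show_hook;
    seen_value = *value_addr;   /* current value acts as the default */
    seen_min = min_value;
    seen_max = max_value;
    seen_api = 82;
}

/* Hypothetical stand-in shaped like the 8.4-era call, which adds a boot
 * value (argument 5) and a flags argument after the context. */
void define_real_v84(const char *name, const char *short_desc,
                     const char *long_desc, double *value_addr,
                     double boot_value, double min_value, double max_value,
                     int context, int flags,
                     real_assign_hook assign_hook, void *show_hook)
{
    (void) name; (void) short_desc; (void) long_desc;
    (void) context; (void) flags; (void) assign_hook; (void) show_hook;
    *value_addr = boot_value;
    seen_value = boot_value;
    seen_min = min_value;
    seen_max = max_value;
    seen_api = 84;
}

/* One call site for both shapes. For the 8.2 shape the boot value is
 * stored in the variable first, because there is no parameter for it. */
#if TARGET_PG_VERSION >= 80400
#define DEFINE_REAL_GUC(name, desc, addr, boot, min, max, ctx) \
    define_real_v84((name), (desc), NULL, (addr), (boot), (min), (max), \
                    (ctx), 0, NULL, NULL)
#else
#define DEFINE_REAL_GUC(name, desc, addr, boot, min, max, ctx) \
    (*(addr) = (boot), \
     define_real_v82((name), (desc), NULL, (addr), (min), (max), \
                     (ctx), NULL, NULL))
#endif
```

As far as I know, on 8.2 the variable's value at definition time becomes its default, which is why the wrapper assigns the boot value first. The same pattern would apply to the Bool variant. `DefineCustomEnumVariable` does not exist in 8.2 at all, so the enum-style settings would need a different approach, for example a string setting that the extension parses itself.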
Hi Euler,
Thanks for your quick reply! I will try to hack the DefineCustom functions, but I don’t know if my skills are up to it yet. Shame that the versions don’t match up – I was really excited when I found this library :(
Best,
Ronert
Hi, Ronert! For Greenplum you can:
a. Download version 0.0.19 from http://pgfoundry.org/projects/pgsimilarity/
b. Build it against Greenplum using this makefile:
# $PostgreSQL $
MODULE_big = pg_similarity
OBJS = tokenizer.o similarity.o \
block.o cosine.o dice.o euclidean.o hamming.o jaccard.o \
jaro.o levenshtein.o matching.o mongeelkan.o needlemanwunsch.o \
overlap.o qgram.o smithwaterman.o smithwatermangotoh.o
INCLUDEDIRS := -I.
INCLUDEDIRS += -I$(shell pg_config --includedir-server)
INCLUDEDIRS += -I$(shell pg_config --includedir)
INCLUDEDIRS += -I /usr/local/greenplum-db/include/postgresql/internal
INCLUDEDIRS += -I /usr/local/greenplum-db/include/postgresql/informix
INCLUDEDIRS += -I /usr/local/greenplum-db/include/postgresql/server/utils
LIBDIR = -L$(shell pg_config --libdir)
PGXS = $(shell pg_config --pgxs)
include $(PGXS)
%.o: %.c %.h
	cc -fpic -o $@ -c $< $(INCLUDEDIRS)
pg_similarity : $(OBJS)
	cc -shared -o pg_similarity $(OBJS) $(LIBDIR) -lpq -lm
c. Copy the built object "pg_similarity" to the libdir on all the segments (you can get the location with the pg_config --libdir command). I had to put it in the postgresql subfolder (for me it was /usr/local/greenplum-db/lib/postgresql).
d. Run psql -d <your_database> -f pg_similarity.sql to install it into your database.
Hi,
That seems to do the trick! Thanks so much for your help – greatly appreciated. Maybe Señor Euler can include this in the README?
Best,
Ronert
It looked like it worked, but now I am getting the following error when using pg_similarity functions:
NOTICE: 00000: Releasing segworker groups to finish aborting the transaction.
LOCATION: rollbackDtxTransaction, cdbtm.c:1207
ERROR: 58M01: Error on receive from seg0 localhost.localdomain:40000 pid=17244: server closed the connection unexpectedly
DETAIL:
This probably means the server terminated abnormally
before or while processing the request.
LOCATION: cdbdisp_finishCommand, cdbdisp.c:1476
ERROR: XX000: could not temporarily connect to one or more segments (cdbgang.c:1626)
LOCATION: allocateWriterGang, cdbgang.c:1626
Any idea?
Hi! What kind of query did you issue? What are your GP version and configuration? Did the shared object compile without errors? (For me its size is 280 kB and its md5 is cecc7b133724a669559f3dbf7061d492.)
Here's what I have on my VM:
test=# create table test (a varchar, b varchar);
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
test=# insert into test (a,b) values ('foo 123', 'bar 123'), ('a b c', 'a b c d'), ('1 2 3', null);
INSERT 0 3
test=# select a,b,cosine(a,b) from test;
    a    |    b    |  cosine
---------+---------+----------
 foo 123 | bar 123 |      0.5
 a b c   | a b c d | 0.866025
 1 2 3   |         |
(3 rows)
test=# select a,b,lev(a,b) from test;
    a    |    b    |   lev
---------+---------+----------
 foo 123 | bar 123 | 0.571429
 a b c   | a b c d | 0.714286
 1 2 3   |         |
(3 rows)
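The `lev` numbers match a normalized Levenshtein similarity, 1 - d / max(len(a), len(b)), where d is the edit distance. Under that assumption, 'foo 123' becomes 'bar 123' with 3 substitutions over 7 characters: 1 - 3/7 is about 0.571429. 'a b c' becomes 'a b c d' with 2 insertions: 1 - 2/7 is about 0.714286. This is a minimal sketch of that formula. The function names are hypothetical, and it compares single bytes rather than multibyte characters.

```c
#include <stdlib.h>
#include <string.h>

/* Classic edit distance (insert/delete/substitute, cost 1 each),
 * keeping only one DP row in memory. */
static size_t levenshtein_distance(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t *row = malloc((lb + 1) * sizeof *row);
    if (row == NULL)
        abort();
    for (size_t j = 0; j <= lb; j++)
        row[j] = j;                      /* distance from "" to b[0..j) */
    for (size_t i = 1; i <= la; i++) {
        size_t diag = row[0];            /* previous row, column j-1 */
        row[0] = i;
        for (size_t j = 1; j <= lb; j++) {
            size_t up = row[j];          /* previous row, column j */
            size_t best = diag + (a[i - 1] == b[j - 1] ? 0 : 1);
            if (up + 1 < best)
                best = up + 1;           /* deletion */
            if (row[j - 1] + 1 < best)
                best = row[j - 1] + 1;   /* insertion */
            row[j] = best;
            diag = up;
        }
    }
    size_t d = row[lb];
    free(row);
    return d;
}

/* Normalized similarity: 1 - distance / length of the longer string. */
double levenshtein_similarity(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t longest = la > lb ? la : lb;
    if (longest == 0)
        return 1.0;                      /* two empty strings are identical */
    return 1.0 - (double) levenshtein_distance(a, b) / (double) longest;
}
```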
The example compiled without errors after I turned off the compiler error flags as described in https://gopivotal-com.socialcast.com/search?utf8=%E2%9C%93&q=pg_similarity.
I am running GPDB 4.2.6.3 on a DCA.
Your minimal example works, but whenever I run it on my dataset it throws errors on the segments. Is it possible that the functions can’t handle UTF-8 strings and just crash?
I am going to contact you through Pivotal's internal channels; maybe you can have a look at the DCA if you have some time.
Thanks a lot.
OK, let's continue there.
Hi eulerto,
Thanks, first of all, for the cool library. I am having trouble compiling it on CentOS 6.4. Please see the following output:
[gpadmin@gpdbsne pg_similarity]$ USE_PGXS=1 make
sed 's,MODULE_PATHNAME,$libdir/pg_similarity,g' pg_similarity.sql.in >pg_similarity.sql
gcc -m64 -O3 -funroll-loops -fargument-noalias-global -fno-omit-frame-pointer -g -finline-limit=1800 -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wendif-labels -Wformat-security -fno-strict-aliasing -fwrapv -I/data/espine1/dev/tools/curl/7.21.7/dist/rhel5_x86_64/include -Werror -fpic -I. -I/usr/local/greenplum-db-4.2.5.1/include/postgresql/server -I/usr/local/greenplum-db-4.2.5.1/include/postgresql/internal -D_GNU_SOURCE -I/data/home/build/builds/greenplum-db/Release-4_2_5_1-build-1_rc_042513-1306/Release-4_2_5_1-build-1_src/ext/rhel5_x86_64/include -I/data/home/build/builds/greenplum-db/Release-4_2_5_1-build-1_rc_042513-1306/Release-4_2_5_1-build-1_src/ext/rhel5_x86_64/include/libxml2 -c -o tokenizer.o tokenizer.c
gcc -m64 -O3 -funroll-loops -fargument-noalias-global -fno-omit-frame-pointer -g -finline-limit=1800 -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wendif-labels -Wformat-security -fno-strict-aliasing -fwrapv -I/data/espine1/dev/tools/curl/7.21.7/dist/rhel5_x86_64/include -Werror -fpic -I. -I/usr/local/greenplum-db-4.2.5.1/include/postgresql/server -I/usr/local/greenplum-db-4.2.5.1/include/postgresql/internal -D_GNU_SOURCE -I/data/home/build/builds/greenplum-db/Release-4_2_5_1-build-1_rc_042513-1306/Release-4_2_5_1-build-1_src/ext/rhel5_x86_64/include -I/data/home/build/builds/greenplum-db/Release-4_2_5_1-build-1_rc_042513-1306/Release-4_2_5_1-build-1_src/ext/rhel5_x86_64/include/libxml2 -c -o similarity.o similarity.c
similarity.c: In function ‘_PG_init’:
similarity.c:135: error: array type has incomplete element type
similarity.c:142: error: array type has incomplete element type
cc1: warnings being treated as errors
similarity.c:148: error: implicit declaration of function ‘DefineCustomEnumVariable’
similarity.c:174: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:174: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:186: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:186: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:215: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:215: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:227: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:227: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:256: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:256: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:268: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:268: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:297: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:297: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:309: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:309: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:325: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:325: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:337: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:337: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:366: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:366: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:378: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:378: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:394: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:394: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:406: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:406: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:422: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:422: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:434: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:434: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:450: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:450: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:462: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:462: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:491: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:491: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:503: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:503: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:532: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:532: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:544: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:544: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:560: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:560: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:572: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:572: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:586: error: large integer implicitly truncated to unsigned type
similarity.c:586: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:586: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:615: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:615: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:627: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:627: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:656: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:656: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:668: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:668: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:684: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:684: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:696: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:696: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:712: error: passing argument 8 of ‘DefineCustomRealVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:359: note: expected ‘GucRealAssignHook’ but argument is of type ‘int’
similarity.c:712: error: too many arguments to function ‘DefineCustomRealVariable’
similarity.c:724: error: passing argument 6 of ‘DefineCustomBoolVariable’ makes pointer from integer without a cast
/usr/local/greenplum-db-4.2.5.1/include/postgresql/server/utils/guc.h:339: note: expected ‘GucBoolAssignHook’ but argument is of type ‘int’
similarity.c:724: error: too many arguments to function ‘DefineCustomBoolVariable’
similarity.c:142: error: unused variable ‘pgs_gram_options’
similarity.c:135: error: unused variable ‘pgs_tokenizer_options’
make: *** [similarity.o] Error 1
Any help would be greatly appreciated!
Thanks and best regards,
Ronert