HazyResearch / mindbender

Tools for iterative knowledge base development with DeepDive
116 stars 32 forks source link

MB Search relies on Postgres 9.3+ for to_json (so PGXL and Greenplum are out of luck) #49

Open alldefector opened 8 years ago

alldefector commented 8 years ago

Learned from @netj that MB Search works only with Postgresql 9.3 (and not PGXL or Greenplum) because it uses the to_json function.

How hard is it to drop the requirement for to_json? Or at least ship a basic version of it. E.g., the following could be used to patch PGXL: http://www.pgxn.org/dist/json_enhancements/doc/json_enhancements.html https://bitbucket.org/IVC-Inc/json_enhancements/overview

chrismre commented 8 years ago

my two cents: not hard.

On Tue, Sep 22, 2015 at 6:03 PM alldefector notifications@github.com wrote:

Learned from @netj https://github.com/netj that MB Search works only with Postgresql 9.3 (and not PGXL or Greenplum) because it uses the to_json function.

How hard is it to drop the requirement for to_json? Or at least ship a basic version of it. E.g., the following could be used to patch PGXL: http://www.pgxn.org/dist/json_enhancements/doc/json_enhancements.html https://bitbucket.org/IVC-Inc/json_enhancements/overview

— Reply to this email directly or view it on GitHub https://github.com/HazyResearch/mindbender/issues/49.

alldefector commented 8 years ago

OK, I managed to install the backport on PGXL, with the help of these posts: http://sourceforge.net/p/postgres-xc/mailman/message/31914373/ http://sourceforge.net/p/postgres-xl/mailman/message/32623527/

Steps:

  1. sudo apt-get install pgxnclient
  2. pgxn install json_enhancements (run pg_config first to make sure it prints PGXL info not vanilla PG)
  3. Edit $PGXL_DIR/share/postgresql/extension/json_enhancements--1.0.0.sql to remove the last CREATE AGGREGATE statement.
  4. Run CREATE EXTENSION json_enhancements in PGXL.
  5. Enjoy!

@raphaelhoffmann @SenWu @netj

netj commented 8 years ago

@alldefector Awesome! I had to sometimes do a CREATE EXTENSION hstore before step 4, but to_json seems working great. I hope PGXL doesn't break on other parts of the SQL generated by MB Search.

netj commented 8 years ago

Btw the SQL produced by MB for loading into ES can be seen with:

mindbender hack jqDDlog '"has_spouse_candidates" | relationByName | sqlForRelationNestingAssociated' <(ddlog export-schema app.ddlog)

Just replace the first "has_spouse_candidates" with the DDlog relation name.

syadlowsky commented 8 years ago

Is there any progress on solving this for greenplum?

alldefector commented 8 years ago

Not yet...

deepdive sql has a fallback python script to do the formatting: https://github.com/HazyResearch/deepdive/blob/13e8788b9bf39e2d3aa83d0883a6391c9c9d6592/database/db-driver/postgresql/db-query#L42 https://github.com/HazyResearch/deepdive/blob/13e8788b9bf39e2d3aa83d0883a6391c9c9d6592/database/pgtsv_to_json

However, I think it may not be reliable with long values (when I tried half a year ago). Any help to improve it is much appreciated :)