allenai / ike

Build tables of information by extracting facts from indexed text corpora via a simple and effective query language.
http://allenai.org/software/interactive-knowledge-extraction/
Apache License 2.0

[CLOSED] Improves query suggestion performance #154

Closed by sbhaktha 8 years ago

sbhaktha commented 8 years ago

Issue by chrisc36 Fri Jun 19 20:18:28 2015
Originally opened as https://github.com/allenai/okcorpus/pull/152


This adds two important optimizations:

  1. Parallelized fetching and processing of examples
  2. An optimized query strategy for getting labelled samples for one-column tables

Also contains a couple of minor bugfixes and improvements to the default settings.
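
As a rough illustration of the first optimization (this is not the code in the PR, just a minimal sketch of the general idea), fetching examples from several corpora and processing the hits can both be fanned out over a thread pool. The helpers `fetch_examples` and `process` are hypothetical stand-ins, and the substring match stands in for a real corpus query.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_examples(corpus, query):
    """Hypothetical stand-in for a per-corpus search: substring match."""
    return [sentence for sentence in corpus if query in sentence]


def process(example):
    """Hypothetical per-example processing step: whitespace tokenization."""
    return example.split()


def fetch_and_process(corpora, query, max_workers=4):
    """Fetch from every corpus in parallel, then process every hit in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so output is deterministic.
        per_corpus = list(pool.map(lambda c: fetch_examples(c, query), corpora))
        hits = [hit for corpus_hits in per_corpus for hit in corpus_hits]
        return list(pool.map(process, hits))


# Toy in-memory "corpora" used only for this illustration.
corpora = [
    ["cats eat mice", "dogs chase cats"],
    ["birds eat seeds"],
]
result = fetch_and_process(corpora, "eat")
```

Threads suit this shape of work when the fetch step is I/O-bound (for example, index lookups); a process pool would be the usual alternative if the processing step were CPU-bound.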


chrisc36 included the following code: https://github.com/allenai/okcorpus/pull/152/commits

sbhaktha commented 8 years ago

Comment by chrisc36 Fri Jun 19 20:55:42 2015


This now also allows you to generalize capitalized words

sbhaktha commented 8 years ago

Comment by dirkgr Fri Jun 19 22:06:40 2015


I'm assuming you'll assign this to me when it's ready?

sbhaktha commented 8 years ago

Comment by chrisc36 Fri Jun 19 23:59:51 2015


Yes, nearly done.

sbhaktha commented 8 years ago

Comment by dirkgr Mon Jun 22 22:44:04 2015


I broke this.

  1. Import the "eating" and the "eats" table from https://gist.github.com/dirkgr/15bf719ae354579f8025 and https://gist.github.com/dirkgr/c2e5dafb7818d6772601.
  2. Search for ((?:NP PP|ADJP)* NN+) $eats ((?:NP PP|ADJP)* NN+) over all seven corpora, while on the "eating" table.
  3. Click "Refresh"

It gives me an ArrayIndexOutOfBoundsException.

sbhaktha commented 8 years ago

Comment by chrisc36 Tue Jun 23 18:26:53 2015


Bug is fixed, but that query revealed a sneakier bug in how samples are gathered that will take a bit more effort to work around

sbhaktha commented 8 years ago

Comment by chrisc36 Tue Jun 23 20:51:10 2015


Should be good to go. We don't get broadening suggestions for "((?:NP PP|ADJP)* NN+) $eats ((?:NP PP|ADJP)* NN+)", but we at least get correct statistics and no errors.

sbhaktha commented 8 years ago

Comment by dirkgr Tue Jun 23 21:53:47 2015


It duplicates the (?: ... ) groups, every time I ask for a suggestion. Is that harmful?

sbhaktha commented 8 years ago

Comment by dirkgr Tue Jun 23 22:10:35 2015


Other than that one comment, LGTM.

sbhaktha commented 8 years ago

Comment by chrisc36 Wed Jun 24 17:22:20 2015


Fixed the (?: ... ) being replicated, I will go ahead and merge

sbhaktha commented 8 years ago

Comment by dirkgr Wed Jun 24 17:26:05 2015


Yay!

Let me know when I should deploy.
