HazyResearch / mindbender

Tools for iterative knowledge base development with DeepDive

How to trace source sentence with multiple keys? #47

Open senwu opened 9 years ago

senwu commented 9 years ago

I want to trace each extraction back to its source sentence using multiple keys, so I wrote something like:

@source sentences(
    @key doc_id text,
    @key section_id text,
    ref_doc_id text,
    @key sent_id int,
    @searchable words text[],
    lemmas text[],
    poses text[],
    ners text[],
    dep_paths text[],
    dep_parents int[]
).

@extraction gene_mentions(
    id bigint,
    @references(relation="sentences", column="doc_id", alias="sent_gene") doc_id text,
    @references(relation="sentences", column="section_id", alias="sent_gene") section_id text,
    @references(relation="sentences", column="sent_id", alias="sent_gene") sent_id int,
    wordidxs int[],
    @key mention_id text,
    supertype text,
    subtype text,
    @searchable entity text,
    @searchable words text[],
    @navigable is_correct boolean
).

But it doesn't work. Am I missing something? How can I use multiple keys to find the source sentence?

netj commented 9 years ago

That's the intended way to do it. I haven't seriously tested multi-key schemas, so there may be a bug. Could you be more specific about which step fails? Does the data seem to get into the index? Do you get no search results back? Or do you see the source or parent in the result, but it isn't rendered?

senwu commented 9 years ago

I got something in parents:

{
    _index: "genomics",
    _type: "gene_mentions",
    _id: "23383340_Body.0_109_14_ENSG00000170571:EMB_CANONICAL_SYMBOL",
    _score: 1,
    _source: {
        wordidxs: [...],
        subtype: null,
        is_correct: true,
        doc_id: "23383340",
        sent_id: 109,
        supertype: "CANONICAL_TRUE",
        id: null,
        section_id: "Body.0",
        entity: "ENSG00000170571:EMB",
        words: [...],
        mention_id: "23383340_Body.0_109_14_ENSG00000170571:EMB_CANONICAL_SYMBOL"
    },
    parent: {
        _index: "genomics",
        _type: "sentences",
        _id: "23383340@Body.0@109",
        found: false
    }
}

The sentence exists (doc_id = '23383340', section_id = 'Body.0', and sent_id = '109'), but it isn't rendered on the website. FYI: if I change source.words to extraction.doc_id, it appears.

Am I setting something wrong?
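
One way to narrow this down is to check whether a sentence document is actually indexed under the `_id` that the parent lookup uses. The sketch below is only a guess at how that composite id is formed: judging from the parent `_id` in the output above, the key column values appear to be joined with `@` in declaration order. The helper names `composite_id` and `document_url`, the `@` separator as a rule, and the `http://localhost:9200` address are all assumptions for illustration, not taken from mindbender's source.

```python
from urllib.parse import quote


def composite_id(row, key_columns, sep="@"):
    """Join the key column values of a row into one id string.

    Hypothetical helper: the "@" separator is inferred from the parent _id
    shown in the search result, not from mindbender's code.
    """
    return sep.join(str(row[col]) for col in key_columns)


def document_url(base_url, index, doc_type, doc_id):
    """Build an Elasticsearch document GET URL of the form /index/type/id."""
    # Percent-encode the id so characters such as "@" are safe in the path.
    return f"{base_url}/{index}/{doc_type}/{quote(doc_id, safe='')}"


# Key values copied from the gene_mentions hit above.
mention = {"doc_id": "23383340", "section_id": "Body.0", "sent_id": 109}

# Key columns of `sentences`, in the order they are declared in the schema.
sentence_keys = ["doc_id", "section_id", "sent_id"]

parent_id = composite_id(mention, sentence_keys)

# The base URL is an assumption; replace it with the real Elasticsearch address.
url = document_url("http://localhost:9200", "genomics", "sentences", parent_id)

print(parent_id)
print(url)
```

Fetching the printed URL (for example with `curl`) shows whether Elasticsearch has a `sentences` document under that id. If it answers `found: false` there as well, the sentences were probably indexed under a differently built `_id`, which would point at the indexing step rather than the rendering step.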