Rothamsted / graphql-api

KnetMiner Platform API BETA
Apache License 2.0

Full-text search using Neo4j and Lucene #22

Open KeywanHP opened 3 years ago

KeywanHP commented 3 years ago

We want to test the ability of the Neo4j-GraphQL library to do a full-text search against the KnetMiner KG. We can start with searching for keywords (using Lucene syntax) in nodes of type Publication and in fields Abstract and AbstractHeader. Ultimately, the search will need to be extended to return any nodes and any fields that match a keyword. Information about the Neo4j full-text search can be found here: https://neo4j.com/docs/cypher-manual/current/indexes-for-full-text-search/

Test cases

Neo4j: http://knetminer-wheat.cyverseuk.org:7474/browser/
Keyword 1: drought
Keyword 2: trehalose
Keyword 3: drought AND trehalose
Keyword 4: drought OR trehalose

Response: The prefName of all nodes matching the keywords in one of their fields, together with the Lucene search score.

marco-brandizi commented 3 years ago

These figures don't make sense: we should see milliseconds, not almost a minute. There must be some serious problem with config, code, or something else.

Could you link the code doing this?

Can the times be split into relevant components: Cypher, GraphQL, other code, time to get the first record, time to complete?

Are you using full-text indexing? Recent Neo4j?
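To make the requested split concrete, here is a minimal sketch of how time to first record and time to complete could be measured separately. It assumes the query results arrive as a lazy iterable of records; `time_stream` and `run_query` are hypothetical names for illustration, and the fake query below only stands in for a real Cypher or GraphQL call.

```python
import time
from typing import Callable, Iterable, Iterator, NamedTuple


class Timing(NamedTuple):
    first_record_s: float  # seconds until the first record arrived (nan if none)
    complete_s: float      # seconds until the result was fully consumed
    records: int           # number of records consumed


def time_stream(run_query: Callable[[], Iterable]) -> Timing:
    """Time a query whose results arrive as an iterable of records.

    run_query is a hypothetical zero-argument callable that starts the
    query and returns its (ideally lazy) result.
    """
    start = time.perf_counter()
    first = float("nan")
    count = 0
    for _ in run_query():
        if count == 0:
            first = time.perf_counter() - start
        count += 1
    return Timing(first, time.perf_counter() - start, count)


if __name__ == "__main__":
    # Stand-in for a real query: yields three placeholder records with a small delay.
    def fake_query() -> Iterator[dict]:
        for i in range(3):
            time.sleep(0.01)
            yield {"record": i}

    print(time_stream(fake_query))
```

Wrapping the raw Cypher call and the GraphQL call in the same helper would show which layer the extra time comes from. If the result is materialised before it is returned, the two timings will be almost identical, which is itself a useful signal.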

KeywanHP commented 3 years ago

@marco-brandizi - the most likely reason is that we have not created full-text indexes for our Neo4j db yet. Can you try to create the index, so that Emzar can rerun the code?

https://neo4j.com/docs/cypher-manual/current/indexes-for-full-text-search/

CREATE FULLTEXT INDEX [index_name] [IF NOT EXISTS]
FOR (n:LabelName[|...])
ON EACH "[" n.propertyName[, ...] "]"
[OPTIONS "{" option: value[, ...] "}"]
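As an illustration of how the template above could be filled in, here is a minimal sketch that builds the statement for a given index name, labels and properties. The helper name `fulltext_index_statement` is hypothetical, the OPTIONS clause is left out, and the example values are the index name, label and properties used later in this thread. Index names, labels and property names cannot be passed as Cypher parameters, so the sketch restricts them to plain alphanumeric identifiers; that restriction is a precaution of this sketch, not a Neo4j rule.

```python
import re

# Conservative identifier pattern (an assumption of this sketch, stricter than Cypher itself).
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def fulltext_index_statement(index_name, labels, properties):
    """Build a CREATE FULLTEXT INDEX statement for node labels and properties."""
    if not labels or not properties:
        raise ValueError("at least one label and one property are required")
    for name in [index_name, *labels, *properties]:
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    label_part = "|".join(labels)
    prop_part = ", ".join(f"n.{p}" for p in properties)
    return (
        f"CREATE FULLTEXT INDEX {index_name} IF NOT EXISTS "
        f"FOR (n:{label_part}) ON EACH [{prop_part}]"
    )


print(fulltext_index_statement(
    "titlesAndAbstracts", ["Publication"], ["Abstract", "AbstractHeader"]
))
# prints: CREATE FULLTEXT INDEX titlesAndAbstracts IF NOT EXISTS FOR (n:Publication) ON EACH [n.Abstract, n.AbstractHeader]
```

This form only works on Neo4j versions that support the CREATE FULLTEXT INDEX command; older servers such as 3.5 need the procedure-based syntax instead.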

KeywanHP commented 3 years ago

I created an index like this (Neo4j 3.5 syntax):

CALL db.index.fulltext.createNodeIndex("titlesAndAbstracts",["Publication"],["Abstract", "AbstractHeader"])

The index can then be searched using Lucene syntax like this:

CALL db.index.fulltext.queryNodes("titlesAndAbstracts", "drought AND trehalose") YIELD node, score
RETURN node.prefName, score

CALL db.index.fulltext.queryNodes("titlesAndAbstracts", "drought OR trehalose") YIELD node, score
RETURN node.prefName, score

It's very fast (milliseconds).

@kideveloper612 - Can you try the same search via the neo4j-graphql library?
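To keep the direct Cypher results comparable with whatever the GraphQL layer returns, the four test keywords from this issue can be run through one parameterised query. The sketch below is backend-agnostic: `run` is a hypothetical callable that takes a Cypher string and a parameter dict and returns rows that can be indexed by column name. The index name and Lucene queries are the ones from this thread.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Index name and Lucene test queries are taken from this thread.
INDEX_NAME = "titlesAndAbstracts"
TEST_QUERIES = [
    "drought",
    "trehalose",
    "drought AND trehalose",
    "drought OR trehalose",
]

# Index name and search string are passed as parameters, not concatenated into the query.
CYPHER = (
    "CALL db.index.fulltext.queryNodes($index, $q) YIELD node, score "
    "RETURN node.prefName AS prefName, score"
)

Row = Dict[str, Any]


def run_test_cases(
    run: Callable[[str, Dict[str, Any]], Iterable[Row]],
) -> Dict[str, List[Tuple[Any, Any]]]:
    """Run every test query through `run` and collect (prefName, score) pairs.

    `run` is a hypothetical adapter around the database client in use.
    """
    results: Dict[str, List[Tuple[Any, Any]]] = {}
    for q in TEST_QUERIES:
        rows = run(CYPHER, {"index": INDEX_NAME, "q": q})
        results[q] = [(row["prefName"], row["score"]) for row in rows]
    return results
```

With the official Neo4j Python driver, `run` could plausibly be `lambda cypher, params: session.run(cypher, params)` for an open session, though that wiring is an assumption and has not been tried against the KnetMiner database. The same four strings can then be sent through the GraphQL endpoint and the two result sets compared.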