Add fulltext Lucene analyzers for faster search in text data

covidgraph / motherlode

Pipeline for running all dataloader scripts for covidgraph in a controlled manner.

https://covidgraph.org

MIT License


Add fulltext Lucene analyzers for faster search in text data #15

Open jarasch opened 3 years ago

jarasch commented 3 years ago

We are preparing Cypher queries for users that want to query data either via Cypher (Neo4j-Browser) or Neo4j-Bloom.

Therefore we need to build text analyzers on the text properties of the following labels/properties:

Fragment.text
Paper.title
GeneSymbol.sid
Gene.name
Protein.name
PatentClaim.text
PatentTitle.text
PatentAbstract.text
Entity.name

jarasch commented 3 years ago

CALL db.index.fulltext.createNodeIndex("textOfPapersAndPatents",["Fragment", "Abstract", "Paper", "Patent", "PatentTitle", "PatentClaim","PatentAbstract"],["title", "text"])
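For the user-facing queries mentioned above, a lookup against this index could go through `db.index.fulltext.queryNodes`. The following is only a minimal sketch: the helper name `build_fulltext_query`, the example search term, and the `limit` parameter are assumptions for illustration and are not part of any covidgraph loader.

```python
# Hypothetical helper (not from the covidgraph code base): builds a
# parameterized fulltext lookup against the index created above.
INDEX_NAME = "textOfPapersAndPatents"


def build_fulltext_query(search_term, limit=10):
    """Return (cypher, params) for a fulltext lookup, best matches first."""
    cypher = (
        "CALL db.index.fulltext.queryNodes($index, $term) "
        "YIELD node, score "
        "RETURN labels(node) AS labels, node, score "
        "ORDER BY score DESC LIMIT $limit"
    )
    # The search term uses Lucene query syntax, e.g. AND / OR / quoted phrases.
    params = {"index": INDEX_NAME, "term": search_term, "limit": limit}
    return cypher, params


if __name__ == "__main__":
    # Example search term is made up for illustration.
    cypher, params = build_fulltext_query("coronavirus AND vaccine", limit=5)
    print(cypher)
    print(params)
```

The returned pair can be handed to whichever Neo4j client is in use; passing the search term as a parameter keeps user input out of the query string.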

jarasch commented 3 years ago

// Fulltext index on GeneSymbol where the gene name is stored in property sid
CALL db.index.fulltext.createNodeIndex("GeneSymbolFullTextIndex",["GeneSymbol"],["sid"])

jarasch commented 3 years ago

// Fulltext index on author names
CALL db.index.fulltext.createNodeIndex("AuthorFullTextIndex",["Author"],["first", "middle","last"])

jarasch commented 3 years ago

// Fulltext index on entity names like company names
CALL db.index.fulltext.createNodeIndex("EntityFullTextIndex",["Entity"],["name"])

motey commented 3 years ago

A dedicated loader to create all needed text indexes would make sense. This loader can be mounted into the motherlode pipeline.

Can one create text indexes on nodes that do not exist yet?

If yes, we can collect all text indexes (including those from other loaders that already exist) in one place and create them at the beginning of the pipeline.
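Collecting the index definitions in one place could look roughly like the sketch below. The four definitions are copied from the comments in this thread; the names `FULLTEXT_INDEXES`, `create_index_statement` and `all_index_statements` are made up for illustration, and the sketch only builds the Cypher strings. It does not answer whether the indexes can be created before the nodes exist, and it does not handle indexes that are already present.

```python
import json

# Index definitions taken from the comments in this thread:
# index name -> (node labels, properties).
FULLTEXT_INDEXES = {
    "textOfPapersAndPatents": (
        ["Fragment", "Abstract", "Paper", "Patent",
         "PatentTitle", "PatentClaim", "PatentAbstract"],
        ["title", "text"],
    ),
    "GeneSymbolFullTextIndex": (["GeneSymbol"], ["sid"]),
    "AuthorFullTextIndex": (["Author"], ["first", "middle", "last"]),
    "EntityFullTextIndex": (["Entity"], ["name"]),
}


def create_index_statement(name, labels, properties):
    """Build one createNodeIndex call as a Cypher string."""
    # json.dumps produces double-quoted strings and bracketed lists, which
    # are also valid Cypher literals for plain ASCII names like these.
    return "CALL db.index.fulltext.createNodeIndex({}, {}, {})".format(
        json.dumps(name), json.dumps(labels), json.dumps(properties)
    )


def all_index_statements(indexes=FULLTEXT_INDEXES):
    """Return one statement per configured index, in definition order."""
    return [
        create_index_statement(name, labels, properties)
        for name, (labels, properties) in indexes.items()
    ]


if __name__ == "__main__":
    for statement in all_index_statements():
        print(statement)
```

A loader built this way would run each statement once at the start of the pipeline, and other loaders would add their indexes to the same mapping instead of creating them on their own.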

motey commented 3 years ago

https://github.com/covidgraph/graph-processing_fulltext-indexes

Will run this against DEV today

motey commented 3 years ago

Indexes are on DEV and PRD