cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

sql: Full Text Search #7821

Open Linicks opened 8 years ago

Linicks commented 8 years ago

All, I'm sure most of you know about Bleve (https://github.com/blevesearch/bleve), a Go-based full-text indexer. I was wondering if you've considered integrating it with CockroachDB? It seems like it may be a good fit, and it is being used in other distributed databases.

Thanks! -- Nick

Maintainer note from @jordanlewis: see the following issues for our current progress on search

gz#6861

Jira issue: CRDB-6169

petermattis commented 8 years ago

@Linicks Full-text search is something we'd like to support and Bleve is on my radar, though there are no concrete plans to integrate it.

alexander-manley commented 8 years ago

One approach for integrating Bleve with Cockroach, and thus providing CockroachDB with text search, would be to modify hugoidx (https://github.com/blevesearch/hugoidx) to allow it to Bleve-index the contents of a Cockroach BLOB store (...https://github.com/cockroachdb/cockroach/issues/243) pre-populated with corpus text (web page scrapes, text document dumps, etc.).

In addition to hugoidx, the associated Go utility "bleve-hosted" could be wrapped into the embedded UI (https://github.com/cockroachdb/cockroach/tree/master/ui) in order to pull out and/or highlight text search results from the BLOB store and display them as an additional panel under the left side "DATABASES" UI tab.

Bleve is based on file indexes, which by default are stored in BoltDB, so that part would need to be ported over to RocksDB for full integration. For the curious, a Bleve benchmark graph with RocksDB was posted to the Bleve Twitter stream a while back.
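To illustrate why swapping the storage backend is mostly a porting exercise, here is a minimal sketch of an inverted index written against a small key-value interface. Everything in it (`KVStore`, `MemoryStore`, the `term\x00doc_id` key layout) is a hypothetical illustration and not Bleve's actual store API; a BoltDB- or RocksDB-backed class would only need to provide the same two methods.

```python
from typing import Dict, Iterator, Protocol, Set, Tuple


class KVStore(Protocol):
    """Hypothetical minimal key-value interface the index is written against."""

    def put(self, key: bytes, value: bytes) -> None: ...

    def scan_prefix(self, prefix: bytes) -> Iterator[Tuple[bytes, bytes]]: ...


class MemoryStore:
    """In-memory stand-in for a real backend such as BoltDB or RocksDB."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def scan_prefix(self, prefix: bytes) -> Iterator[Tuple[bytes, bytes]]:
        # Iterate in key order, as an ordered KV store would.
        for key in sorted(self._data):
            if key.startswith(prefix):
                yield key, self._data[key]


def index_document(store: KVStore, doc_id: str, text: str) -> None:
    # One row per (term, document) pair; assumed key layout: term \x00 doc_id.
    for term in set(text.lower().split()):
        store.put(term.encode() + b"\x00" + doc_id.encode(), b"")


def search(store: KVStore, term: str) -> Set[str]:
    # A prefix scan over "term\x00" returns every document containing the term.
    prefix = term.lower().encode() + b"\x00"
    return {key[len(prefix):].decode() for key, _ in store.scan_prefix(prefix)}
```

Because `index_document` and `search` only touch `put` and `scan_prefix`, the index logic stays unchanged when the store underneath is replaced.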

Reference: http://www.blevesearch.com/news/Site-Search/ http://www.blevesearch.com/videos/

petermattis commented 8 years ago

@alexander-manley Thanks for the notes. We'll definitely take a closer look at Bleve when considering full-text indexing.

randyyaj commented 7 years ago

Any updates on this?

petermattis commented 7 years ago

@randyyaj Full-text indexing is something we'd like to do, but still a ways off and not currently scheduled.

SantoshSah commented 6 years ago

@petermattis, any update?

nstewart commented 6 years ago

Full text search is something we want to support, but it is not on the roadmap for cockroachdb 2.1 or 2.2. While we are adding some new functionality, for the next couple releases we are focusing on improving the performance and stability of our current offering before we add major new features.

RoachietheSupportRoach commented 5 years ago

Zendesk ticket #3521 has been linked to this issue.

OldhamMade commented 5 years ago

Does this zendesk ticket mean that full-text indexing is being actively worked on?

jordanlewis commented 5 years ago

No, full text search isn't on the near term roadmap for the time being.

aranwe commented 2 years ago

> No, full text search isn't on the near term roadmap for the time being.

2 years later, any plans? :)

alexander-manley commented 2 years ago

In the meantime... https://opensearch.org/

Bessonov commented 2 years ago

@alexander-manley

> In the meantime... https://opensearch.org/

You mean these AWS and other cloud provider guys who stole technology to make huge money with it? Yeah, great effort on piracy. Full disclosure: I'm an Elastic free and on-premise user. I'm not affiliated in any way with Elastic, and I'm sorry to see how people steal just because it's software and not hardware.

Back to the issue.

Although it would be nice to have full text search (FTS), I don't think that it's the right way. I have never seen a good built-in search, because it's very complex and very specialized, and there are great products like Elasticsearch, Solr, Sphinx and so on, which have been in development for more than 15 years. It is a huge effort.

Instead of developing a very limited FTS, I would propose developing an interface to popular products. Something like ZomboDB (which I haven't used yet). That way you can interact with the search through SQL, and your data is (automagically) synced with the index.

The first post suggests an integration with Bleve. At first glance it would be OK, but I'm not sure how big the gap to other products is. One show stopper is synonyms.

jezell commented 2 years ago

I think the best way to get some sort of support for full text search "out of the box" would be to CREATE CHANGEFEED to support some destinations like elasticsearch, vespa, algolia, etc. Modern full text search is a completely different domain than relational data. While I'm sure the team could eventually crack it, it would likely be a long road to get it up to par with something like Vespa. I'd personally rather see out of the box integration, as we wouldn't want to give up search result quality to switch to something built in.
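As a rough sketch of what such an integration would do on the consuming side, the code below applies change events to a search index. The event shape (`{"key": [...], "after": row-or-null}`, with a null `after` marking a delete) is modeled on CockroachDB's wrapped changefeed envelope but should be treated as an assumption here, and `SearchIndexStub` is a hypothetical stand-in for a real engine's index and delete calls.

```python
import json
from typing import Dict, Optional


class SearchIndexStub:
    """Hypothetical stand-in for an external search engine's index/delete API."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}

    def upsert(self, doc_id: str, doc: dict) -> None:
        self.docs[doc_id] = doc

    def delete(self, doc_id: str) -> None:
        self.docs.pop(doc_id, None)


def apply_change(index: SearchIndexStub, raw_event: str) -> None:
    """Apply one change event to the index.

    Assumed event shape: {"key": [primary key values], "after": row or null}.
    """
    event = json.loads(raw_event)
    # Build a stable document ID from the primary key values.
    doc_id = "/".join(str(part) for part in event["key"])
    after: Optional[dict] = event.get("after")
    if after is None:
        index.delete(doc_id)  # the row was deleted
    else:
        index.upsert(doc_id, after)  # the row was inserted or updated
```

Since upserts and deletes keyed by primary key are idempotent, replaying the same event twice leaves the index in the same state, which matters if the feed delivers a message more than once.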

jordanlewis commented 2 years ago

CockroachDB 22.2 will support trigram indexes, a simple form of text search that may help some of your use cases. See #79705 for details on what has been added.
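For readers unfamiliar with the technique, here is a minimal sketch of how a trigram index can serve fuzzy text matching. It is not CockroachDB's implementation: the padding rule and the 0.3 default threshold are modeled on Postgres's pg_trgm, and the `TrigramIndex` class is a hypothetical in-memory illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set


def trigrams(text: str) -> Set[str]:
    """Split text into 3-character grams, padding each word pg_trgm-style."""
    grams: Set[str] = set()
    for word in text.lower().split():
        padded = "  " + word + " "  # two leading spaces, one trailing
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


class TrigramIndex:
    """Hypothetical inverted index mapping each trigram to the rows containing it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[int]] = defaultdict(set)
        self._rows: Dict[int, str] = {}

    def add(self, row_id: int, text: str) -> None:
        self._rows[row_id] = text
        for gram in trigrams(text):
            self._postings[gram].add(row_id)

    def search(self, query: str, threshold: float = 0.3) -> List[int]:
        # Step 1: the index narrows the search to rows sharing any trigram.
        candidates: Set[int] = set()
        for gram in trigrams(query):
            candidates |= self._postings.get(gram, set())
        # Step 2: only those candidates are scored against the query.
        return sorted(
            r for r in candidates if similarity(self._rows[r], query) >= threshold
        )
```

The point of the index is step 1: a misspelled query such as "cockroch" still shares most of its trigrams with "cockroach", so the row is found without scanning every row in the table.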

amirouche commented 2 years ago

Since I only used it for spell checking for small dictionaries, I am not sure how trigrams help to implement full-text search.

jordanlewis commented 2 years ago

Please feel free to follow and upvote #41288, which is an issue that tracks Postgres-compatible tsvector and tsquery implementations.