OriginProtocol / origin-bridge

We've moved to a monorepo: https://github.com/OriginProtocol/origin/tree/master/infra/bridge
MIT License

Design and stub out indexing service #4

Closed: matthewliu closed this issue 6 years ago

matthewliu commented 6 years ago

Come up with high-level indexing requests and responses.

Essentially needs to handle various types of search:

- Search by keyword
- Search by recency, ratings, etc.
- Search by other filters (attributes on listings)

Should design this so that it is extensible in the future, but can be very minimal for now.
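As a rough illustration of what "minimal but extensible" requests and responses could look like, here is a sketch in Python. Every name in it (`SearchRequest`, `SearchResponse`, `parse_request`, and all field names) is an assumption made for illustration, not something agreed in this thread.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SearchRequest:
    """Hypothetical request shape; all field names are assumptions."""
    keyword: Optional[str] = None      # free-text keyword search
    sort_by: str = "relevance"         # e.g. "relevance", "recency", "rating"
    filters: Dict[str, Any] = field(default_factory=dict)  # listing attributes
    offset: int = 0                    # pagination start
    limit: int = 20                    # page size


@dataclass
class SearchResponse:
    """Hypothetical response shape: a total count plus one page of listings."""
    total: int
    listings: List[Dict[str, Any]]


def parse_request(payload: Dict[str, Any]) -> SearchRequest:
    """Build a SearchRequest from a JSON-like dict, ignoring unknown keys."""
    known = {"keyword", "sort_by", "filters", "offset", "limit"}
    return SearchRequest(**{k: v for k, v in payload.items() if k in known})
```

Ignoring unknown keys is one way to keep the request format open to later additions: a newer client can send extra fields without breaking an older server.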

ambertch commented 6 years ago

flask-restless appears to have a query interface that maps to DB queries, although I don't know whether the library supports the JSON datatype. (Edit: Gagan says it does not, although it could be implemented. I took a look at the library and it supports custom operators: https://github.com/jfinkels/flask-restless/blob/master/tests/test_filtering.py#L1266)

If the spec for the initial version is as above, Postgres supports both full-text search and indexing on JSON fields, so using the DB could be an option for meeting the initial requirements.
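To make the Postgres option concrete, here is a minimal sketch that builds the SQL from Python. It assumes a hypothetical `listing` table with a `jsonb` column named `data` holding `name` and `description` keys; none of these names come from the bridge server schema. The `%s` placeholders follow the psycopg2 parameter style.

```python
import json
from typing import Any, Dict, List, Tuple

# Text expression searched by the full-text query (hypothetical JSON keys).
FTS_EXPR = (
    "to_tsvector('english', "
    "coalesce(data->>'name', '') || ' ' || coalesce(data->>'description', ''))"
)

# A GIN expression index for full-text search and a GIN index on the jsonb
# column so that containment (@>) filters can use an index.
CREATE_INDEXES = f"""
CREATE INDEX listing_fts_idx ON listing USING GIN ({FTS_EXPR});
CREATE INDEX listing_data_idx ON listing USING GIN (data);
"""


def build_listing_query(keyword: str, filters: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Return (sql, params) for a keyword search with optional attribute filters."""
    sql = f"SELECT id, data FROM listing WHERE {FTS_EXPR} @@ plainto_tsquery('english', %s)"
    params: List[Any] = [keyword]
    if filters:
        # jsonb containment: the row must contain every key/value pair in filters.
        sql += " AND data @> %s::jsonb"
        params.append(json.dumps(filters))
    return sql, params
```

The returned pair could be passed to a DB cursor's `execute(sql, params)`; the query expression has to match the indexed expression exactly for Postgres to use the full-text index.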

Benchmarking would tell us definitively how full-text search in Postgres performs against, say, Elasticsearch or Solr for the expected dataset size.

franckc commented 6 years ago

I'm in the process of adding some scaffolding to the bridge server to support indexing the data into a search engine of our choice.

In terms of search engine, I'm leaning towards a hosted Elasticsearch instance via a Heroku add-on such as Bonsai. Elasticsearch provides a lot of useful functionality out of the box: schemaless indexing, good ranking, filtering, faceting, sorting, support for all the top languages (including Asian languages), query suggestion, etc.
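For a sense of how keyword search, filtering, and sorting combine in a single Elasticsearch request, here is a sketch that builds a query body as a plain Python dict. The function name `build_es_query` and the field names (`name`, `description`, and whatever is passed in `filters`) are hypothetical; the actual index mapping is not specified in this thread.

```python
from typing import Any, Dict, Optional


def build_es_query(
    keyword: Optional[str],
    filters: Dict[str, Any],
    sort_field: Optional[str] = None,
    size: int = 20,
) -> Dict[str, Any]:
    """Build an Elasticsearch query DSL body (hypothetical field names)."""
    if keyword:
        # Scored full-text match across the assumed text fields.
        must = [{"multi_match": {"query": keyword, "fields": ["name", "description"]}}]
    else:
        must = [{"match_all": {}}]
    # Filter clauses restrict results without affecting the relevance score.
    # "term" expects exact values, so it suits keyword-typed fields.
    filter_clauses = [{"term": {k: v}} for k, v in filters.items()]
    body: Dict[str, Any] = {
        "query": {"bool": {"must": must, "filter": filter_clauses}},
        "size": size,
    }
    if sort_field:
        # Explicit sort, e.g. by a creation date; omitted means sort by relevance.
        body["sort"] = [{sort_field: {"order": "desc"}}]
    return body
```

The resulting dict can be serialized as JSON and sent as the body of a search request to the index.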

Using Postgres is certainly a possibility. The main advantages I see are that it does not add a new external dependency to the bridge server and that it keeps all the data in Postgres, as opposed to having to index it and keep it in sync with a search engine. As for drawbacks, Postgres search has a more limited feature set, will require us to think more carefully about our data model, and I expect it will be a bit more work to get it to do what we need. Also, in the long term (millions of listings), I'm pretty sure we would outgrow Postgres's search capabilities, performance, and scalability.

Regardless of which option we go for, the search indexing and querying code should be modular so that we can easily swap the search engine if we decide to in the future.
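One way to keep the engine swappable is to have indexing and querying code depend only on a small interface. The sketch below is illustrative: `SearchBackend` and `InMemoryBackend` are hypothetical names, and the in-memory class is just a stand-in for an Elasticsearch-backed or Postgres-backed implementation.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class SearchBackend(ABC):
    """Hypothetical interface that every search engine adapter would implement."""

    @abstractmethod
    def index(self, listing_id: str, doc: Dict[str, Any]) -> None:
        """Add or replace a listing document in the index."""

    @abstractmethod
    def search(self, keyword: str) -> List[str]:
        """Return the ids of listings matching the keyword."""


class InMemoryBackend(SearchBackend):
    """Naive substring-matching backend, useful for tests and local development."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def index(self, listing_id: str, doc: Dict[str, Any]) -> None:
        self._docs[listing_id] = doc

    def search(self, keyword: str) -> List[str]:
        needle = keyword.lower()
        return [
            listing_id
            for listing_id, doc in self._docs.items()
            if any(needle in str(value).lower() for value in doc.values())
        ]
```

Callers would hold a `SearchBackend` reference, so replacing the engine means writing a new subclass rather than touching the event-indexing or API code.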

franckc commented 6 years ago

Sent out a PR with an initial implementation for indexing events in Bonsai Elasticsearch.

Next step will be to add a Search API the DAPP can hit to search for listings.

franckc commented 6 years ago

Here is a proposal for the initial implementation of the Listings search API: https://github.com/OriginProtocol/bridge-server/pull/66

franckc commented 6 years ago

Relevant documents:

franckc commented 6 years ago

Closing this since the design is done.