ang-zeyu / infisearch

Easy and flexible client-side search for static sites
https://infi-search.com
MIT License

Marketing ideas #1

Open rrjanbiah opened 2 years ago

rrjanbiah commented 2 years ago

Apologies for the unsolicited marketing ideas. SCR... I was searching for this billion dollar library for a long time (https://github.com/meilisearch/meilisearch-rust/issues/67). Sorry to see the project is not getting the attention it deserves. I think it is mostly due to the keywords, SEO and the pitch.

Here are my 2 cents:

  1. Compare the keywords, pitches, etc with the competitors (Typesense, Algolia, ElasticSearch, Meilisearch, Site Search 360, Klevu)
  2. For example, look at the title, description and topics used by the new and successful player in this space https://github.com/typesense/typesense
  3. Create a new website. Again compare with https://typesense.org/
  4. Create head-to-head comparison with the competitors. Again, compare with https://typesense.org/typesense-vs-algolia-vs-elasticsearch-vs-meilisearch/
  5. Remove mdBook related pitch from the main README. They should be sub-products
  6. Remove limitations from the README. Perhaps move them to subpages
  7. Fix the demo to accept 1 character. Currently "No results" for searches of fewer than 4 characters is confusing. If that's not possible, display "Type 4 or more characters to get results"
  8. Create a demo using https://github.com/algolia/instantsearch.js (See, how MeiliSearch has done it https://github.com/meilisearch/instant-meilisearch)
  9. Fork and add comparison in https://mosuka.github.io/search-benchmark-game/
  10. Create a server-based and CLI-based search options, like Algolia, etc (I think, that should be easy as you used Rust). It can be Enterprise-only too.
  11. Possibly rename the project to include "search" in the product name (?)
  12. Publish the crate
  13. Think about moving a few parts to Enterprise (open core approach) with some sort of licensing, to make it a commercial success.

Note: I can contribute now and then for 1-6 and a bit on 9. My advance wishes for your success.

ang-zeyu commented 2 years ago

Thanks for the suggestions! @rrjanbiah

This is my first published open source project from scratch and I have quite a bit to pick up in the art of marketing 😅, really appreciate the ideas!

I won't have much time till after the 7th (final exams 📚) to work on this, but if you are able I would like to clarify some of the points with you first:

Fix the demo to accept 1 character. Currently "No results" for searches of fewer than 4 characters is confusing. If that's not possible, display "Type 4 or more characters to get results"

Is this the demo here? I tried all letters of the alphabet but couldn't reproduce this. I do however filter/delete some punctuation characters (yet to implement an option to configure this as well), so that could be the issue. If you could let me know which query you used, that would help 😁
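To make the two possible causes concrete, here is a minimal sketch, assuming (hypothetically) that the search UI deletes punctuation before querying and shows a hint instead of "No results" when nothing searchable is left. The names `normalize_query`, `status_message` and `MIN_QUERY_LENGTH` are made up for illustration and are not Morsels' actual API.

```python
import string

# Hypothetical threshold; the thread does not confirm that the demo enforces one.
MIN_QUERY_LENGTH = 1

# Translation table that deletes ASCII punctuation characters.
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalize_query(raw: str) -> str:
    """Lowercase the query, delete punctuation, and collapse whitespace."""
    cleaned = raw.lower().translate(_STRIP_PUNCTUATION)
    return " ".join(cleaned.split())


def status_message(raw: str, result_count: int) -> str:
    """Pick a UI message that distinguishes 'too short' from 'no hits'."""
    query = normalize_query(raw)
    if len(query) < MIN_QUERY_LENGTH:
        return f"Type {MIN_QUERY_LENGTH} or more searchable characters to get results"
    if result_count == 0:
        return f'No results for "{query}"'
    return f"{result_count} result(s)"
```

With this shape, a query made up only of filtered punctuation gets the hint, while a real query with no matches still reports "No results".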

Create a server-based and CLI-based search options, like Algolia, etc (I think, that should be easy as you used Rust). It can be Enterprise-only too.

Publish the crate

I thought about whether I wanted to support this (server) when I first started, and eventually decided to focus on my main use case: an attempt, in view of issues like this, to facilitate pre-built index client-side search on much larger static sites without having to set up a search server.

There are a few other reasons (at least, I think) that adding a server here wouldn't be quite value-adding:

The closest analogue for this project (in terms of the technical stack used) might be this https://github.com/jameslittle230/stork, except with more bells & whistles, and index splitting (for larger static sites).

Let me know what you think!

CLI-based search options

Didn't quite get this part though, is this referring instead to API-based? (I'm thinking something along the lines of a rust / nodejs wasm api)

Possibly rename the project to include "search" in the product name (?)

Indeed, not a very unique / seo-friendly name 😅. If I could get your input, are you thinking of something like this?:

ang-zeyu commented 2 years ago

Just an additional sidenote, and in case, based on your comment here https://github.com/meilisearch/meilisearch-rust/issues/67#issuecomment-756319783, morsels isn't quite designed for full client side indexing + searching, which is a space I think is well-filled by those libraries you've linked. Rather, the pre-built index must be generated from a cli build tool, much like https://github.com/jameslittle230/stork.

I considered making an "optional" client side indexer module as well, but this is quite difficult in view of the goal of supporting larger static sites, as we are fundamentally still constrained by data transfer / bandwidth issues with the source documents to-be-indexed. (in which case, may be better to just use lunr.js, etc. for smaller sites).
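For context, the small-site approach mentioned here (what libraries like lunr.js do conceptually) can be sketched as an in-memory inverted index built at runtime. This is only an illustration under simplifying assumptions: the tokenizer, `build_index` and `search` are hypothetical, and the whole `docs` mapping must already have been downloaded, which is exactly the bandwidth cost described above.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Very simple tokenizer: runs of lowercase ASCII letters and digits.
_TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    return _TOKEN.findall(text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document ids containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(matches)
```

Indexing cost and memory grow with the total size of the source documents, so this only stays practical while the site is small.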

The tantivy analogue to morsels might be this really cool POC PR https://github.com/quickwit-oss/tantivy/pull/1067 that I became aware of through the issue you linked. (the biggest difference perhaps would be its use of HTTP Range requests)

rrjanbiah commented 2 years ago

@ang-zeyu Thank you so much for your detailed reply. Please focus on your studies. I'll try to contribute now and then.

Is this the demo here?

Please check https://ang-zeyu.github.io/morsels/

rrjanbiah commented 2 years ago

The closest analogue for this project (in terms of the technical stack used) might be this https://github.com/jameslittle230/stork, except with more bells & whistles, and index splitting (for larger static sites).

I forgot about Stork though I bookmarked it. Though it has some traction, it is missing some pitches and focus keywords. Do you mean to say that we can prepare a comparison against Stork, but not Algolia, etc?

rrjanbiah commented 2 years ago

CLI-based search options

Didn't quite get this part though, is this referring instead to API-based? (I'm thinking something along the lines of a rust / nodejs wasm api)

Please ignore that for now. I meant, local file search, an alternative to grep -rnw . -e "search". But, that can be ignored till others are ready.

rrjanbiah commented 2 years ago

If I could get your input, are you thinking of something like this?:

morsels-search morsels.search

If I have to choose among these, I'd choose MorselsSearch as a unique brand name. But "morsels" is too humble in terms of selling. Think of something along the lines of its pros... say, ScaleSearch (already used by others though), etc.

rrjanbiah commented 2 years ago

I considered making an "optional" client side indexer module as well, but this is quite difficult in view of the goal of supporting larger static sites, as we are fundamentally still constrained by data transfer / bandwidth issues with the source documents to-be-indexed. (in which case, may be better to just use lunr.js, etc. for smaller sites).

From a marketing and product perspective, the client side indexer should be the default :-) For performance, we can say, buy/use static indexing. So, by default, it should work for most small sites without preindexing.

Also, I noticed one thing: when trying to search, "initializing" takes quite some time. Can't this be moved to a service worker, or be made to initialize during page load itself?

rrjanbiah commented 2 years ago

The tantivy analogue to morsels might be this really cool POC PR https://github.com/quickwit-oss/tantivy/pull/1067 that I became aware of through the issue you linked. (the biggest difference perhaps would be its use of HTTP Range requests)

IMHO, that's not of high priority

ang-zeyu commented 2 years ago

Sorry for the late update!

I forgot about Stork though I bookmarked it. Though it has some traction, it is missing some pitches and focus keywords. Do you mean to say that we can prepare a comparison against Stork, but not Algolia, etc?

Nope, ditto that comparing against more popular tools (algolia) would be better from a marketing perspective as well. Mentioned Stork just for clarification of what morsels is.

From a marketing and product perspective, the client side indexer should be the default :-) For performance, we can say, buy/use static indexing. So, by default, it should work for most small sites without preindexing.

I see... I think I have a better idea of what you're looking for now (correct me if I'm wrong): essentially, lunr.js but written using rust/C/C++/etc. + wasm? Agree supporting this would definitely be nice for flexibility, and simplicity (of the client side indexer default), and might bring some performance benefits too. So far I haven't run into a tool that does this as well.

Apart from the earlier point on current js tools (lunr.js / fuse / ...), my secondary reservation is simply that it is not a trivial change in the implementation unfortunately (I don't think I have the time to attempt and maintain it, at least anytime soon 😓). I'll take a stab at it but this is likely to come somewhat later.

For now at least, this is what I'm aiming for:

The closest analogue for this project (in terms of the technical stack used) might be this https://github.com/jameslittle230/stork, except with more bells & whistles, and index splitting (for larger static sites).

and improvements like these (+ benchmarks, marketing):

Also, I noticed one thing: when trying to search, "initializing" takes quite some time. Can't this be moved to a service worker, or be made to initialize during page load itself?

rrjanbiah commented 2 years ago

I'm sorry, I became busy lately and couldn't contribute as I wished & hoped.

I see... I think I have a better idea of what you're looking for now (correct me if I'm wrong): essentially, lunr.js but written using rust/C/C++/etc. + wasm?

My wishlist is:

  1. Fault tolerant search like Algolia, MeiliSearch
  2. Facets
  3. WASM

Recently, I also came across https://github.com/itemsapi/itemsjs. But it seems to lack the fault tolerance and keyword highlighting found in Algolia.
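As a rough illustration of what "fault tolerant" (typo-tolerant) matching means, the sketch below matches a query term against a vocabulary by Levenshtein edit distance. It is an assumption-laden toy: `tolerant_lookup` and `max_typos` are hypothetical names, and production engines generally rely on much faster structures than this linear scan.

```python
from __future__ import annotations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, keeping only one previous row of the DP table."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def tolerant_lookup(term: str, vocabulary: list[str], max_typos: int = 1) -> list[str]:
    """Return vocabulary terms within max_typos edits of term, closest first."""
    scored = sorted((edit_distance(term, v), v) for v in vocabulary)
    return [v for distance, v in scored if distance <= max_typos]
```

For example, `tolerant_lookup("serch", ["search", "server", "static"])` keeps only `"search"`, since it is the one term a single edit away.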

ang-zeyu commented 2 years ago

@rrjanbiah Thanks for clarifying! I see, facets are definitely nice to have but not quite an immediate goal for now, as I imagine it would be more useful for a general purpose search library (along with client-side indexing functionalities). Main target use case is static sites (documentation, blogs, ...) / static site generators.

A little off topic, but is there a specific reason for wasm? 👀 (other than possible performance gains)

Just speculation too, but I realised it may generally be difficult to reap any significant performance gains when using wasm + client-side indexing. The reason is that you'll need to transfer large amounts of source documents back and forth across the wasm boundary constantly (and then reparse, decode them, ...), so the overhead is likely to be quite significant. Could be why such a library doesn't yet exist.

Fwiw there's also https://github.com/tinysearch/tinysearch

rrjanbiah commented 2 years ago

@ang-zeyu Thanks for your reply. tinysearch has WASM, but lacks all other features. My wishlist is based on early searchkit (they broke it after adding GraphQL); I vaguely remember that it had tolerant search as in Algolia, keyword highlight, instant search (no need to click submit), correct facets listing (unlike other implementations where the facets will be removed on count 0; but the correct implementation is to show it with 0 next to it)

searchkit is powered by Elasticsearch, so for every keystroke it has to fetch from the server. I think that may be ideal for a larger number of records spanning multiple pages. But for, say, 100-200 records, this could very well be optimized on the client side (?, or at least that is what I hope for). Yes, WASM is not of high priority, but it is a good selling point.
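The facet behaviour described above (keep a facet value in the listing and show 0 next to it, rather than removing it) can be sketched for a small client-side record set as follows. The record shape and the `facet_counts` helper are hypothetical and not taken from searchkit or any other library.

```python
from collections import Counter


def facet_counts(all_records, matching_records, field):
    """Count facet values among the matches, keeping values with zero hits."""
    # Every value seen anywhere in the dataset stays in the listing.
    all_values = {r[field] for r in all_records if field in r}
    # Only the current matches contribute to the counts.
    counts = Counter(r[field] for r in matching_records if field in r)
    return {value: counts.get(value, 0) for value in sorted(all_values)}
```

Given three records tagged `rust`, `js`, `rust` and a result set holding only the two `rust` records, this returns `{"js": 0, "rust": 2}`, so the `js` facet remains visible with a zero count.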

ang-zeyu commented 2 years ago

@rrjanbiah Thanks for the explanation on searchkit! Client side optimizations for small records sounds interesting, though I'm really not qualified to comment on the feasibility over there.

Create a server-based and CLI-based search options, like Algolia, etc (I think, that should be easy as you used Rust). It can be Enterprise-only too.

Just in case, if this is also one of the things in your wishlist, server support in morsels is not a goal. The intention / scope here is only to provide a free alternative to (mainly) static site search using a pre-built index (and maybe client side indexing in the future).

Fwiw my rough plan for now before I get to marketing:

rrjanbiah commented 2 years ago

@ang-zeyu I'm not really qualified to comment on the technical prioritization of the task list. However, I believe the order below may be better for you, as these can only be done by you:

  1. service workers for caching, configuration presets
  2. demo using https://github.com/algolia/instantsearch.js (See, how MeiliSearch has done it https://github.com/meilisearch/instant-meilisearch)
  3. demo using https://github.com/itemsapi/itemsjs (they already support MiniSearch & Lunr https://github.com/itemsapi/itemsjs#integrations )
  4. Benchmarks with other libs https://mosuka.github.io/search-benchmark-game/ & https://github.com/nextapps-de/flexsearch#performance-benchmark-ranking

But, feel free to work in your preferred order as you know it better than myself.

theowenyoung commented 1 year ago

Thanks for the great tool! I really like this idea.

I have a big static site, and I tried using morsels to build an index. It's about 30MB, split into 2 files of 25MB and 5MB. I agree with @rrjanbiah that having an edge option is a good idea. There is another project for this: https://github.com/wilsonzlin/edgesearch

I think if this can be hosted on Cloudflare Workers, it will be a great success.

ang-zeyu commented 1 year ago

Hi @theowenyoung, thanks for the link! It's really interesting to see use of Cloudflare Workers for this. 🎉

I think this is definitely feasible, a lot of Morsels' code is already set up to split index files, similar to edgesearch. What's missing would be alternate code paths to retrieve index files and field stores from Cloudflare, a few APIs, and easy ways to upload the index files to Cloudflare.
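The general idea of splitting an index so that a query only needs part of it can be sketched as below. This is not Morsels' or edgesearch's actual index format: sharding by a CRC32 hash of the term, and the names `shard_for`, `split_index` and `shards_needed`, are assumptions chosen purely for illustration.

```python
import zlib
from collections import defaultdict


def shard_for(term: str, num_shards: int) -> int:
    """Deterministically map a term to a shard number."""
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(term.encode("utf-8")) % num_shards


def split_index(index: dict, num_shards: int) -> dict:
    """Partition a term -> postings mapping into smaller per-shard mappings."""
    shards = defaultdict(dict)
    for term, postings in index.items():
        shards[shard_for(term, num_shards)][term] = postings
    return dict(shards)


def shards_needed(query_terms, num_shards: int) -> set:
    """Shards a client would have to fetch to answer this query."""
    return {shard_for(t, num_shards) for t in query_terms}
```

Whether each shard is then served as a static file or from an edge store is a separate choice. The point of the sketch is only that a lookup touches the shards for its own terms, not the whole index.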

I would love to get to exploring this sometime, if the performance improvements are substantial it would help to increase this tool's scalability even further. 🙂