lizozom closed this issue 10 months ago
Then the search operation becomes trivial
In theory, yes. In practice, not necessarily. Consider large or very large use cases, such as one of the earlier demos I created that indexed 85 PDF pages. The embeddings are already calculated and loaded, but the search still takes a few seconds, which in my opinion is too long to drop the progress bar. We should also consider even larger use cases/documents where searching is not trivial. I'm not sure whether my implementation back then could be accelerated somehow, but I still think it would take a few seconds in any case.
What do you think?
Ah, I totally agree on the progress bar, but what do you think about separating the embedding from the search process? I'll add callbacks to both the embed and search APIs.
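As a hedged sketch of what those two callback-taking APIs could look like (all names here are hypothetical, not the project's actual API, and a toy embedder stands in for the real model):

```typescript
// Hypothetical sketch: separate embed/search APIs, each reporting progress.
// None of these names come from the actual library; they only illustrate the split.

type Vector = number[];
type ProgressCallback = (done: number, total: number) => void;

// Stand-in embedder: deterministic toy vectors; a real implementation calls the model.
function fakeEmbed(text: string): Vector {
  const v = [0, 0, 0, 0];
  for (let i = 0; i < text.length; i++) v[i % 4] += text.charCodeAt(i);
  const norm = Math.hypot(...v) || 1;
  return v.map((x) => x / norm);
}

// Embedding pass: builds the index up front, driving its own progress bar.
function embedAll(texts: string[], onProgress?: ProgressCallback): Map<string, Vector> {
  const map = new Map<string, Vector>();
  texts.forEach((t, i) => {
    map.set(t, fakeEmbed(t));
    onProgress?.(i + 1, texts.length);
  });
  return map;
}

// Search pass: only scores precomputed vectors, with its own progress callback.
function search(
  query: string,
  index: Map<string, Vector>,
  onProgress?: ProgressCallback
): { text: string; score: number }[] {
  const q = fakeEmbed(query);
  const results: { text: string; score: number }[] = [];
  let i = 0;
  for (const [text, v] of index) {
    // Dot product equals cosine similarity here because vectors are unit length.
    const score = v.reduce((s, x, k) => s + x * q[k], 0);
    results.push({ text, score });
    onProgress?.(++i, index.size);
  }
  return results.sort((a, b) => b.score - a.score);
}
```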
But I thought about starting from splitting those two processes.
Logic-wise that would be good I guess. Let me just understand better:
Let's say we have a text split into 10 paragraphs. According to your new logic, would you still process it like this:
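(The snippet from the original comment isn't preserved here; as a hedged reconstruction, the interleaved per-paragraph flow being asked about might look like this, with illustrative names and a toy embedder:)

```typescript
// Hypothetical interleaved flow: embed one paragraph, score it against the
// query, rerank, update the UI, then move on to the next paragraph.

type Scored = { paragraph: string; score: number };

// Stand-in for the real model; deterministic toy embedding.
function toVector(text: string): number[] {
  const v = [0, 0, 0, 0];
  for (let i = 0; i < text.length; i++) v[i % 4] += text.charCodeAt(i);
  const n = Math.hypot(...v) || 1;
  return v.map((x) => x / n);
}

function interleavedSearch(
  paragraphs: string[],
  query: string,
  onUpdate: (partial: Scored[]) => void
): Scored[] {
  const q = toVector(query);
  const results: Scored[] = [];
  for (const p of paragraphs) {
    const v = toVector(p); // embed this paragraph only now
    const score = v.reduce((s, x, i) => s + x * q[i], 0);
    results.push({ paragraph: p, score });
    results.sort((a, b) => b.score - a.score);
    onUpdate([...results]); // results/reranks stream in after every paragraph
  }
  return results;
}
```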
Or would it happen one after the other: first embed everything, then run the search logic? That would have the negative effect that the user doesn't get any results/updates/reranks until everything is embedded, hence a worse user experience, especially for large documents.
What I'm suggesting is that while the page loads, we load the model and run the embedding on the content.
This would create a mapping of texts to embedding vectors (I called this type `EmbeddingMap`).
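A minimal version of that type might look like this (a sketch under the assumption that vectors are plain number arrays; the helper name is hypothetical):

```typescript
// Hedged sketch of the described type: each text keyed to its embedding vector.
type EmbeddingMap = Map<string, number[]>;

// Building one from already-computed vectors, index-aligned with the texts.
function buildEmbeddingMap(texts: string[], vectors: number[][]): EmbeddingMap {
  const map: EmbeddingMap = new Map();
  texts.forEach((t, i) => map.set(t, vectors[i]));
  return map;
}
```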
In the current demo this will take ~2 seconds. This embedding map can also be stored in IndexedDB, meaning it would only ever happen once for a given configuration.
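The cache-once behavior could be sketched like this. The key-value interface below stands in for IndexedDB (which would be the real backend in the browser), and keying on content plus split strategy is my assumption, not something specified above:

```typescript
// Sketch: compute the embedding map once per (content, split-strategy) key and
// reuse it afterwards. The KV interface stands in for IndexedDB; in the browser
// it would be backed by an object store rather than this in-memory Map.

interface KV {
  get(key: string): Promise<number[][] | undefined>;
  set(key: string, value: number[][]): Promise<void>;
}

function memoryKV(): KV {
  const store = new Map<string, number[][]>();
  return {
    get: async (k) => store.get(k),
    set: async (k, v) => void store.set(k, v),
  };
}

async function embeddingsFor(
  texts: string[],
  splitStrategy: string,
  embed: (t: string) => number[],
  cache: KV
): Promise<number[][]> {
  // Key on content plus split strategy, so changing either re-embeds (assumption).
  const key = `${splitStrategy}:${texts.join("\u0000")}`;
  const cached = await cache.get(key);
  if (cached) return cached; // subsequent loads: no model work at all
  const vectors = texts.map(embed);
  await cache.set(key, vectors);
  return vectors;
}
```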
Then once the user runs a search, the search would be faster (we can still show a progress bar, or even one progress bar for loading and another for searching).
I think the idea is charming to save a bit of time, but most users (especially on mobile, though the same goes for desktop users) probably don't have super fast internet anyway, so the main time spent initially would be downloading the model. The experience would most likely be:
Current
Your idea
I think for the demo there wouldn't be a huge benefit, as users would most likely tweak the settings anyway or simply copy & paste their own text to actually get some value from the tool.
Thinking about it a second time however, any speedup is good, so why not? :)
By the way, if we wanted to we could also go one step further and just load a pre-indexed file for the demo. This would speed it up even further but feels a bit like cheating to me as the demo is supposed to show how it works and give an impression of the performance. Or am I too strict here? I think I'd still prefer your idea, actually calculating everything client-side.
I'll close this issue in favor of #44. Once the feature is implemented, we could allow for two versions:
Since we can offload the work to a separate thread, we could consider embedding the content once when it loads. Then the search operation becomes trivial and streaming the results is not necessary.
This would look something like this:
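(The snippet from the original comment isn't preserved; as a hedged reconstruction of the on-load step, with hypothetical names:)

```typescript
// Hedged sketch of the on-load step: split the content, embed every chunk up
// front, and keep the resulting map around for later searches. In the real app
// this loop would run in a separate thread (a Web Worker) so the page stays
// responsive while it works.

function embedOnLoad(
  content: string,
  split: (c: string) => string[],
  embed: (t: string) => number[]
): Map<string, number[]> {
  const chunks = split(content);
  const embeddings = new Map<string, number[]>();
  for (const chunk of chunks) {
    embeddings.set(chunk, embed(chunk)); // done once, while the page loads
  }
  return embeddings;
}
```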
(If the user chooses to change the split strategy, we would re-run this logic.)
Then when searching, this only becomes an issue of:
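With embeddings precomputed, the search step reduces to "embed the query, rank by similarity". A hedged sketch, using cosine similarity (names hypothetical):

```typescript
// Once the embedding map exists, search is just scoring precomputed vectors
// against the query vector and returning the top matches.

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

function searchPrecomputed(
  queryVector: number[],
  embeddings: Map<string, number[]>,
  topK = 5
): { text: string; score: number }[] {
  return [...embeddings]
    .map(([text, v]) => ({ text, score: cosine(queryVector, v) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```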
What do you think?