lizozom closed this issue 10 months ago
Then the search operation becomes trivial
In theory, yes. In practice, not necessarily. Consider large or very large use cases, such as one of the earlier demos I created that indexed 85 PDF pages. The embeddings are already calculated and loaded, but the search still takes a few seconds, which in my opinion is too long to drop the progress bar. We should also consider even larger use cases/documents where searching is not trivial. I'm not sure whether my implementation back then could be accelerated somehow, but I still think it would take a few seconds in any case.
What do you think?
Ah, I totally agree on the progress bar, but what do you think about separating the embedding from the search process? I'll add callbacks to both the embed and search APIs.
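As a hedged sketch of what those two callback-taking APIs could look like (all names here are hypothetical, not the project's actual API, and a toy embedder stands in for the real model):

```typescript
// Hypothetical sketch: separate embed/search APIs, each reporting progress.
// None of these names come from the actual library; they only illustrate the split.

type Vector = number[];
type ProgressCallback = (done: number, total: number) => void;

// Stand-in embedder: deterministic toy vectors; a real implementation calls the model.
function fakeEmbed(text: string): Vector {
  const v = [0, 0, 0, 0];
  for (let i = 0; i < text.length; i++) v[i % 4] += text.charCodeAt(i);
  const norm = Math.hypot(...v) || 1;
  return v.map((x) => x / norm);
}

// Embedding pass: builds the index up front, driving its own progress bar.
function embedAll(texts: string[], onProgress?: ProgressCallback): Map<string, Vector> {
  const map = new Map<string, Vector>();
  texts.forEach((t, i) => {
    map.set(t, fakeEmbed(t));
    onProgress?.(i + 1, texts.length);
  });
  return map;
}

// Search pass: only scores precomputed vectors, with its own progress callback.
function search(
  query: string,
  index: Map<string, Vector>,
  onProgress?: ProgressCallback
): { text: string; score: number }[] {
  const q = fakeEmbed(query);
  const results: { text: string; score: number }[] = [];
  let i = 0;
  for (const [text, v] of index) {
    // Dot product equals cosine similarity here because vectors are unit length.
    const score = v.reduce((s, x, k) => s + x * q[k], 0);
    results.push({ text, score });
    onProgress?.(++i, index.size);
  }
  return results.sort((a, b) => b.score - a.score);
}
```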
But I thought about starting from splitting those two processes.
Logic-wise that would be good I guess. Let me just understand better:
Let's say we have a text split into 10 paragraphs. According to your new logic, would you still process it like this:
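(The snippet from the original comment isn't preserved here; as a hedged reconstruction, the interleaved per-paragraph flow being asked about might look like this, with illustrative names and a toy embedder:)

```typescript
// Hypothetical interleaved flow: embed one paragraph, score it against the
// query, rerank, update the UI, then move on to the next paragraph.

type Scored = { paragraph: string; score: number };

// Stand-in for the real model; deterministic toy embedding.
function toVector(text: string): number[] {
  const v = [0, 0, 0, 0];
  for (let i = 0; i < text.length; i++) v[i % 4] += text.charCodeAt(i);
  const n = Math.hypot(...v) || 1;
  return v.map((x) => x / n);
}

function interleavedSearch(
  paragraphs: string[],
  query: string,
  onUpdate: (partial: Scored[]) => void
): Scored[] {
  const q = toVector(query);
  const results: Scored[] = [];
  for (const p of paragraphs) {
    const v = toVector(p); // embed this paragraph only now
    const score = v.reduce((s, x, i) => s + x * q[i], 0);
    results.push({ paragraph: p, score });
    results.sort((a, b) => b.score - a.score);
    onUpdate([...results]); // results/reranks stream in after every paragraph
  }
  return results;
}
```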
Or would it happen one after the other: first embed everything, then run the search logic? That would have the negative effect that the user doesn't get any results/updates/reranks until everything is embedded, hence a worse user experience, especially for large documents.
What I'm suggesting is that while the page loads, we load the model and run the embedding on the content.
This would create a mapping of texts to embedding vectors (I called this type `EmbeddingMap`).
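A minimal version of that type might look like this (a sketch under the assumption that vectors are plain number arrays; the helper name is hypothetical):

```typescript
// Hedged sketch of the described type: each text keyed to its embedding vector.
type EmbeddingMap = Map<string, number[]>;

// Building one from already-computed vectors, index-aligned with the texts.
function buildEmbeddingMap(texts: string[], vectors: number[][]): EmbeddingMap {
  const map: EmbeddingMap = new Map();
  texts.forEach((t, i) => map.set(t, vectors[i]));
  return map;
}
```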
In the current demo this will take ~2 seconds. This embedding map can also be stored in IndexedDB, meaning it would only ever happen once for a given configuration.
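The cache-once behavior could be sketched like this. The key-value interface below stands in for IndexedDB (which would be the real backend in the browser), and keying on content plus split strategy is my assumption, not something specified above:

```typescript
// Sketch: compute the embedding map once per (content, split-strategy) key and
// reuse it afterwards. The KV interface stands in for IndexedDB; in the browser
// it would be backed by an object store rather than this in-memory Map.

interface KV {
  get(key: string): Promise<number[][] | undefined>;
  set(key: string, value: number[][]): Promise<void>;
}

function memoryKV(): KV {
  const store = new Map<string, number[][]>();
  return {
    get: async (k) => store.get(k),
    set: async (k, v) => void store.set(k, v),
  };
}

async function embeddingsFor(
  texts: string[],
  splitStrategy: string,
  embed: (t: string) => number[],
  cache: KV
): Promise<number[][]> {
  // Key on content plus split strategy, so changing either re-embeds (assumption).
  const key = `${splitStrategy}:${texts.join("\u0000")}`;
  const cached = await cache.get(key);
  if (cached) return cached; // subsequent loads: no model work at all
  const vectors = texts.map(embed);
  await cache.set(key, vectors);
  return vectors;
}
```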
Then once the user runs a search, the search would be faster (we can still show a progress bar, or even one progress bar for loading and another for searching).
I think the idea is charming to save a bit of time, but most users (especially on mobile, though the same goes for desktop users) probably don't have super fast internet anyway, so the main time spent initially would be downloading the model. The experience would most likely be:
Current
Your idea
I think for the demo there wouldn't be a huge benefit, as users would most likely tweak the settings anyway or simply copy & paste their own text to actually get some value from the tool.
Thinking about it a second time however, any speedup is good, so why not? :)
By the way, if we wanted to we could also go one step further and just load a pre-indexed file for the demo. This would speed it up even further but feels a bit like cheating to me as the demo is supposed to show how it works and give an impression of the performance. Or am I too strict here? I think I'd still prefer your idea, actually calculating everything client-side.
I'll close this issue in favor of #44. Once the feature is implemented, we could allow for two versions:
Since we can offload the work to a separate thread, we could consider embedding the content once when it loads. Then the search operation becomes trivial and streaming the results is not necessary.
This would look something like this:
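(The snippet from the original comment isn't preserved; as a hedged reconstruction of the on-load step, with hypothetical names:)

```typescript
// Hedged sketch of the on-load step: split the content, embed every chunk up
// front, and keep the resulting map around for later searches. In the real app
// this loop would run in a separate thread (a Web Worker) so the page stays
// responsive while it works.

function embedOnLoad(
  content: string,
  split: (c: string) => string[],
  embed: (t: string) => number[]
): Map<string, number[]> {
  const chunks = split(content);
  const embeddings = new Map<string, number[]>();
  for (const chunk of chunks) {
    embeddings.set(chunk, embed(chunk)); // done once, while the page loads
  }
  return embeddings;
}
```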
(If the user chooses to change the split strategy, we would re-run this logic.)
Then when searching, this only becomes an issue of:
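With embeddings precomputed, the search step reduces to "embed the query, rank by similarity". A hedged sketch, using cosine similarity (names hypothetical):

```typescript
// Once the embedding map exists, search is just scoring precomputed vectors
// against the query vector and returning the top matches.

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

function searchPrecomputed(
  queryVector: number[],
  embeddings: Map<string, number[]>,
  topK = 5
): { text: string; score: number }[] {
  return [...embeddings]
    .map(([text, v]) => ({ text, score: cosine(queryVector, v) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```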
What do you think?