filip-michalsky opened 1 month ago
The flow would look like this (rough sketch after the list):
1) On website load: scan the whole page by scrolling from top to bottom and search through chunks as we do now.
2) At the same time, save an embedded representation of each element to a vector DB.
3) As we scroll down through the chunks, we also run a vector search at each step, which could increase our accuracy significantly.
4) If we have processed all chunks and no relevant action was taken, we can try to retrieve relevant elements from the vector DB and act on those as the nearest proxy for the action we want.
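A rough TypeScript sketch of steps 2-4, just to make the idea concrete. Everything here (`ChunkElement`, `embed`, the in-memory index) is illustrative, not actual Stagehand internals, and the toy `embed` function stands in for a real embedding model:

```ts
// Hypothetical element shape produced while scanning a chunk.
type ChunkElement = { selector: string; text: string };
type Embedded = { selector: string; text: string; vector: number[] };

// Placeholder embedding: a real version would call an embedding model;
// here we just fold character codes into a small fixed-size vector.
function embed(text: string): number[] {
  const dims = 64;
  const v = new Array(dims).fill(0);
  for (let i = 0; i < text.length; i++) v[i % dims] += text.charCodeAt(i) / 255;
  return v;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// In-memory stand-in for the vector DB from step 2.
const index: Embedded[] = [];

// Step 2: save an embedded representation of each element as a chunk is processed.
function indexChunk(elements: ChunkElement[]): void {
  for (const el of elements) index.push({ ...el, vector: embed(el.text) });
}

// Steps 3-4: vector search over everything indexed so far; run on each scroll
// step, and again as the final fallback once all chunks are exhausted.
function nearestElements(query: string, k = 3): Embedded[] {
  const q = embed(query);
  return [...index]
    .sort((a, b) => cosine(b.vector, q) - cosine(a.vector, q))
    .slice(0, k);
}

// Example: index two chunks, then retrieve the closest proxy for the desired action.
indexChunk([{ selector: "#signup", text: "Create an account" }]);
indexChunk([{ selector: "#login", text: "Sign in to your account" }]);
console.log(nearestElements("log in", 1)); // -> the #login element
```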
It's sometimes hard to find the right elements when navigating the website in chunks - things can reload, etc., as we scroll down.
What if we had a hashing/embedding algorithm for each URL where we store a very sparse/compact representation of each element? Hopefully, we could then query / RAG relevant candidates beyond the chunking window, which would allow us to "see" beyond the viewport and potentially find more things...
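A minimal sketch of that per-URL compact representation, assuming a short hash fingerprint per element and simple keyword-overlap retrieval as a stand-in for the real embedding/RAG step (all names here are hypothetical):

```ts
import { createHash } from "node:crypto";

// Hypothetical compact element record: a short fingerprint plus just enough
// metadata to re-locate the element later.
type CompactElement = {
  fingerprint: string; // stable hash of tag + key attributes + trimmed text
  selector: string;
  snippet: string;     // short text snippet used for retrieval
};

// Sparse per-URL store, so candidates from outside the current viewport
// can still be surfaced on a later step.
const byUrl = new Map<string, CompactElement[]>();

function fingerprint(tag: string, attrs: Record<string, string>, text: string): string {
  const raw = `${tag}|${JSON.stringify(attrs)}|${text.trim().slice(0, 80)}`;
  return createHash("sha1").update(raw).digest("hex").slice(0, 12);
}

function remember(url: string, tag: string, attrs: Record<string, string>, selector: string, text: string): void {
  const record: CompactElement = {
    fingerprint: fingerprint(tag, attrs, text),
    selector,
    snippet: text.trim().slice(0, 80),
  };
  const list = byUrl.get(url) ?? [];
  // Skip duplicates, e.g. the same element seen again after a partial reload.
  if (!list.some((e) => e.fingerprint === record.fingerprint)) list.push(record);
  byUrl.set(url, list);
}

// Naive retrieval beyond the viewport: rank stored snippets by keyword overlap.
// A real version would feed the snippets (or their embeddings) into RAG instead.
function candidates(url: string, query: string, k = 5): CompactElement[] {
  const terms = query.toLowerCase().split(/\s+/);
  return (byUrl.get(url) ?? [])
    .map((e) => ({ e, score: terms.filter((t) => e.snippet.toLowerCase().includes(t)).length }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(({ e }) => e);
}
```

The fingerprint also gives us a cheap way to notice when an element we stored earlier has changed or disappeared after a reload.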