Open sunilemanjee opened 6 months ago
Hey Sunile - Worth creating a Search Labs article on this. I would be hesitant to make it something we advise, though; in future, we hope that semantic_text will have this option of expanding hit passages.
I agree on publishing an article. My goal is to publish the notebook and then shortly after an article.
WRT your hesitation on advising this approach, mind elaborating on this? Fetching surrounding chunks is a pattern for RAG applications.
Regards,
Sunile Manjee | Senior Principal Solutions Architect
(Replying to Joe McElroy's comment of Wed, Jun 5, 2024, 5:58 AM, quoted in full above.)
Oh, I'm fine with advising this approach given the situation today. It makes sense for it to live in the supporting-blog-content folder, however, with a Search Labs article giving users visibility that they can do this.
When we have this baked into Elasticsearch, we can add a new example to notebooks, which is the more formalised and supported route.
/supporting-blog-content = Search Labs article + notebook example
/notebooks = product feature of Elasticsearch that's well supported
I have built a notebook to demonstrate how to fetch surrounding chunks within Elasticsearch. This is not the only way to do it, but it is a valid approach. I am interested in contributing this notebook to our examples.
Notebook: Google Colab
The example uses text from a Harry Potter book, splitting it by chapter and then into chunks. Each chunk is a nested passage containing the chunk's text along with its dense and sparse vector representations.
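To make the document shape concrete, here is a rough sketch of how a chapter could be split into numbered chunks and stored as nested passages. It is not taken from the notebook: the field names (`chapter_number`, `passages`, `chunk_number`, `text`, `dense`, `sparse`), the word-count chunking, and the one-document-per-chapter layout are all assumptions for illustration, and the embeddings themselves are left out.

```python
from typing import Dict, List

# Hypothetical index mapping: one document per chapter, with each chunk
# stored as a nested passage. Field names are illustrative assumptions.
MAPPING: Dict = {
    "properties": {
        "chapter_number": {"type": "integer"},
        "passages": {
            "type": "nested",
            "properties": {
                "chunk_number": {"type": "integer"},
                "text": {"type": "text"},
                "dense": {"type": "dense_vector"},    # dense embedding of the chunk
                "sparse": {"type": "sparse_vector"},  # sparse (token-weight) representation
            },
        },
    }
}


def chunk_text(text: str, chunk_size: int = 50) -> List[str]:
    """Split text into chunks of at most `chunk_size` words (a simplistic splitter)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def build_chapter_doc(chapter_number: int, chapter_text: str, chunk_size: int = 50) -> Dict:
    """Build one chapter document whose passages carry a zero-based chunk number.

    The dense and sparse fields would be filled in by whichever embedding
    models are used; they are omitted here.
    """
    passages = [
        {"chunk_number": n, "text": chunk}
        for n, chunk in enumerate(chunk_text(chapter_text, chunk_size))
    ]
    return {"chapter_number": chapter_number, "passages": passages}
```

Storing the chunk number on each passage is what makes it possible to ask for neighbouring chunks by position later on.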
When searching, the demo will fetch the matching passage chunk along with surrounding chunks, where n is the position of the matching chunk within its chapter. If the chunk is the first chunk in the chapter, it will fetch n, n+1, and n+2. If it is the last chunk, it will fetch n, n-1, and n-2. Otherwise it will fetch n-1, n, and n+1.
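The window rule above can be sketched as a small helper. This is a minimal illustration rather than the notebook's code: the function name `surrounding_chunks` is hypothetical, chunk numbers are assumed to be zero-based, results are returned in ascending order, and the handling of chapters with fewer than three chunks (return every chunk) is an assumption the description does not cover.

```python
from typing import List


def surrounding_chunks(n: int, total_chunks: int) -> List[int]:
    """Return the chunk numbers to fetch for a hit on chunk `n`.

    Assumes zero-based chunk numbers within a chapter of `total_chunks` chunks.
    """
    if not 0 <= n < total_chunks:
        raise ValueError(f"chunk {n} is outside the range 0..{total_chunks - 1}")
    if total_chunks <= 3:
        # Assumption: a short chapter is returned whole.
        return list(range(total_chunks))
    if n == 0:
        # First chunk in the chapter: n, n+1, n+2
        return [n, n + 1, n + 2]
    if n == total_chunks - 1:
        # Last chunk in the chapter: n-2, n-1, n
        return [n - 2, n - 1, n]
    # Anywhere in between: n-1, n, n+1
    return [n - 1, n, n + 1]
```

For example, in a ten-chunk chapter a hit on chunk 4 yields chunks 3, 4, and 5, while a hit on the final chunk (9) yields 7, 8, and 9. The resulting numbers can then be used to look up the neighbouring passages by chapter and chunk number.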