-
Since OpenSearch 2.13, the [**fixed token length algorithm**](https://opensearch.org/docs/latest/ingest-pipelines/processors/text-chunking/#fixed-token-length-algorithm) has been available in the text chunking proc…
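For reference, a minimal ingest-pipeline sketch using that algorithm, following the linked documentation; the pipeline name and the `body`/`body_chunks` field names are illustrative, and the limits are examples, not recommendations:

```json
PUT _ingest/pipeline/text-chunking-pipeline
{
  "processors": [
    {
      "text_chunking": {
        "algorithm": {
          "fixed_token_length": {
            "token_limit": 384,
            "overlap_rate": 0.2,
            "tokenizer": "standard"
          }
        },
        "field_map": {
          "body": "body_chunks"
        }
      }
    }
  ]
}
```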
-
After trying the steps from the README:
```
curl -X POST http://127.0.0.1:8080/v1/create/rag -F "file=@paris.txt"
```
It took 590824.84 ms, nearly 10 minutes, just to chunk a 306-line (91 KB) file on an M3 Max.
…
-
### Describe the bug
When performing a `_bulk` update request while using the text chunking processor, I am getting `{"took":0,"ingest_took":1,"errors":true,"items":[{"index":{"_index":null,"_id":null,"statu…
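For context, the request that triggers this for me has roughly the following shape (index name and document IDs are placeholders). Note that, as far as I understand, ingest pipelines only run for `index`/`create` actions, so `update` actions in the same bulk body may behave differently:

```json
POST _bulk
{ "update": { "_index": "my-chunked-index", "_id": "1" } }
{ "doc": { "body": "updated text to be chunked" } }
{ "index": { "_index": "my-chunked-index", "_id": "2" } }
{ "body": "new text to be chunked" }
```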
-
Hi all,
I am new to semantic-router; I am using the code below:
```
import os
from getpass import getpass
from semantic_router.encoders import OpenAIEncoder

os.environ["OPENAI_API_KEY"]…
```
-
**Describe the bug**
Sometimes when using chunking, the `text_as_html` for Table elements is missing some of the content that appears in the `text` property.
Reasoning:
- Text for a table can only come fro…
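To quantify the gap, here is a stdlib-only sketch that strips tags from `text_as_html` and lists the words of `text` that never appear in the HTML. The function name and the word-level comparison are mine, not part of the library:

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def missing_from_html(text, text_as_html):
    """Return words present in `text` but absent from the HTML's text nodes."""
    parser = _TextExtractor()
    parser.feed(text_as_html)
    html_words = set(" ".join(parser.parts).split())
    return [w for w in text.split() if w not in html_words]
```

Running this over each Table element's `text` and `metadata.text_as_html` should make the dropped cells visible.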
-
I went to try out Omniparse (it looks great!), but when I tried to upload my documents I was met with an error stating that Markdown documents aren't supported.
This really surprised me, given that most wikis,…
-
This is important but difficult to explain; ask me if anything is unclear.
Two 'granularities' are relevant when chunking the ENB reports:
1. The first and coarser level is that of 'sections', corresponding…
-
Today, if we ingest a large piece of text into a Knowledge base entry, only the first 512 word pieces are used for creating the embeddings that ELSER uses to match on during semantic search.
This m…
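One workaround is to chunk on the client before ingesting, so every part of the document gets its own embedding. A rough sketch follows, using whitespace-separated words as a stand-in for ELSER's word pieces; the 480/60 window sizes are illustrative, chosen to stay safely under the 512 limit:

```python
def chunk_words(text: str, max_words: int = 480, overlap: int = 60) -> list[str]:
    """Split text into overlapping windows of at most max_words words.

    Whitespace words only approximate word pieces, so the default limit
    is deliberately below 512 to leave headroom.
    """
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already covers the tail
    return chunks
```

Each chunk would then be ingested as its own Knowledge base entry (or nested field), so semantic search can match on content beyond the first 512 word pieces.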
-
### Description
Team, great project, thanks. I'm just getting up to speed. I have been struggling to understand the appropriate chunking strategy for my data. While I have no doubt there is a plan t…
-
### What happened?
I'm creating an API with Flask. The other side will send me a file and I will save it to a Chroma database on my side. `Chroma.add` terminates my program without any exception. Wh…