-
Jina.ai supports a token limit of 8192 for generating embeddings. If my context is longer than 8192 tokens, what are the best strategies for implementing late chunking?
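
One possible approach, sketched here under assumptions and not taken from the Jina documentation, is to cut the token sequence into overlapping windows that each fit the 8192-token limit, embed each window separately, and apply late chunking inside each window. The helper name `split_into_windows` and the overlap of 512 tokens are hypothetical choices for illustration.

```python
from typing import List, Tuple


def split_into_windows(
    n_tokens: int, max_len: int = 8192, overlap: int = 512
) -> List[Tuple[int, int]]:
    """Return (start, end) token spans that cover n_tokens.

    Each span is at most max_len tokens long, and consecutive spans
    share `overlap` tokens so that context carries across the boundary.
    The overlap value is an assumption, not a recommended setting.
    """
    if max_len <= 0 or not 0 <= overlap < max_len:
        raise ValueError("need max_len > 0 and 0 <= overlap < max_len")
    spans: List[Tuple[int, int]] = []
    start = 0
    while start < n_tokens:
        end = min(start + max_len, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break
        # Step back by `overlap` so the next window re-reads the tail.
        start = end - overlap
    return spans


windows = split_into_windows(20000)
```

Each window would then go through the embedding model on its own. Token embeddings in the overlapping region are available from both neighbouring windows, so a rule is needed for which copy to pool from; that rule is left open here.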
-
```
import requests
import json
# URL of the Teprolin server
url = "http://127.0.0.1:5000/process"
# The text we want to process
text = "Ion și-a cumpărat o mașină de tuns iarba de_l…
-
**Problem**
Currently, when the "by_title" chunking strategy is used and the `coordinates = true` parameter is set (in order to return the coordinates of the PDF chunks), coordinates are not returned (because in …
-
It looks like you are implementing the [Kubo](https://github.com/ipfs/kubo/) defaults. They are nearly 10 years old and lack the newest features we support, so I want to change those s…
-
I noticed that when the lines of a list block in a Markdown document are very short, the list is identified as a `Title` instead of a `NarrativeText`.
It prevents the chunkin…
-
**Is your feature request related to a problem? Please describe.**
Currently, when processing PDF documents using the `chunk_by_title` function from the Unstructured library, a Table element always f…
-
# Chunking: Proposals and Discussion
## What is Chunking?
(For more detail, see PDF page 143 of [The HyperTalk manual](https://cancel.fm/stuff/share/HyperCard_Script_Language_Guide_1.pdf))
The …
-
I am using the [read_delim_chunked](https://readr.tidyverse.org/reference/read_delim_chunked.html) function to process large text files chunk-by-chunk. My expectation is that memory is cleared after e…
-
I am trying to build a chatbot based on FAQ documentation. It uses a text file as a list of question-answer pairs.
However, the base chunking strategy sometimes splits chunks in the middle of an answer or be…
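
A minimal sketch of pair-aware splitting, assuming each question in the file starts on a line beginning with `Q:`. The post does not show the file format, so that prefix and the helper name `split_faq` are hypothetical. The idea is to start a new chunk only at a question line, so an answer is never cut in the middle.

```python
from typing import List


def split_faq(text: str, question_prefix: str = "Q:") -> List[str]:
    """Split FAQ text into one chunk per question-answer pair.

    A new chunk starts at every line that begins with question_prefix
    (an assumed marker); all following lines belong to that chunk.
    """
    chunks: List[str] = []
    current: List[str] = []

    def flush() -> None:
        # Store the accumulated lines as one chunk, skipping empty ones.
        chunk = "\n".join(current).strip()
        if chunk:
            chunks.append(chunk)
        current.clear()

    for line in text.splitlines():
        if line.startswith(question_prefix):
            flush()
        current.append(line)
    flush()
    return chunks


faq = (
    "Q: What is X?\n"
    "A: X is a thing.\n"
    "It spans two lines.\n"
    "\n"
    "Q: Why?\n"
    "A: Because.\n"
)
pairs = split_faq(faq)
```

Each element of `pairs` can then be embedded as a single unit. Any text before the first question line is kept as its own chunk.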
-
Type of issue: Bug?
Uploader type: traditional
Fine Uploader version: 5.5.1
Browsers where the bug is reproducible: All
Operating systems where the bug is reproducible: Windows 7
**Steps to reproduce…