-
The embedding model is used for TestsetGenerator:
```py
generator = TestsetGenerator.from_langchain(generator_llm, critic_llm, embedding_model)
dataset = generator.generate_with_langchain_docs(docu…
-
MWE:
```julia
import CSV, QuackIO
using DataFrames
file = download("https://raw.githubusercontent.com/newzealandpaul/Maritime-Pirate-Attacks/refs/heads/main/data/csv/pirate_attacks.csv")
# tr…
-
### Version:
9.0.0
### Build:
```
"build": {
"hash": "8ccfb227c2131c859033f409ee37a87023fada62",
"date": "2024-10-16T00:43:30.449150814Z"
},
```
### Error:
Node crashed with the error: …
-
I'm curious as to how TS supports generation of embeddings for documents with more tokens than the model can handle (most models support only 512 tokens).
Will it just truncate the doc (or does the model …
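For context on the question, here is a minimal sketch of one common alternative to truncation: split the document into overlapping token windows, embed each window, and average the vectors. It does not describe how TS actually behaves. The whitespace tokenizer, the `toy_embed` stand-in, and the 512/64 window settings are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def split_into_windows(
    tokens: Sequence[str], max_tokens: int = 512, overlap: int = 64
) -> List[List[str]]:
    """Split tokens into windows of at most max_tokens, sharing `overlap` tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")
    step = max_tokens - overlap
    windows: List[List[str]] = []
    start = 0
    while start < len(tokens):
        windows.append(list(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the document
        start += step
    return windows


def embed_long_document(
    text: str,
    embed_window: Callable[[List[str]], List[float]],
    max_tokens: int = 512,
    overlap: int = 64,
) -> List[float]:
    """Embed every window and return the element-wise mean of the vectors."""
    tokens = text.split()  # assumption: whitespace split stands in for a real tokenizer
    windows = split_into_windows(tokens, max_tokens, overlap)
    if not windows:
        raise ValueError("document is empty")
    vectors = [embed_window(window) for window in windows]
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def toy_embed(window: List[str]) -> List[float]:
    """Hypothetical stand-in for a model: [token count, mean token length]."""
    return [float(len(window)), sum(len(tok) for tok in window) / len(window)]
```

With a real model, `embed_window` would call the encoder on each window. Mean pooling is only one choice; keeping one vector per window is another.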
-
I could see it being useful in cases where you have a mix of small/large chunks of text. If you set `top-k=3`, it may not return enough text to provide proper context. In such cases, you can request t…
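To make the `top-k` concern concrete, here is a rough sketch of one hypothetical selection policy: take at least `top_k` ranked chunks, then keep adding chunks until a minimum amount of text has been collected. The function name `select_context` and the `min_chars` parameter are assumptions for illustration, not an existing option of any library discussed here.

```python
from typing import List, Tuple


def select_context(
    ranked_chunks: List[Tuple[str, float]], top_k: int = 3, min_chars: int = 0
) -> List[str]:
    """Pick chunks from a list already sorted by descending score.

    Always takes up to top_k chunks, then continues until the selected
    text reaches min_chars or the candidates run out.
    """
    selected: List[str] = []
    total_chars = 0
    for text, _score in ranked_chunks:
        if len(selected) >= top_k and total_chars >= min_chars:
            break
        selected.append(text)
        total_chars += len(text)
    return selected
```

With `min_chars=0` this behaves like plain top-k; a larger value pulls in more chunks when the best matches are short.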
-
Content is extracted when a developer binds an extractor to a data repository. As new content lands, the extractors are applied to it and the derived information is written to indexes.
Ext…
-
### Description
Accuracy is very high when talking to a single document but very low when talking to two files, even though the information panel can still display the most relevant content. I d…
-
Would be helpful to be able to treat images as separate documents, and search them based on descriptions or surrounding text from the PDF. These could be presented to the user along with the LLM respo…
-
**Describe the bug**
When I pull the latest image of `downloads.unstructured.io/unstructured-io/unstructured-api` and extract a PPT file with `partition_via_api`, it fails. I have attached th…
-
### Describe your problem
Hi,
I am currently working on a project where the way documents are segmented into chunks is crucial and varies depending on the specific task at hand. For example, in a …
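As a rough illustration of task-dependent chunking, the sketch below keeps several chunking strategies in a registry and picks one by name. The strategy names, the `CHUNKERS` registry, and `chunk_document` are all hypothetical; they are not part of any existing project API.

```python
import re
from typing import Callable, Dict, List

Chunker = Callable[[str], List[str]]


def chunk_by_paragraph(text: str) -> List[str]:
    """Split on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def chunk_by_sentence(text: str) -> List[str]:
    """Naive split on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def make_fixed_size_chunker(size: int) -> Chunker:
    """Build a chunker that emits groups of `size` whitespace-separated words."""
    if size <= 0:
        raise ValueError("size must be positive")

    def chunk(text: str) -> List[str]:
        words = text.split()
        return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

    return chunk


# Hypothetical registry: each task selects its strategy by name.
CHUNKERS: Dict[str, Chunker] = {
    "paragraph": chunk_by_paragraph,
    "sentence": chunk_by_sentence,
    "fixed_words_3": make_fixed_size_chunker(3),
}


def chunk_document(text: str, strategy: str) -> List[str]:
    """Chunk `text` with the named strategy, failing loudly on unknown names."""
    try:
        chunker = CHUNKERS[strategy]
    except KeyError:
        raise ValueError(f"unknown chunking strategy: {strategy!r}") from None
    return chunker(text)
```

Registering a new strategy is then a one-line change, which keeps task-specific segmentation rules out of the indexing code.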