text-chunking Search Results

1000+ results for text-chunking


explodinggradients/ragas #1098

Do we need to chunk documents before text set generation?

The embedding model is used for TestsetGenerator: ```py generator = TestsetGenerator.from_langchain(generator_llm, critic_llm, embedding_model) dataset = generator.generate_with_langchain_docs(docu…

hanfei1986 updated 4 months ago · 3 comments

JuliaData/CSV.jl #1143

CSV.jl fails to parse a file that DuckDB is fine with

MWE: ```julia import CSV, QuackIO using DataFrames file = download("https://raw.githubusercontent.com/newzealandpaul/Maritime-Pirate-Attacks/refs/heads/main/data/csv/pirate_attacks.csv") # tr…

asinghvi17 updated 1 month ago · 1 comment

elastic/elasticsearch #114909

[ML] ingest node crashes when running semantic_text inferenc…

### Version: 9.0.0 ### Build: ``` "build": { "hash": "8ccfb227c2131c859033f409ee37a87023fada62", "date": "2024-10-16T00:43:30.449150814Z" }, ``` ### Error: Node crashed with the error: …

wwang500 updated 1 month ago · 2 comments

typesense/typesense #1526

Handling docs >max tokens supported by a model for embedding…

I'm curious as to how TS supports generation of embeddings for documents with more tokens than the model can handle (most models support 512 only). Will it just truncate the doc (or does the model …

zehawki updated 4 months ago · 8 comments

postgresml/postgresml #735

Ability to query by max characters instead of top-k

I could see it being useful in cases where you have a mix of small/large chunks of text. If you set `top-k=3`, it may not return enough text to provide proper context. In such cases, you can request t…

aplchian updated 1 year ago · 1 comment

tensorlakeai/indexify #107

Add Data Transformers to Data Repository

Content is extracted when a developer binds an extractor to a data repository. As new content lands the extractors are applied on the content and the derived information is written to indexes. Ext…

diptanu updated 1 year ago · 1 comment

Cinnamon/kotaemon #460

[BUG] The accuracy of talking to a single document is very h…

### Description The accuracy of talking to a single document is very high, but when talking to two files, the accuracy is very low, but the information panel can display the most relevant content, I d…

sandbury updated 1 week ago · 4 comments

nickthecook/archyve #3

Extract images from PDF

Would be helpful to be able to treat images as separate documents, and search them based on descriptions or surrounding text from the PDF. These could be presented to the user along with the LLM respo…

nickthecook updated 1 month ago · 1 comment

Unstructured-IO/unstructured #3707

bug/Extract ppt failed by api

**Describe the bug** When I pulled the latest image of `downloads.unstructured.io/unstructured-io/unstructured-api` and extract a ppt file , by `partition_via_api` , it is failed, i have attached th…

JohnJyong updated 2 weeks ago · 4 comments

infiniflow/ragflow #568

[Feature Request]Using Ragflow for Document Preprocessing wi…

### Describe your problem Hi, i am currently working on a project where the way documents are segmented into chunks is crucial and varies depending on the specific task at hand. For example, in a …

JahnKhan updated 7 months ago · 1 comment
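
The typesense question above (#1526) asks whether documents longer than an embedding model's token limit (often 512) are simply truncated. A common alternative is to split each document into overlapping windows and embed every window separately. The sketch below is illustrative only: it assumes whitespace-separated words as a stand-in for the model's real tokenizer (embedding models count subword tokens, so a 512-word window can exceed a 512-token limit), and `chunk_tokens` with its defaults is a hypothetical helper, not a description of what Typesense actually does.

```python
def chunk_tokens(tokens: list[str], max_tokens: int = 512, overlap: int = 64) -> list[list[str]]:
    """Split a token list into windows of at most max_tokens tokens.

    Each window after the first repeats the last `overlap` tokens of the
    previous one, so text near a boundary appears in two chunks.
    Hypothetical helper: tokens here are assumed to be whitespace words.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    step = max_tokens - overlap  # how far each window advances
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + max_tokens >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    # Assumption: whitespace splitting approximates tokenization.
    words = "a b c d e".split()
    print(chunk_tokens(words, max_tokens=3, overlap=1))  # [['a', 'b', 'c'], ['c', 'd', 'e']]
```

The overlap trades extra embeddings for context continuity: a sentence cut at a window boundary still appears whole in at least one chunk when it is shorter than the overlap.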
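
The postgresml request above (#735) describes retrieving by a total character budget instead of a fixed top-k, so that a mix of small and large chunks still yields enough context. A minimal sketch, assuming the chunks are already sorted by relevance; `select_by_char_budget` is a hypothetical helper, not a PostgresML API.

```python
def select_by_char_budget(ranked_chunks: list[str], max_chars: int) -> list[str]:
    """Take chunks in rank order until the next one would exceed max_chars.

    Hypothetical helper: assumes ranked_chunks is sorted best-first.
    """
    selected: list[str] = []
    used = 0
    for chunk in ranked_chunks:
        if used + len(chunk) > max_chars:
            break  # stop at the first chunk that does not fit
        selected.append(chunk)
        used += len(chunk)
    return selected


if __name__ == "__main__":
    results = ["first chunk", "second", "a much longer third chunk"]
    print(select_by_char_budget(results, max_chars=20))  # ['first chunk', 'second']
```

Stopping at the first chunk that overflows keeps the results strictly in rank order; skipping it and continuing with smaller, lower-ranked chunks would fill the budget more completely at the cost of that ordering.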
