-
## Problem
Currently, our summarizer API doesn't handle large documents efficiently. When the input text exceeds the model's context window, the API fails to process the request. Users need to manually split large documents before submitting them.
-
## Description
Chunking is the process of breaking down large pieces of text into smaller chunks. For the purposes of this document, chunking occurs at ingest, for use with embedding models. The reran…
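For concreteness, a minimal ingest-time chunker could look like the sketch below; the `chunk_text` name and the size/overlap defaults are illustrative assumptions, not existing code.
```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks with overlap (illustrative defaults)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```
The overlap keeps sentences that straddle a boundary visible to both neighboring chunks, which tends to matter for embedding quality.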
-
Let's implement chunking in the same way it was done in LLM2 to allow summarizing texts that are longer than the model's context size (a rough sketch follows the list below).
* Implement chunking (maybe have the chunking logic in a service s…
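One way this could look, purely as a sketch: map-reduce summarization, where each chunk is summarized and the partial summaries are then summarized again. `summarize` stands in for whatever model call the service exposes, and `chunk_text` is the assumed helper from the sketch above; neither is existing code.
```python
def summarize_long_text(text: str, summarize, chunk_size: int = 4000) -> str:
    """Map-reduce summarization: summarize each chunk, then summarize the summaries."""
    chunks = chunk_text(text, chunk_size=chunk_size)
    if len(chunks) == 1:
        return summarize(chunks[0])
    partial_summaries = [summarize(chunk) for chunk in chunks]
    combined = "\n".join(partial_summaries)
    # Recurse in case the combined partial summaries still exceed the context window.
    return summarize_long_text(combined, summarize, chunk_size=chunk_size)
```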
-
I would like to add custom metadata to chunks when they are saved to Pinecone with `Pipeline.from_configs`.
Following the 'Custom meta data extraction ...' notebook on [this page](https://docs.unstructured.io…
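I haven't found a documented hook for this in `Pipeline.from_configs`, so as a workaround sketch (not the library's API), the embedded chunks could be upserted directly with the Pinecone client, attaching the metadata at that point. The index name, chunk field names, and metadata values below are assumptions.
```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")  # assumption: key supplied via config/env
index = pc.Index("my-index")           # hypothetical index name

def upsert_chunks(chunks):
    """Upsert embedded chunks, attaching custom metadata to each vector."""
    vectors = [
        {
            "id": chunk["id"],
            "values": chunk["embedding"],
            # Custom metadata merged in alongside the chunk text.
            "metadata": {"text": chunk["text"], "source": "quarterly-report", "lang": "en"},
        }
        for chunk in chunks
    ]
    index.upsert(vectors=vectors)
```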
-
Big files can exceed the 8k-token context limit of OpenAI's text-embedding-3-small. There should probably be a chunking/retrieval strategy implementation for this.
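A token-aware splitter would avoid this. Below is a minimal sketch using tiktoken's `cl100k_base` encoding (the one text-embedding-3-small uses) to keep each chunk under the model's 8,191-token limit; the function name and the 8,000-token margin are my choices.
```python
import tiktoken

def split_by_tokens(text: str, max_tokens: int = 8000) -> list[str]:
    """Split text so each chunk stays under the embedding model's token limit."""
    enc = tiktoken.get_encoding("cl100k_base")  # encoding used by text-embedding-3-small
    tokens = enc.encode(text)
    return [
        enc.decode(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]
```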
-
## **Ideas for General Settings**
### **1. Application Title**
- **Description**: Modify the title of the application that appears in the browser tab, login page, and header.
- **Implementation**: P…
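A generic sketch of one way such a setting could be wired, assuming a Python backend that reads a single `APP_TITLE` value (all names hypothetical):
```python
import os

# Hypothetical single source of truth for the title shown in the browser tab,
# login page, and header; overridable without a code change.
APP_TITLE = os.environ.get("APP_TITLE", "My Application")
```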
-
I am using **unstructured-ingest** _version **0.3.0**_ with the following code:
```python
from unstructured_ingest.v2.interfaces import ProcessorConfig
from unstructured_ingest.v2.pipeline.pipeline import Pipeline
…
```
-
Inspiration: the '5 Levels of Text Splitting' tutorial:
[https://github.com/FullStackRetrieval-com/RetrievalTutorials/blob/main/tutorials/LevelsOfTextSplitting/5_Levels_Of_Text_Splitting.ipynb](https://github.com/FullStackRetrieval-com/RetrievalTutorials/blob/main/tutorials/LevelsOfTextSplitting/5_Levels_Of_Text_Splitting.ipynb)
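For context, the tutorial's second level is recursive character splitting: try coarse separators first and fall back to finer ones. A dependency-free sketch of the idea (my own code, not the notebook's):
```python
def recursive_split(text, max_len=1000, separators=("\n\n", "\n", ". ", " ")):
    """Split on the coarsest separator first; recurse with finer ones as needed."""
    if len(text) <= max_len:
        return [text]
    for i, sep in enumerate(separators):
        if sep in text:
            chunks = []
            for piece in text.split(sep):
                if len(piece) <= max_len:
                    chunks.append(piece)
                else:
                    # Piece is still too big: retry with the finer separators.
                    chunks.extend(recursive_split(piece, max_len, separators[i + 1:]))
            return chunks
    # No separators left: hard cut as a last resort.
    return [text[j:j + max_len] for j in range(0, len(text), max_len)]
```
A production splitter would also merge adjacent small pieces back toward `max_len` and add overlap; this sketch only shows the recursion.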
-
1. The crawling is often incomplete -- stories later in the webpage are likely to be ignored.
Consider segmenting (chunking) the text snapshot before passing it to GPT (a rough sketch follows below).
- decide which chunk size w…
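Purely as a sketch of that idea (the model choice, prompt, and `chunk_text` helper from the earlier sketch are assumptions), each chunk of the snapshot could be sent to GPT separately:
```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extract_stories(snapshot: str, chunk_size: int = 4000) -> list[str]:
    """Send each chunk of the page snapshot to GPT so later stories aren't dropped."""
    results = []
    for chunk in chunk_text(snapshot, chunk_size=chunk_size):  # chunk_text: assumed helper
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model choice
            messages=[
                {"role": "system", "content": "List the news stories mentioned in the text."},
                {"role": "user", "content": chunk},
            ],
        )
        results.append(response.choices[0].message.content)
    return results
```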