-
Write unit tests for dhlab code, both to ensure that expected functionality does not change even when the implementation does (regression testing) and that the functions actually behave the way we…
-
### System Info
- CPU architecture: x86_64
- CPU/Host memory size (if known): 40G
- GPU name: RTX 3060-6G
- TensorRT-LLM branch or tag TensorRT-LLM commit: TensorRT-LLM version: 0.14.0.dev2024092400
…
-
**Motivation:**
Right now, chunking uses a greedy algorithm. For example, this input would produce the following chunks:
```rust
let text = "Sentence 1. Sentence 2. Sentence 3. Sentence 4.";
splitter.chunk…
-
It would be helpful to be able to treat images as separate documents and search them based on descriptions or surrounding text from the PDF. These could be presented to the user along with the LLM respo…
-
Reasons for:
- One-time investment: no more dealing with text stream overhead, only optimised operations.
- Respect the poly-indexability of our data. We can index with timestep, box-time or atom_id, …
-
```
[](https://localhost:8080/#) in extract_data_from_pdf(pdf_path)
57 # Function to extract text using the unstructured library
58 def extract_data_from_pdf(pdf_path):
---> 59 eleme…
-
I recommend a more advanced chunking system. Ideally, you want to break text up by sentence or paragraph where possible. Chunking by words will split sentences and break their meaning.…
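
A minimal sketch of the sentence-based chunking suggested above, in Python. The names `split_sentences`, `chunk_by_sentence`, and `max_chars` are hypothetical and not taken from any project mentioned here. The regex splitter is a deliberate simplification: it breaks on `.`, `!` or `?` followed by whitespace, so it will mis-split abbreviations such as "e.g. this".

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_by_sentence(text, max_chars=200):
    """Pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars becomes its own chunk
    instead of being cut in the middle.
    """
    chunks = []
    current = ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            # Either the sentence still fits, or the chunk is empty and
            # must accept the sentence even if it is oversized.
            current = candidate
        else:
            # The sentence does not fit: close the chunk and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "Sentence 1. Sentence 2. Sentence 3. Sentence 4."
    for chunk in chunk_by_sentence(sample, max_chars=25):
        print(repr(chunk))
```

Because chunk boundaries only ever fall between sentences, no sentence is split across two chunks. The trade-off is that chunk sizes vary, and an oversized sentence can exceed the limit.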
-
### 🐛 Describe the bug
I am trying to add the same text to the app, but the logs are not clear.
I checked the number of chunks, and there is only one. But the logs are not clear and it seems like the proces…
-
Of all aspects challenging the readability of an argparse output for the 95% of us, or making people avoid reading too much, perhaps the density of the text is one of the worst sticking points. This i…
-
I am running `ragbuilder` on macOS 14.6.1, built from the Docker image and run as a container. I have followed the 'Getting started' example using the provided blogpost and also a small set of selecte…