-
### Description
The most crucial factor for HackerGPT is the quality of AI responses. To significantly improve the RAG system, we need to create custom code for text embedding and metadata extraction…
-
### System Info
- CPU: x86_64, Intel(R) Xeon(R) Platinum 8470
- CPU/Host memory size: 1TB
- GPU: 4xH100 96GB
- Libraries:
TensorRT-LLM: main, 0.15.0 (commit: b7868dd1bd1186840e3755b97ea3d3a73dd…
-
While integrating a mega-based encoder ([BEE-spoke-data/mega-encoder-small-16k-v1](https://huggingface.co/BEE-spoke-data/mega-encoder-small-16k-v1)) with the sentence-transformers library, I've encoun…
-
Reasons for:
- One-time investment: no more dealing with text stream overhead, only optimised operations.
- Respect the poly-indexability of our data. We can index with timestep, box-time or atom_id, …
-
- Do the terms we've created for the two text display modes make sense to users?
- Does page-by-page successfully capture the reading experience of reflowable text?
- Does two-column make sense fo…
-
The current KNN nested field works with the max score mode, which uses the maximum score among child documents (nested field documents) as the parent document score. I would like to use other score modes like avg or sum o…
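
To make the difference between the score modes concrete, here is a minimal sketch in plain Python. It is illustrative only and is not the search engine's API: the helper `parent_score` and the child scores are hypothetical.

```python
from statistics import mean


def parent_score(child_scores, mode="max"):
    """Combine the scores of a parent's matching nested documents into one score."""
    if not child_scores:
        raise ValueError("a parent needs at least one matching child document")
    if mode == "max":
        return max(child_scores)
    if mode == "avg":
        return mean(child_scores)
    if mode == "sum":
        return sum(child_scores)
    raise ValueError(f"unknown score mode: {mode}")


# Hypothetical similarity scores of the nested documents of two parents.
doc_a = [0.75, 0.25, 0.25]  # one strong match, two weak ones
doc_b = [0.5, 0.5, 0.5]     # three moderate matches

for mode in ("max", "avg", "sum"):
    print(mode, parent_score(doc_a, mode), parent_score(doc_b, mode))
```

With these made-up numbers, `max` ranks `doc_a` first, while `avg` and `sum` rank `doc_b` first, which is the kind of ranking change the request is about.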
-
Use-Case:
A user has a small chunk of text and wants to find longer texts that contain this chunk or a similar chunk.
Proposed solution draft:
Apply shift-invariant text-chunking (for example ~100…
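
One possible reading of "shift-invariant chunking" is content-defined chunking, where a cut is placed wherever a hash of the last few words hits a fixed value, so the cut positions depend only on local content and not on the offset in the document. The sketch below assumes that reading; the function name and the `window` and `avg_words` parameters are hypothetical and do not come from the proposal.

```python
import zlib


def shift_invariant_chunks(text, window=3, avg_words=8):
    """Split text into chunks whose boundaries depend only on local content.

    A cut is placed after word i when the CRC32 of the `window` words ending
    at i is divisible by `avg_words`, so chunks are roughly `avg_words` long.
    """
    words = text.split()
    chunks, start = [], 0
    for i in range(window - 1, len(words)):
        key = " ".join(words[i - window + 1 : i + 1])
        if zlib.crc32(key.encode("utf-8")) % avg_words == 0:
            chunks.append(" ".join(words[start : i + 1]))
            start = i + 1
    if start < len(words):
        # Whatever follows the last cut becomes the final chunk.
        chunks.append(" ".join(words[start:]))
    return chunks


snippet = " ".join(f"word{i % 17}x{i % 5}" for i in range(60))
longer = "some unrelated text placed in front of it " + snippet

snippet_chunks = shift_invariant_chunks(snippet)
longer_chunks = shift_invariant_chunks(longer)
# Every snippet chunk except possibly the first reappears in the longer text.
shared = [c for c in snippet_chunks[1:] if c in longer_chunks]
```

Because the cuts are decided by a sliding window of words, inserting text in front of the snippet can change only its first chunk. The remaining chunks are identical in both texts and could be matched through an index of chunk hashes or embeddings.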
-
**Describe the bug**
Currently, raw_sentences includes "\n" strings [here](https://github.com/bhavnicksm/chonkie/blob/main/src/chonkie/chunker/semantic.py#L159). This means that embeddings are create…
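
A minimal sketch of one possible fix, assuming the sentences are available as a plain list of strings before embedding. The helper `drop_blank_sentences` is hypothetical and is not part of chonkie.

```python
def drop_blank_sentences(raw_sentences):
    """Return only the sentences that contain non-whitespace characters."""
    return [sentence for sentence in raw_sentences if sentence.strip()]


# Hypothetical splitter output containing newline-only entries.
raw = ["First sentence.", "\n", " Second sentence.", "  \n\t", ""]
print(drop_blank_sentences(raw))
```

The filter keeps the original text of the remaining sentences untouched (including leading spaces), so character offsets computed later would still need to account for the removed entries.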
-
Hi,
I am using partition and chunk_by_title to chunk my PDFs. It generally works, but when I investigated the chunks I saw that if there is a table in one of my documents, the title of the table is …
-
Of all aspects challenging the readability of an argparse output for the 95% of us, or making people avoid reading too much, perhaps the density of the text is one of the worst sticking points. This i…