bhavnicksm/chonkie
🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library
https://pypi.org/project/chonkie/
MIT License
1.54k stars, 57 forks
issues
[BUG] SDPM & Semantic Chunking Example not working
#59
regstuff
closed
1 hour ago
2
[Fix] Add fix for #55
#58
bhavnicksm
closed
20 hours ago
0
[Fix] AutoEmbeddings not loading `all-minilm-l6-v2` but loads `All-MiniLM-L6-V2`
#57
bhavnicksm
closed
21 hours ago
0
Update DOCS.md - fixed embeddings path after recent change
#56
pratyushmittal
closed
21 hours ago
3
[BUG] Newlines are not removed after pre-processing in SemanticChunker
#55
Pringled
closed
6 hours ago
3
[Refactor] Optimize similarity calculation by using np.divide for imp…
#54
bhavnicksm
closed
2 days ago
0
[Fix] Refactor WordChunker, SentenceChunker pre-chunk splitting for reconstruction tests + minor changes
#53
bhavnicksm
closed
2 days ago
0
[Fix] Token counts from Tokenizers and Transformers adding special tokens
#52
bhavnicksm
closed
2 days ago
0
[fix] Reorganize optional dependencies in pyproject.toml: rename 'sem…
#51
bhavnicksm
closed
2 days ago
0
[DISC] Benchmarking Chonkie Mega-Thread
#50
bhavnicksm
opened
2 days ago
0
[FEAT] Add support for Model2VecEmbeddings + Switch default embeddings to Model2VecEmbeddings
#49
bhavnicksm
closed
2 days ago
0
Reconstruction Test
#48
mrmps
closed
2 days ago
3
[DOCS] Add info about initial embeddings support and how to add custom embeddings
#47
bhavnicksm
closed
3 days ago
0
Add initial OpenAIEmbeddings support to Chonkie ✨
#46
bhavnicksm
closed
3 days ago
0
Refactor BaseChunker, SemanticChunker and SDPMChunker to support BaseEmbeddings
#45
bhavnicksm
closed
4 days ago
0
[FEAT] Add SentenceTransformerEmbeddings, EmbeddingsRegistry and AutoEmbeddings provider support
#44
bhavnicksm
closed
4 days ago
0
[DISC] Improving Documentation
#43
bhavnicksm
opened
4 days ago
3
[BUG] Chunkers failing the test of reconstruction
#42
mrmps
closed
2 days ago
6
[FEAT] - Add model2vec embedding models
#41
sky-2002
closed
2 days ago
15
[FEAT] Min chunk size (for semantic chunkers)
#40
kbarendrecht
opened
5 days ago
1
[FEAT] Add async support to SDPMChunker and to SemanticChunker
#39
rodion-m
opened
5 days ago
7
[FEAT] Add an ability to use OpenAI / VoyageAI / Cohere embeddings with SDPMChunker via LiteLLM
#38
rodion-m
opened
5 days ago
5
[BUG] start_index and end_index inaccurate for repetitive text chunks
#37
bhavnicksm
opened
5 days ago
0
[FEAT] Allow configuring backend for Sentence_Transformers (e.g. ONNX, openVINO)
#36
kbarendrecht
closed
5 days ago
3
Bump version to 0.2.0.post1 in pyproject.toml and __init__.py
#35
bhavnicksm
closed
5 days ago
0
Use `__slots__` instead of `slots=True` for python3.9 support
#34
bhavnicksm
closed
5 days ago
0
[BUG] TypeError: dataclass() got an unexpected keyword argument 'slots'
#33
AgentT30
closed
5 days ago
2
Major Update: Fix bugs + Update docs + Add slots to dataclasses + update word & sentence splitting logic + minor changes
#32
bhavnicksm
closed
6 days ago
0
[BUG] pyo3_runtime.PanicException: no entry found for key
#31
wbbeyourself
closed
5 days ago
4
[DOCS] Fix typo for import tokenizer in quick start example
#30
jasonacox
closed
6 days ago
1
[BUG] Fix the start_index and end_index to point to character indices, not token indices
#29
mrmps
closed
1 week ago
2
Add initial batching support via `chunk_batch` fn + update DOCS
#28
bhavnicksm
closed
1 week ago
0
Update dependency version of SentenceTransformer to at least 2.3.0
#27
bhavnicksm
closed
1 week ago
0
[BUG] AttributeError: 'SentenceTransformer' object has no attribute 'similarity'
#26
heweapon
closed
1 week ago
6
ImportError: cannot import name 'tokenizer' from 'tokenizers' (/usr/local/lib/python3.10/site-packages/tokenizers/__init__.py)
#25
abchbx
closed
1 week ago
1
fix: tokenizer mismatch for `SemanticChunker` + Add BaseEmbeddings
#24
bhavnicksm
closed
1 week ago
0
Can I load offline tokenizers in it?
#23
a136214808
opened
1 week ago
2
Update README.md + minor updates
#22
bhavnicksm
closed
1 week ago
0
Remove Spacy dependency from 'sentence' install + Add FAQ to DOCS.md
#21
bhavnicksm
closed
1 week ago
0
Remove Spacy dependency from Chonkie
#20
bhavnicksm
closed
1 week ago
0
Add FastEmbed Support for Embedding Generation/Inference
#19
adithya-s-k
closed
1 day ago
5
`TokenChunker` does not support multiple inputs
#18
not-lain
closed
1 week ago
5
Update README.md + fix DOCS.md typo
#17
bhavnicksm
closed
1 week ago
0
Incorrect import in Docs, SDPMChunker reference
#16
Om-Alve
closed
1 week ago
1
Update acknowledgements in README.md for improved clarity and appreci…
#15
bhavnicksm
closed
2 weeks ago
0
Development
#14
bhavnicksm
closed
2 weeks ago
0
Run Black + Isort + beautify the code a bit
#13
bhavnicksm
closed
2 weeks ago
0
Make imports as a part of Chunker __init__ instead of file imports to make Chonkie import faster
#12
bhavnicksm
closed
2 weeks ago
0
Bump version to 0.1.1 in pyproject.toml and __init__.py
#11
bhavnicksm
closed
2 weeks ago
0
Update README.md
#10
bhavnicksm
closed
2 weeks ago
0