dleemiller / WordLlama
Things you can do with the token embeddings of an LLM
MIT License · 1.39k stars · 47 forks
Issues (sorted newest first)
| # | Title | Author | Status | Comments |
|---|-------|--------|--------|----------|
| #43 | query autocomplete through beam decoding | VoVAllen | opened 2 weeks ago | 0 |
| #42 | Integrate cache changes | dleemiller | closed 2 weeks ago | 1 |
| #41 | Use custom cache dir for tokenizer download, too | erickpeirson | closed 2 weeks ago | 2 |
| #40 | Fix error with `tokenizer.model_dump()` on vanilla install. | rapatel0 | opened 1 month ago | 1 |
| #39 | Numpy bitwise count | dleemiller | closed 1 month ago | 1 |
| #38 | Something is wrong in versions accessible by PIP | tumikosha | closed 4 weeks ago | 2 |
| #37 | How to extract the Token Embedding | biniyoni | closed 2 weeks ago | 4 |
| #36 | inference.py: add batch_size argument to rank() | jgbarah | closed 1 month ago | 0 |
| #35 | Return idx deduplicate | dleemiller | closed 1 month ago | 0 |
| #34 | Vector similarity dedupe refactor | dleemiller | closed 1 month ago | 0 |
| #33 | Allow WordLlama.rank to not sort the results | jgbarah | closed 1 month ago | 1 |
| #32 | Matryoshka Representations Evaluation | KyleSmith19091 | closed 1 month ago | 2 |
| #31 | shape check before equality comparison | dleemiller | closed 1 month ago | 0 |
| #30 | Doubts about utility to multilingual models | TheMrguiller | opened 1 month ago | 4 |
| #29 | Word Splitting | chapmanjacobd | closed 1 month ago | 2 |
| #28 | Cleanup prerelease | dleemiller | closed 1 month ago | 0 |
| #27 | inference benchmarks | dleemiller | closed 1 month ago | 0 |
| #26 | Feature/semantic splitter | dleemiller | closed 1 month ago | 0 |
| #25 | A example of using WordLlama for a RAG pipeline | dinhanhx | closed 1 month ago | 4 |
| #24 | Linting | dleemiller | closed 2 months ago | 0 |
| #23 | fix some import errs | jimexist | closed 2 months ago | 0 |
| #22 | Kmeans optimization | dleemiller | closed 2 months ago | 0 |
| #21 | The example does not work | tumikosha | closed 2 months ago | 1 |
| #20 | How do you really create WordLlama model? | dinhanhx | opened 2 months ago | 3 |
| #19 | Feature / Add Semantic Splitting | dleemiller | closed 1 month ago | 3 |
| #18 | Fix/memory consumption embed | dleemiller | closed 2 months ago | 0 |
| #17 | wl.embed, wl.cluster high RAM usage | chapmanjacobd | closed 2 months ago | 8 |
| #16 | tokenizer = Tokenizer.from_file(str(tokenizer_path)) Exception: data did not match any variant of untagged enum PyNormalizerTypeWrapper at line 49 column 3 | gfkdliucheng | closed 1 month ago | 2 |
| #15 | Feature/extraction tutorial | dleemiller | closed 2 months ago | 0 |
| #14 | Need detailed example on how to extract the embedding model from LLM | harshitv804 | closed 2 months ago | 3 |
| #13 | ModuleNotFoundError: No module named 'wordllama.algorithms.kmeans_helpers' | chapmanjacobd | closed 2 months ago | 2 |
| #12 | Gradio Demo | amrrs | closed 2 months ago | 4 |
| #11 | fixing float<>double changes in cython, adding function test | dleemiller | closed 2 months ago | 0 |
| #10 | Fedora Linux: Illegal instruction (core dumped) | russellballestrini | closed 1 month ago | 7 |
| #9 | First README example fails | cpa | closed 2 months ago | 3 |
| #8 | Feature/add l3 supercat models | dleemiller | closed 3 months ago | 0 |
| #7 | changing to 64 bit ints and 32 bit floats | dleemiller | closed 3 months ago | 0 |
| #6 | Feature/cython extensions | dleemiller | closed 4 months ago | 1 |
| #5 | Feature/clustering | dleemiller | closed 4 months ago | 0 |
| #4 | Feature/add topk search | dleemiller | closed 4 months ago | 0 |
| #3 | add fuzzy deduplication algorithm | dleemiller | closed 4 months ago | 0 |
| #2 | Feature/add hf downloader | dleemiller | closed 4 months ago | 0 |
| #1 | Add llama2 supercat model | dleemiller | closed 4 months ago | 0 |