dleemiller / WordLlama
Things you can do with the token embeddings of an LLM
MIT License · 1.31k stars · 42 forks
Issues
#41 · Use custom cache dir for tokenizer download, too · erickpeirson · closed 1 day ago · 1 comment
#40 · Fix error with `tokenizer.model_dump()` on vanilla install. · rapatel0 · opened 1 week ago · 1 comment
#39 · Numpy bitwise count · dleemiller · closed 1 week ago · 1 comment
#38 · Something is wrong in versions accessible by PIP · tumikosha · closed 1 week ago · 2 comments
#37 · How to extract the Token Embedding · biniyoni · closed 1 day ago · 4 comments
#36 · inference.py: add batch_size argument to rank() · jgbarah · closed 3 weeks ago · 0 comments
#35 · Return idx deduplicate · dleemiller · closed 3 weeks ago · 0 comments
#34 · Vector similarity dedupe refactor · dleemiller · closed 3 weeks ago · 0 comments
#33 · Allow WordLlama.rank to not sort the results · jgbarah · closed 3 weeks ago · 1 comment
#32 · Matryoshka Representations Evaluation · KyleSmith19091 · closed 1 month ago · 2 comments
#31 · shape check before equality comparison · dleemiller · closed 1 month ago · 0 comments
#30 · Doubts about utility to multilingual models · TheMrguiller · opened 1 month ago · 4 comments
#29 · Word Splitting · chapmanjacobd · closed 3 weeks ago · 2 comments
#28 · Cleanup prerelease · dleemiller · closed 1 month ago · 0 comments
#27 · inference benchmarks · dleemiller · closed 1 month ago · 0 comments
#26 · Feature/semantic splitter · dleemiller · closed 1 month ago · 0 comments
#25 · An example of using WordLlama for a RAG pipeline · dinhanhx · closed 1 month ago · 4 comments
#24 · Linting · dleemiller · closed 1 month ago · 0 comments
#23 · fix some import errs · jimexist · closed 1 month ago · 0 comments
#22 · Kmeans optimization · dleemiller · closed 1 month ago · 0 comments
#21 · The example does not work · tumikosha · closed 1 month ago · 1 comment
#20 · How do you really create WordLlama model? · dinhanhx · opened 1 month ago · 3 comments
#19 · Feature / Add Semantic Splitting · dleemiller · closed 1 month ago · 3 comments
#18 · Fix/memory consumption embed · dleemiller · closed 1 month ago · 0 comments
#17 · wl.embed, wl.cluster high RAM usage · chapmanjacobd · closed 1 month ago · 8 comments
#16 · tokenizer = Tokenizer.from_file(str(tokenizer_path)) Exception: data did not match any variant of untagged enum PyNormalizerTypeWrapper at line 49 column 3 · gfkdliucheng · closed 1 month ago · 2 comments
#15 · Feature/extraction tutorial · dleemiller · closed 1 month ago · 0 comments
#14 · Need detailed example on how to extract the embedding model from LLM · harshitv804 · closed 1 month ago · 3 comments
#13 · ModuleNotFoundError: No module named 'wordllama.algorithms.kmeans_helpers' · chapmanjacobd · closed 1 month ago · 2 comments
#12 · Gradio Demo · amrrs · closed 1 month ago · 4 comments
#11 · fixing float<>double changes in cython, adding function test · dleemiller · closed 1 month ago · 0 comments
#10 · Fedora Linux: Illegal instruction (core dumped) · russellballestrini · closed 1 week ago · 7 comments
#9 · First README example fails · cpa · closed 1 month ago · 3 comments
#8 · Feature/add l3 supercat models · dleemiller · closed 2 months ago · 0 comments
#7 · changing to 64 bit ints and 32 bit floats · dleemiller · closed 2 months ago · 0 comments
#6 · Feature/cython extensions · dleemiller · closed 3 months ago · 1 comment
#5 · Feature/clustering · dleemiller · closed 3 months ago · 0 comments
#4 · Feature/add topk search · dleemiller · closed 3 months ago · 0 comments
#3 · add fuzzy deduplication algorithm · dleemiller · closed 3 months ago · 0 comments
#2 · Feature/add hf downloader · dleemiller · closed 3 months ago · 0 comments
#1 · Add llama2 supercat model · dleemiller · closed 3 months ago · 0 comments
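Several of the issues above concern ranking and pooled token embeddings (#36 adds a `batch_size` argument to `rank()`, #33 asks for unsorted `rank` results, #17 discusses `wl.embed`). As context for those discussions, here is a minimal pure-Python sketch of the general idea — ranking candidates by cosine similarity over mean-pooled token vectors. The `TOY_EMBEDDINGS` table, the whitespace tokenizer, and the `embed`/`rank` function names here are hypothetical stand-ins for illustration, not WordLlama's actual implementation or API.

```python
import math

# Hypothetical toy embedding table: token -> small dense vector.
TOY_EMBEDDINGS = {
    "token": [0.1, 0.9, 0.0],
    "embeddings": [0.2, 0.8, 0.1],
    "llm": [0.0, 0.7, 0.3],
    "pizza": [0.9, 0.0, 0.1],
    "recipe": [0.8, 0.1, 0.2],
}

def embed(text):
    """Mean-pool the vectors of each known whitespace-split token."""
    vecs = [TOY_EMBEDDINGS[t] for t in text.lower().split() if t in TOY_EMBEDDINGS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(a, b):
    """Cosine similarity, with a zero-vector guard."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def rank(query, candidates, sort=True):
    """Score candidates against the query; sort=False keeps input order (cf. #33)."""
    q = embed(query)
    scored = [(c, cosine(q, embed(c))) for c in candidates]
    return sorted(scored, key=lambda cs: cs[1], reverse=True) if sort else scored

if __name__ == "__main__":
    print(rank("token embeddings", ["llm token embeddings", "pizza recipe"]))
```

The `sort` flag mirrors the design question raised in #33: scoring and ordering are separable steps, so callers that need the original candidate order can skip the final sort.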