dleemiller / WordLlama
Things you can do with the token embeddings of an LLM
MIT License · 1.39k stars · 47 forks
Issues (sorted newest first)
| # | Title | Author | Status | Comments |
|---|-------|--------|--------|----------|
| #43 | query autocomplete through beam decoding | VoVAllen | opened 2 weeks ago | 0 |
| #42 | Integrate cache changes | dleemiller | closed 2 weeks ago | 1 |
| #41 | Use custom cache dir for tokenizer download, too | erickpeirson | closed 2 weeks ago | 2 |
| #40 | Fix error with `tokenizer.model_dump()` on vanilla install. | rapatel0 | opened 1 month ago | 1 |
| #39 | Numpy bitwise count | dleemiller | closed 1 month ago | 1 |
| #38 | Something is wrong in versions accessible by PIP | tumikosha | closed 4 weeks ago | 2 |
| #37 | How to extract the Token Embedding | biniyoni | closed 2 weeks ago | 4 |
| #36 | inference.py: add batch_size argument to rank() | jgbarah | closed 1 month ago | 0 |
| #35 | Return idx deduplicate | dleemiller | closed 1 month ago | 0 |
| #34 | Vector similarity dedupe refactor | dleemiller | closed 1 month ago | 0 |
| #33 | Allow WordLlama.rank to not sort the results | jgbarah | closed 1 month ago | 1 |
| #32 | Matryoshka Representations Evaluation | KyleSmith19091 | closed 1 month ago | 2 |
| #31 | shape check before equality comparison | dleemiller | closed 1 month ago | 0 |
| #30 | Doubts about utility to multilingual models | TheMrguiller | opened 1 month ago | 4 |
| #29 | Word Splitting | chapmanjacobd | closed 1 month ago | 2 |
| #28 | Cleanup prerelease | dleemiller | closed 1 month ago | 0 |
| #27 | inference benchmarks | dleemiller | closed 1 month ago | 0 |
| #26 | Feature/semantic splitter | dleemiller | closed 1 month ago | 0 |
| #25 | A example of using WordLlama for a RAG pipeline | dinhanhx | closed 1 month ago | 4 |
| #24 | Linting | dleemiller | closed 2 months ago | 0 |
| #23 | fix some import errs | jimexist | closed 2 months ago | 0 |
| #22 | Kmeans optimization | dleemiller | closed 2 months ago | 0 |
| #21 | The example does not work | tumikosha | closed 2 months ago | 1 |
| #20 | How do you really create WordLlama model? | dinhanhx | opened 2 months ago | 3 |
| #19 | Feature / Add Semantic Splitting | dleemiller | closed 1 month ago | 3 |
| #18 | Fix/memory consumption embed | dleemiller | closed 2 months ago | 0 |
| #17 | wl.embed, wl.cluster high RAM usage | chapmanjacobd | closed 2 months ago | 8 |
| #16 | tokenizer = Tokenizer.from_file(str(tokenizer_path)) Exception: data did not match any variant of untagged enum PyNormalizerTypeWrapper at line 49 column 3 | gfkdliucheng | closed 1 month ago | 2 |
| #15 | Feature/extraction tutorial | dleemiller | closed 2 months ago | 0 |
| #14 | Need detailed example on how to extract the embedding model from LLM | harshitv804 | closed 2 months ago | 3 |
| #13 | ModuleNotFoundError: No module named 'wordllama.algorithms.kmeans_helpers' | chapmanjacobd | closed 2 months ago | 2 |
| #12 | Gradio Demo | amrrs | closed 2 months ago | 4 |
| #11 | fixing float<>double changes in cython, adding function test | dleemiller | closed 2 months ago | 0 |
| #10 | Fedora Linux: Illegal instruction (core dumped) | russellballestrini | closed 1 month ago | 7 |
| #9 | First README example fails | cpa | closed 2 months ago | 3 |
| #8 | Feature/add l3 supercat models | dleemiller | closed 3 months ago | 0 |
| #7 | changing to 64 bit ints and 32 bit floats | dleemiller | closed 3 months ago | 0 |
| #6 | Feature/cython extensions | dleemiller | closed 4 months ago | 1 |
| #5 | Feature/clustering | dleemiller | closed 4 months ago | 0 |
| #4 | Feature/add topk search | dleemiller | closed 4 months ago | 0 |
| #3 | add fuzzy deduplication algorithm | dleemiller | closed 4 months ago | 0 |
| #2 | Feature/add hf downloader | dleemiller | closed 4 months ago | 0 |
| #1 | Add llama2 supercat model | dleemiller | closed 4 months ago | 0 |