walking-octopus opened 1 year ago
Great idea - we should do that!
It seems there are other, less niche models for spelling correction, such as t5-spellchecker and other BERT-based models. Since there has already been some work on T5, and there is bert.cpp (which does not yet support decoding), efforts could be directed at those two unless this model outperforms them in quality, ease of implementation, or resource usage.
I would like to give this a try.
While trying to figure out how to convert a small PyTorch-based model to ggml, I found this thread.
I wanted to emphasize that small models (under 1 GB) exist that provide great results for their specific tasks without requiring multiple gigabytes of storage space and memory.
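As a point of reference for how small the task can get, plain word-level correction doesn't strictly need a neural model at all. Below is a minimal Norvig-style edit-distance sketch; the tiny word-frequency table is an assumption for illustration (real use would load counts from a large corpus):

```python
# Minimal Norvig-style edit-distance corrector: a model-free baseline for
# word-level spelling correction. The tiny word-frequency table below is an
# assumption for illustration; real use would load counts from a corpus.
WORDS = {"christmas": 50, "is": 900, "celebrated": 40, "on": 800,
         "december": 30, "every": 200, "year": 300}
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one delete, transpose, replace, or insert away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    """Most frequent known word within edit distance 2; non-alphabetic tokens pass through."""
    if not word.isalpha() or word in WORDS:
        return word
    candidates = [w for w in edits1(word) if w in WORDS]
    if not candidates:
        candidates = [w for e in edits1(word) for w in edits1(e) if w in WORDS]
    return max(candidates, key=WORDS.get) if candidates else word

print(" ".join(correct(w) for w in
               "christmas is celbrated on decembr 25 evry ear".split()))
# prints: christmas is celebrated on december 25 every year
```

Of course, this baseline has no context awareness at all, which is exactly what the transformer-based models discussed here add on top.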
Thank you.
I would like to finish this implementation. Do any of the people who have already attempted it have any recommendations?
@Ferruolo please go ahead!
Should the changes go to llama.cpp or GGML?
Depends on the interface that will be exposed, but I suppose the ggml repo would be more suitable.
I checked t5-base-spellchecker and it works with #8141:
./llama-cli -m /mnt/md0/models/t5-base-spellchecker.gguf -p 'christmas is celbrated on decembr 25 evry ear'
...
llama_output_reserve: reallocating output buffer from size 0.13 MiB to 2.13 MiB
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: reallocating buffers automatically
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: reallocating buffers automatically
christmas is celebrated on december 25 every year [end of text]
llama_print_timings: load time = 44.46 ms
llama_print_timings: sample time = 1.22 ms / 11 runs ( 0.11 ms per token, 9001.64 tokens per second)
llama_print_timings: prompt eval time = 59.88 ms / 18 tokens ( 3.33 ms per token, 300.58 tokens per second)
llama_print_timings: eval time = 140.48 ms / 10 runs ( 14.05 ms per token, 71.18 tokens per second)
llama_print_timings: total time = 255.65 ms / 28 tokens
Log end
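For anyone trying to reproduce the run above, the GGUF can be produced with llama.cpp's own converter script; this is a sketch, assuming a local llama.cpp checkout and a downloaded Hugging Face checkpoint (the paths are placeholders, not taken from the thread):

```shell
# Sketch: convert a Hugging Face T5 checkpoint to GGUF with llama.cpp's
# converter (paths are placeholders).
python convert_hf_to_gguf.py /path/to/t5-base-spellchecker \
    --outfile t5-base-spellchecker.gguf

# Then run it as in the comment above:
./llama-cli -m t5-base-spellchecker.gguf \
    -p 'christmas is celbrated on decembr 25 evry ear'
```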
Just posted it here: https://github.com/ggerganov/llama.cpp/issues/8204, but there is now an example of deployed ggml spellchecking AND on-device fine-tuning!
Apple recently announced a new transformer-based keyboard auto-correct and prediction system.
xfspell seems to be an existing model that attempts this, so it is worth investigating whether it can be ported to GGML. If anyone knows of other models for predictive keyboards or auto-correct, please drop your suggestions here.
Perhaps this may even be a good test case for on-device QLoRA fine-tuning.
High-quality predictive keyboards and auto-correct in pure C++ could be very useful for open-source mobile operating systems like Ubuntu Touch and privacy-focused Android ROMs, where such proposals have traditionally been rejected because of the excessive dependencies required for ML inference.