Vandivier opened 1 year ago
This one seems easy to set up for the hackathon: https://youtu.be/ByV5w1ES38A
Want to add retrieval augmentation if possible; time-box to one day
And this https://youtu.be/nVC9D9fRyNU
more retrieval augmentation: https://huggingface.co/spaces/deepset/retrieval-augmentation-svb/blob/main/app.py
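A minimal retrieval-augmentation sketch, assuming a hypothetical list of Ladderly doc chunks and the pre-1.0 `openai` SDK; `all-MiniLM-L6-v2` is just a small default embedder, not a recommendation:

```python
import numpy as np
import openai  # pre-1.0 SDK style; assumes OPENAI_API_KEY is set
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["Ladderly chunk 1...", "Ladderly chunk 2..."]  # hypothetical corpus
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def answer(question: str) -> str:
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    best = docs[int(np.argmax(doc_vecs @ q_vec))]  # cosine sim via normalized dot product
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"Context:\n{best}\n\nQuestion: {question}"}],
    )
    return resp.choices[0].message.content
```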
ANOTHA ONE (koala)
Low memory requirements - 6 GB iirc https://youtu.be/fGpXj4bl5LI
GPT4all v2
for now, leverage ChatGPT's out-of-the-box token limit and use recursive, by-unit summarization, then commit the summarized outputs, and write docs that instruct the user on how to use these outputs (include a bit about reflection; sketch below)
related https://twitter.com/simonw/status/1647620943840428032
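A sketch of the recursive, by-unit summarization idea, again with the pre-1.0 `openai` SDK; the character-count cutoff is a crude stand-in for the real token limit:

```python
import openai  # assumes OPENAI_API_KEY is set

def summarize(text: str) -> str:
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"Summarize concisely:\n\n{text}"}],
    )
    return resp.choices[0].message.content

def recursive_summary(units: list[str], max_chars: int = 8000) -> str:
    summaries = [summarize(u) for u in units]  # one summary per unit (e.g., chapter)
    combined = "\n".join(summaries)
    if len(combined) <= max_chars:  # crude proxy for the OOTB token limit
        return summarize(combined)
    # too big: re-chunk the summaries and recurse
    batches = [combined[i:i + max_chars] for i in range(0, len(combined), max_chars)]
    return recursive_summary(batches, max_chars)
```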
we can use WebGPU on the M2 if the regular GPU path doesn't work
fine tune dolly v2 for $30 https://www.tiktok.com/@rajistics/video/7222430618347490602
few-shot or in-context learning is considered newer than fine-tuning (but is it more performant?)
https://www.tiktok.com/@rajistics/video/7226905183601708331
Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning
https://openreview.net/forum?id=rBCvMG-JsPd
but what about a model I can't fine-tune, like GPT-4? PEFT'd Vicuna vs ICL GPT-4? (ICL sketch below)
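For reference, ICL on a closed model just means the demonstrations go in the prompt instead of into the weights; a toy example with made-up demonstrations:

```python
import openai  # pre-1.0 SDK; assumes OPENAI_API_KEY is set

few_shot = [
    {"role": "user", "content": "Classify sentiment: 'I love this ladder.'"},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "Classify sentiment: 'This rung broke.'"},
    {"role": "assistant", "content": "negative"},
]
resp = openai.ChatCompletion.create(
    model="gpt-4",
    messages=few_shot + [{"role": "user", "content": "Classify sentiment: 'Solid build.'"}],
)
print(resp.choices[0].message.content)
```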
MPT-7B-StoryWriter-65k+ was literally made to write books ("ALiBi, MPT-7B-StoryWriter-65k+ can extrapolate even beyond 65k tokens, and we have demonstrated generations as long as 84k tokens on a single node of A100-80GB GPUs."); loading sketch below
that's 60,000+ words
that's 120+ pages
that's a book
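The loading pattern from the Hugging Face model card; bumping `max_seq_len` past 65k is exactly what ALiBi makes possible:

```python
import transformers

# MPT ships custom modeling code, hence trust_remote_code=True
config = transformers.AutoConfig.from_pretrained(
    "mosaicml/mpt-7b-storywriter", trust_remote_code=True
)
config.max_seq_len = 83968  # ALiBi lets us extrapolate beyond the 65k training length
model = transformers.AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-storywriter", config=config, trust_remote_code=True
)
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
```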
ChatGLM-6B
supported oobabooga models:
https://github.com/oobabooga/text-generation-webui/blob/main/models/config.yaml
or is it better to run https://github.com/openai/triton (the optimized kernel compiler used under MPT)?
https://github.com/cocktailpeanut/dalai
a little lighter weight than oobabooga (maybe?)
web gpu acceleration https://github.com/mlc-ai/web-llm
4.83 tokens/sec on my NVIDIA GeForce GTX 960 (CUDA compute capability 5.2, 4 GB dedicated VRAM)
TODO: cloud dev with LangChain; try Paperspace + an A100. Can I use IPUs? (Paperspace offers them)
M2 MacBook Air (16 GB, macOS 13.2) got 15 tokens/sec
peft library: https://pypi.org/project/peft/
can be done using the a100
related https://twitter.com/Sumanth_077/status/1625774615753629696
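A minimal LoRA sketch with `peft`; the Dolly v2 base model and the hyperparameters are placeholders, not tuned choices:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-3b")  # placeholder base
lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # tiny trainable fraction is why one A100 suffices
```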
More big context windows
100k context window Claude, and the rates LGTM: anthropic.com/product (call sketch below)
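A rough call sketch with the `anthropic` SDK's Messages API, whole document in one prompt; the model name and file path are placeholders:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
book_text = open("draft.txt").read()  # placeholder: fits within a 100k-token window
resp = client.messages.create(
    model="claude-2.1",  # pick the current 100k+ model per Anthropic's docs
    max_tokens=1024,
    messages=[{"role": "user", "content": f"Summarize this book:\n\n{book_text}"}],
)
print(resp.content[0].text)
```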
better than chain-of-thought prompting:
https://www.tiktok.com/t/ZTRKbBTwf/
new model claims to be 99% as good as GPT-3.5 with low memory requirements, fine-tunable and open (commercially...?) https://www.youtube.com/watch?v=3PVg86bnKDg
Falcon 40B is the new open winner and is commercially licensed.
Need to pin down exact perf vs GPT-3.5, plus the context and cost tradeoffs.
But v cool that it's open and I can fine-tune on Ladderly info for a fully closed Ladderly-Chat (React can easily support an OR operation on the model via env var, so we really don't need to pick A or B; we can also add StoryWriter or another large-context option; see the sketch below)
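The env-var OR operation, sketched in Python for clarity (same idea via `process.env` in React); `LADDERLY_CHAT_MODEL` and the registry entries are hypothetical:

```python
import os

MODEL_REGISTRY = {
    "falcon-40b": "tiiuae/falcon-40b",
    "gpt-3.5": "gpt-3.5-turbo",
    "storywriter": "mosaicml/mpt-7b-storywriter",  # the large-context option
}

def pick_model() -> str:
    # default to one option; no need to hard-pick A or B
    return MODEL_REGISTRY[os.environ.get("LADDERLY_CHAT_MODEL", "gpt-3.5")]
```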
Claude 2 and Bing are on the table. Can we access Bing OCR programmatically?
Legacy (non-LLM) OCR could be fine too
fine-tuning engine: https://github.com/scaleapi/llm-engine. Also, we want Llama 2 rn as the best open approach; GPT-4 is still better on the closed-source side, but I think we need to charge more for that
Llama 2 custom model fine-tuning (sketch below)
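Roughly the fine-tune call from the llm-engine README; the training-file path is a placeholder and the exact kwargs may have changed:

```python
from llmengine import FineTune  # pip install scale-llm-engine

response = FineTune.create(
    model="llama-2-7b",
    training_file="s3://my-bucket/path/to/training.csv",  # placeholder path
)
print(response.json())
```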
NVIDIA trains models; Apple silicon runs models
another awesome place to train models is: https://vast.ai/pricing
with a 4090, I could lease my GPU for $0.40 an hour atm (no idea about the rate of decay). this is not crazy money though: that works out to $200-300/month at full utilization, not counting the energy bill (arithmetic below)
wait, I guess vast pricing is per minute, not per hour, based on this blog https://vast.ai/article/running-the-70B-LLama2-GPTQ
I'm super confused... if that's the case it's an obvious buy though https://twitter.com/JohnVandivier/status/1693110093858931142
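The arithmetic behind both readings:

```python
rate = 0.40
print(rate * 24 * 30)       # 288.0 -> ~$290/month if $0.40 is per hour
print(rate * 60 * 24 * 30)  # 17280.0 -> ~$17k/month if per minute (the "obvious buy" case)
```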
did we mention LLaVA? (multimodal) https://stackshare.io/llava
IDEFICS is the largest open multimodal model; it understands images (loading sketch below)
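A hedged IDEFICS loading sketch following the transformers docs pattern; the image URL is a placeholder, and the 9B checkpoint still wants a big GPU:

```python
import torch
from transformers import AutoProcessor, IdeficsForVisionText2Text

checkpoint = "HuggingFaceM4/idefics-9b-instruct"
processor = AutoProcessor.from_pretrained(checkpoint)
model = IdeficsForVisionText2Text.from_pretrained(checkpoint, torch_dtype=torch.bfloat16)

# IDEFICS prompts interleave text and images (URLs or PIL images)
prompts = [[
    "User: What is in this image?",
    "https://example.com/ladder.jpg",  # placeholder image URL
    "<end_of_utterance>",
    "\nAssistant:",
]]
inputs = processor(prompts, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=50)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```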
During the hackathon I just picked a model without knowing much about the ideal model fit. Think harder and change it if needed