Vandivier opened 1 year ago
This one seems easy to set up for the hackathon: https://youtu.be/ByV5w1ES38A
Want to add retrieval augmentation if possible; time-box to one day
And this https://youtu.be/nVC9D9fRyNU
more retrieval augmentation: https://huggingface.co/spaces/deepset/retrieval-augmentation-svb/blob/main/app.py
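A minimal retrieval-augmentation sketch, assuming a hypothetical list of Ladderly doc chunks and the pre-1.0 `openai` SDK; `all-MiniLM-L6-v2` is just a small default embedder, not a recommendation:

```python
import numpy as np
import openai  # pre-1.0 SDK style; assumes OPENAI_API_KEY is set
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["Ladderly chunk 1...", "Ladderly chunk 2..."]  # hypothetical corpus
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def answer(question: str) -> str:
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    best = docs[int(np.argmax(doc_vecs @ q_vec))]  # cosine sim via normalized dot product
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"Context:\n{best}\n\nQuestion: {question}"}],
    )
    return resp.choices[0].message.content
```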
ANOTHA ONE (koala)
Low memory requirements - 6 GB iirc https://youtu.be/fGpXj4bl5LI
GPT4all v2
for now, leverage ChatGPT's out-of-the-box token limit and use recursive, by-unit summarization, then commit the summarized outputs, and write docs that instruct the user on how to use these outputs (include a bit about reflection; sketch below)
related https://twitter.com/simonw/status/1647620943840428032
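A sketch of the recursive, by-unit summarization idea, again with the pre-1.0 `openai` SDK; the character-count cutoff is a crude stand-in for the real token limit:

```python
import openai  # assumes OPENAI_API_KEY is set

def summarize(text: str) -> str:
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"Summarize concisely:\n\n{text}"}],
    )
    return resp.choices[0].message.content

def recursive_summary(units: list[str], max_chars: int = 8000) -> str:
    summaries = [summarize(u) for u in units]  # one summary per unit (e.g., chapter)
    combined = "\n".join(summaries)
    if len(combined) <= max_chars:  # crude proxy for the OOTB token limit
        return summarize(combined)
    # too big: re-chunk the summaries and recurse
    batches = [combined[i:i + max_chars] for i in range(0, len(combined), max_chars)]
    return recursive_summary(batches, max_chars)
```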
we can use WebGPU on the M2 if the regular GPU path doesn't work
fine tune dolly v2 for $30 https://www.tiktok.com/@rajistics/video/7222430618347490602
few-shot or in-context learning is considered newer than fine-tuning (but is it more performant?)
https://www.tiktok.com/@rajistics/video/7226905183601708331
Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning
https://openreview.net/forum?id=rBCvMG-JsPd
but what about a model I can't fine-tune, like GPT-4? PEFT'd Vicuna vs ICL GPT-4? (ICL sketch below)
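For reference, ICL on a closed model just means the demonstrations go in the prompt instead of into the weights; a toy example with made-up demonstrations:

```python
import openai  # pre-1.0 SDK; assumes OPENAI_API_KEY is set

few_shot = [
    {"role": "user", "content": "Classify sentiment: 'I love this ladder.'"},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "Classify sentiment: 'This rung broke.'"},
    {"role": "assistant", "content": "negative"},
]
resp = openai.ChatCompletion.create(
    model="gpt-4",
    messages=few_shot + [{"role": "user", "content": "Classify sentiment: 'Solid build.'"}],
)
print(resp.choices[0].message.content)
```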
MPT-7B-StoryWriter-65k+ was literally made to write books ("ALiBi, MPT-7B-StoryWriter-65k+ can extrapolate even beyond 65k tokens, and we have demonstrated generations as long as 84k tokens on a single node of A100-80GB GPUs."); loading sketch below
that's 60,000+ words
that's 120+ pages
that's a book
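The loading pattern from the Hugging Face model card; bumping `max_seq_len` past 65k is exactly what ALiBi makes possible:

```python
import transformers

# MPT ships custom modeling code, hence trust_remote_code=True
config = transformers.AutoConfig.from_pretrained(
    "mosaicml/mpt-7b-storywriter", trust_remote_code=True
)
config.max_seq_len = 83968  # ALiBi lets us extrapolate beyond the 65k training length
model = transformers.AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-storywriter", config=config, trust_remote_code=True
)
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
```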
ChatGLM-6B
supported oobabooga models:
https://github.com/oobabooga/text-generation-webui/blob/main/models/config.yaml
or is it better to run https://github.com/openai/triton (the optimized kernel compiler used under MPT)?
https://github.com/cocktailpeanut/dalai
a little lighter weight than oobabooga (maybe?)
web gpu acceleration https://github.com/mlc-ai/web-llm
4.83 tokens/sec on my NVIDIA GeForce GTX 960 (CUDA compute capability 5.2, 4 GB dedicated VRAM)
TODO: cloud dev with LangChain; try Paperspace + an A100. Can I use IPUs? (Paperspace offers them)
M2 MacBook Air (16 GB, macOS 13.2) got 15 tokens/sec
peft library: https://pypi.org/project/peft/
can be done using the a100
related https://twitter.com/Sumanth_077/status/1625774615753629696
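A minimal LoRA sketch with `peft`; the Dolly v2 base model and the hyperparameters are placeholders, not tuned choices:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-3b")  # placeholder base
lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # tiny trainable fraction is why one A100 suffices
```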
More big context windows
100k context window Claude, and the rates LGTM: anthropic.com/product (call sketch below)
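A rough call sketch with the `anthropic` SDK's Messages API, whole document in one prompt; the model name and file path are placeholders:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
book_text = open("draft.txt").read()  # placeholder: fits within a 100k-token window
resp = client.messages.create(
    model="claude-2.1",  # pick the current 100k+ model per Anthropic's docs
    max_tokens=1024,
    messages=[{"role": "user", "content": f"Summarize this book:\n\n{book_text}"}],
)
print(resp.content[0].text)
```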
better than chain-of-thought prompting:
https://www.tiktok.com/t/ZTRKbBTwf/
new model claims to be 99% as good as GPT-3.5 with low memory requirements, fine-tunable and open (commercially...?) https://www.youtube.com/watch?v=3PVg86bnKDg
Falcon 40B is the new open winner and is commercially licensed.
Need to pin down exact perf vs GPT-3.5, plus the context and cost tradeoffs.
But v cool that it's open and I can fine-tune on Ladderly info for a fully closed Ladderly-Chat (React can easily support an OR operation on the model via env var, so we really don't need to pick A or B; we can also add StoryWriter or another large-context option; see the sketch below)
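The env-var OR operation, sketched in Python for clarity (same idea via `process.env` in React); `LADDERLY_CHAT_MODEL` and the registry entries are hypothetical:

```python
import os

MODEL_REGISTRY = {
    "falcon-40b": "tiiuae/falcon-40b",
    "gpt-3.5": "gpt-3.5-turbo",
    "storywriter": "mosaicml/mpt-7b-storywriter",  # the large-context option
}

def pick_model() -> str:
    # default to one option; no need to hard-pick A or B
    return MODEL_REGISTRY[os.environ.get("LADDERLY_CHAT_MODEL", "gpt-3.5")]
```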
Claude 2 and Bing are on the table. Can we access Bing OCR programmatically?
Legacy (non-LLM) OCR could be fine too
fine-tuning engine: https://github.com/scaleapi/llm-engine. Also, we want Llama 2 rn as the best open approach; GPT-4 is still better on the closed-source side, but I think we need to charge more for that
Llama 2 custom model fine-tuning (sketch below)
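Roughly the fine-tune call from the llm-engine README; the training-file path is a placeholder and the exact kwargs may have changed:

```python
from llmengine import FineTune  # pip install scale-llm-engine

response = FineTune.create(
    model="llama-2-7b",
    training_file="s3://my-bucket/path/to/training.csv",  # placeholder path
)
print(response.json())
```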
NVIDIA trains models; Apple silicon runs models
another awesome place to train models is: https://vast.ai/pricing
with a 4090, I could lease my GPU for $0.40 an hour atm (no idea about the rate of decay). this is not crazy money though: that works out to $200-300/month at full utilization, not counting the energy bill (arithmetic below)
wait, I guess vast pricing is per minute, not per hour, based on this blog https://vast.ai/article/running-the-70B-LLama2-GPTQ
I'm super confused... if that's the case it's an obvious buy though https://twitter.com/JohnVandivier/status/1693110093858931142
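The arithmetic behind both readings:

```python
rate = 0.40
print(rate * 24 * 30)       # 288.0 -> ~$290/month if $0.40 is per hour
print(rate * 60 * 24 * 30)  # 17280.0 -> ~$17k/month if per minute (the "obvious buy" case)
```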
did we mention LLaVA? (multimodal) https://stackshare.io/llava
IDEFICS is the largest open multimodal model; it understands images (loading sketch below)
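A hedged IDEFICS loading sketch following the transformers docs pattern; the image URL is a placeholder, and the 9B checkpoint still wants a big GPU:

```python
import torch
from transformers import AutoProcessor, IdeficsForVisionText2Text

checkpoint = "HuggingFaceM4/idefics-9b-instruct"
processor = AutoProcessor.from_pretrained(checkpoint)
model = IdeficsForVisionText2Text.from_pretrained(checkpoint, torch_dtype=torch.bfloat16)

# IDEFICS prompts interleave text and images (URLs or PIL images)
prompts = [[
    "User: What is in this image?",
    "https://example.com/ladder.jpg",  # placeholder image URL
    "<end_of_utterance>",
    "\nAssistant:",
]]
inputs = processor(prompts, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=50)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```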
During the hackathon I just picked a model without knowing much about the ideal model fit. Think harder and change it if needed