kfsone opened 11 months ago
Hey @kfsone thanks for opening this issue - I think we are in agreement that there needs to be a better "recommended local LLM setup" guide that has explicit models etc. We are planning to eventually ship a default "bundle" with MemGPT that includes a default model and a default inference engine, which should alleviate many of the problems you're mentioning with too many setup options.
IMO, the closest equivalent of this is (in bullet points here obviously, but agreed that this should go in the readthedocs or something):
Setting up the backend:
Screenshots for this are available here: https://memgpt.readthedocs.io/en/latest/lmstudio/. We mention on this readthedocs page that you should use a minimum of Q5 quantization (otherwise MemGPT will start to fail), but in my experience as long as you use Q6 or Q8 you can have long conversations with MemGPT no problem (lots of people on the discord are using similar models with success - I think most people are using webui and lm studio as backends).
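A quick sanity check before wiring MemGPT up is to hit LM Studio's local server directly. Rough sketch only - it assumes LM Studio's default port (1234) and its OpenAI-compatible routes, so adjust if you've changed anything:

```bash
# Assumes LM Studio's local inference server is running on its default port (1234).
# List whatever model the server currently has loaded:
curl -s http://localhost:1234/v1/models

# Fire a one-off chat completion to confirm the loaded model actually responds:
curl -s http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hi"}], "max_tokens": 32}'
```

If both of those answer, MemGPT should be able to talk to the same endpoint.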
Setting up MemGPT:
- pip install pymemgpt
- memgpt configure - choose lmstudio for the backend type, use the defaults for everything else
- memgpt run - create a new agent (the same sequence is sketched as a shell session below)

Following the above steps, you should get pretty reasonable performance. If you happen to be on Discord, send me a DM (you can find me on the memgpt discord server) and I'd be happy to hop on a call to go through the steps with you on your own machine (it shouldn't take longer than 10m or so). If you don't use discord we could also hop on a google meet / zoom, you can just send me an email or something.
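Here's that sketch - the configure prompts are interactive, so the choices are noted as comments rather than flags:

```bash
# Install MemGPT
pip install pymemgpt

# Interactive setup: when asked for the backend type, pick "lmstudio";
# accept the defaults for everything else.
memgpt configure

# Start chatting; choose to create a new agent when prompted.
memgpt run
```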
I'm not sure why you're having issues with Ollama - I just retested from a fresh install and it works fine both on my macbook and WSL on my w10 desktop, no proxy required.
> I think we are in agreement that there needs to be a better "recommended local LLM setup" guide that has explicit models etc.
I know in the past I've felt it was a huge contradiction to say {select option}, {select option}, {I MAKE DECISION FOR U}, {select option}, and I've had to find ways to gate that. I think having a touch-paper mechanism for first contact, with a "contrib" or "alternatives" folder of examples that ditch the restrictions, would be a great way to bring people into the experience.
I didn't intend to pose the examples as needing assistance: last week the approaches I was trying were foiled by text-generation-webui having dependency conflicts with autogen and then memgpt; my attempt a day later to build a docker image with ollama + litellm + autogen/memgpt/jupyter found autogen had updated their dependencies, but I had to silo memgpt to avoid another conflict, and once I fixed memgpt, autogen introduced a pip-level conflict.
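To be clear about what I mean by "silo": nothing cleverer than giving each tool its own environment so their pins never meet. An illustrative sketch, not a prescription:

```bash
# One virtual environment per tool so conflicting dependency pins never collide.
# Paths are just examples.
python -m venv ~/venvs/memgpt
~/venvs/memgpt/bin/pip install pymemgpt

python -m venv ~/venvs/autogen
~/venvs/autogen/bin/pip install pyautogen
```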
Totally normal for bleeding edge stuff, and this ticket isn't about "zomg, why are you off developing new features", the total opposite. I just want to make the argument for a little investment in a fully opinionated out-of-the-box ai or lm scenario, either as a docker compose stack or a single container as a means of increasing adoption while reducing adoption-support drag.
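To make that concrete, the sort of stack I mean is just ollama + a LiteLLM proxy + memgpt sharing a network. Sketched here as plain docker commands rather than a compose file, with image tags, ports, flags and model names that are examples from memory rather than a tested recipe:

```bash
# Sketch only: verify image tags, ports and flags against the current ollama/litellm docs.
docker network create memgpt-net

# 1) ollama as the inference engine
docker run -d --name ollama --network memgpt-net -p 11434:11434 ollama/ollama
docker exec ollama ollama pull dolphin2.2-mistral

# 2) LiteLLM as an OpenAI-compatible proxy in front of it
#    (or bake "pip install litellm" into your own image and run the same command)
docker run -d --name litellm --network memgpt-net -p 8000:8000 \
  ghcr.io/berriai/litellm:main-latest \
  --model ollama/dolphin2.2-mistral --api_base http://ollama:11434 --port 8000

# 3) memgpt pointed at the proxy (on the host here; a third container in the real stack)
pip install pymemgpt
memgpt configure   # point it at the proxy's OpenAI-compatible endpoint, http://localhost:8000
memgpt run
```

The point being that a newcomer runs two or three commands and lands in a working chat, instead of assembling the pieces themselves.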
To distill it down to a single example: the original red/green/orange boxes for local lm support were nice in theory, but they probably had the opposite effect to the one you wanted from community support. Just red (not working) and green (fully supported/working) is more likely to evoke a "glass is half full" reaction from potential contributors. I'll spare you my philosophizing about why :)
FWIW, I had gotten ollama + litellm working a while before posting this (earlier ollama didn't have all the chat api endpoints), with my own Docker image and memgpt talking to it and starting to answer questions, but the model I'd ended up using got really chatty and opinative (screw the dictionary, I'm bringing that word back) and rapidly exhausted its 4k token limit. That misconfig is on me and is just where I stopped.
[Aside, I'll report properly elsewhere, but I just spun up a clean install on my m2 mac mini: lm studio + conda + clean environment + pip install pymemgpt, and it appears transformers and torch are not in the current pip package's requirements; but then I also got the wrong dolphin, since the one in the lm studio manifest is dolphin llama, not mistral, doh]
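The obvious workaround on that machine is just installing the missing pair by hand into the same environment - a guess at what's missing from the pins, not a claim about the right fix:

```bash
# pymemgpt installed cleanly but these weren't pulled in on a fresh environment;
# installing them manually unblocks it. Versions left to pip to resolve.
pip install torch transformers
```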
Forgive the bluntness, but MemGPT is one of the hardest pieces of AI software to get to grips with at the moment because of an unfortunate conjunction of two things it tries to do well.
You've tried to provide guidance in the form of opinionated options in config, auto-saved settings, etc. But you've also tried to be totally unopinionated about models, formats, and so on.
Unfortunately, unless you're using MemGPT with OpenAI/GPT, that combination makes getting MemGPT to work as advertised pure torture.
1. Gate the incomplete backend supports behind an "--expert" or "--contributor" flag, so that non-programmers trying to assemble things don't tie themselves and yourselves (via issues) in knots.
2. Complete the opinionation: when you tested MemGPT with ollama/webui/kobold/etc, you probably used some internal knowledge to pick which models/configs to use, and then withheld that information.
3. Include some actual, concrete, baseline "this will get you going" examples rather than "Steps: EXACT A, EXACT B, insert whatever here, EXACT C", because those fail.
4. Stick pins in it: the best chance of getting community support for your project is to ensure people can see the thing work, and with how dynamic this field is right now that's going to be hard. E.g. in the last 5 days, text-generation-webui made a change so that enabling the api automatically forces the openai extension, breaking memgpt compatibility. Probably the best way to do this would be docker images.
Examples: I couldn't get the ollama backend to work at all, and I'm not sure how you did, because ollama doesn't actually implement most of the api calls you try to make. The only way I found to get it working is to put the LiteLLM proxy in front of it, which is fine and can be done within a single Docker container, but you don't mention that.
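To illustrate what I mean (default ports assumed, litellm flags from memory - check its docs before copying):

```bash
# Ollama's own API answers fine on its default port:
curl -s http://localhost:11434/api/tags

# ...but the OpenAI-style route MemGPT's client wants isn't there
# (at the time of writing this came back as a 404):
curl -si http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "dolphin2.2-mistral", "messages": [{"role": "user", "content": "hi"}]}' | head -1

# Putting LiteLLM in front fills the gap:
pip install litellm
litellm --model ollama/dolphin2.2-mistral --api_base http://localhost:11434 --port 8000
# ...and MemGPT then talks to http://localhost:8000 instead of to ollama directly.
```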
Once you have LiteLLM and Ollama, it's still hard to get MemGPT working with them, for multiple reasons. First, configure suggests a very narrow set of wrappers with no obvious way for a rookie to tell how to choose. The most logical course would seem to be to choose, say, dolphin-2.1 and use the same model. However, I wrapped this with automation and tried every single TheBloke dolphin-2.1 model file, and none of them actually works consistently. You can get a single response back from a couple of them, but after that you get nonsense. The actual models you need for this wrapper are specific dolphin-2.2.1 models (e.g. TheBloke\dolphin-2.2.1-mistral-7B-GGUF\dolphin-2.2.1-mistral-7b.Q5_K_M.gguf).
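If it saves anyone the trial and error, fetching that exact file looks something like this (assumes a recent huggingface_hub; pick whatever quant level your hardware can carry, Q5_K_M or better per the docs):

```bash
pip install -U "huggingface_hub[cli]"

# The specific model file that behaved with the dolphin wrapper for me:
huggingface-cli download TheBloke/dolphin-2.2.1-mistral-7B-GGUF \
  dolphin-2.2.1-mistral-7b.Q5_K_M.gguf \
  --local-dir ./models
```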
Buuuut ... not so fast. Firstly, MemGPT is convinced this model has an 8k window. It doesn't; it has a 4k window. Set that, and you can start talking to MemGPT. But it's sort of ugly, insisting on 512-token responses every time, and it doesn't truncate or wrap at the 4k window, so once you send the 4097th token your session is broken (4297 > 4096, it tells you).
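Setting it amounts to telling MemGPT the model's real window; a hedged sketch of doing that by editing the saved config, assuming the settings live in ~/.memgpt/config and a key along the lines of context_window - both of which you should verify against your own install:

```bash
# Assumption: MemGPT keeps its settings in an INI-style file at ~/.memgpt/config
# and records the window under a key like context_window. Inspect first:
grep -ni "context" ~/.memgpt/config

# Then drop the reported 8k down to the model's real 4k, e.g.:
#   context_window = 4096
```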
Notes: In this first non-GPT session I was able to get going, I suggested to it that, in addition to answering my questions etc, it observe my language use in English & German to detect patterns that might explain how the unusual way in which I learned influenced the mistakes I make and the difficulties I have. But this seemed to fail to trigger whatever mechanism should actually generate fact saves; instead the LM just generated text that claimed to be saving information for later.
I tried using koboldcpp but gave up after trying 6 different models, each of which I was able to communicate with using autogen or curl; memgpt, though, got caught out by recent 3rd-party module changes with every one of them. "Key Error: 'mistral'" is where I gave up. This all boils down to my not having had an initial successful frame of reference to work out from with the local lm support.
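For completeness, "able to communicate with using curl" means the models were answering on koboldcpp's own API just fine, something along these lines (default port assumed); it was only the memgpt side of the conversation that fell over:

```bash
# koboldcpp's native generate endpoint on its default port (5001); the models
# replied here even in the combinations that memgpt then choked on.
curl -s http://localhost:5001/api/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, are you there?", "max_length": 64}'
```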