Mobile-Artificial-Intelligence / maid

Maid is a cross-platform Flutter app for interfacing with GGUF / llama.cpp models locally, and with Ollama and OpenAI models remotely.
MIT License

no models return any output? #442

Closed: krecicki closed this issue 7 months ago

krecicki commented 7 months ago

cool app!

I have it running on my Galaxy Note 10 with 8 GB of RAM.

I loaded a Llama 7B .gguf; I also tried calypso-alpha-v2.gguf and a TinyLlama LoRA I made.

It never responds. The dots just keep going.

I've checked logcat. No errors. No bugs. I'm using the latest repo.

If I go back to the page where I loaded the .gguf model, it shows a spinning loading wheel and I have to restart the app.

danemadsen commented 7 months ago

I just checked, and it was a llama.cpp change that broke maid_llm. I rolled llama.cpp back and it's working now.

Republishing 1.2.3 in just a few minutes so you can give it a try.

krecicki commented 7 months ago

@danemadsen Hey Dane, I went through the whole setup again and got the app running: no response, no errors. I'm using a TinyLlama model that works in Termux. What was the last working version?

danemadsen commented 7 months ago

> @danemadsen Hey Dane, I went through the whole setup again and got the app running: no response, no errors. I'm using a TinyLlama model that works in Termux. What was the last working version?

OK, that's strange. Maybe try going into app settings and clearing the cache.

krecicki commented 7 months ago

@danemadsen I tried a few different releases, just downloading the APK and skipping building it myself. None of them give any response. I even tried a model you'd shown in the screenshots below.

I am using a Galaxy Note 10 on Android 13 with 8 GB of RAM.

danemadsen commented 7 months ago

> @danemadsen I tried a few different releases, just downloading the APK and skipping building it myself. None of them give any response. I even tried a model you'd shown in the screenshots below.
>
> I am using a Galaxy Note 10 on Android 13 with 8 GB of RAM.

If you want to try one that I know should definitely work, try 1.1.8.

It's been cited as the preferred working version by a few people who have reached out to me.

krecicki commented 7 months ago

@danemadsen You nailed it: 1.1.8 works. Good job on that.

However, it runs on and on and doesn't stop at tags.

[Screenshot 2024-03-23 at 4:43:46 PM]

I am coming from llama_cpp_python.

A chat format called OpenBuddy worked well over there:

```python
@register_chat_format("openbuddy")
def format_openbuddy(
    messages: List[llama_types.ChatCompletionRequestMessage],
    **kwargs: Any,
) -> ChatFormatterResponse:
    _system_message = """You are a helpful, respectful and honest INTP-T AI Assistant named Buddy. You are talking to a human User.
Always answer as helpfully and logically as possible, while being safe. Your answers should not include any harmful, political, religious, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
You can speak fluently in many languages, for example: English, Chinese.
You cannot access the internet, but you have vast knowledge, cutoff: 2021-09.
You are trained by OpenBuddy team, (https://openbuddy.ai, https://github.com/OpenBuddy/OpenBuddy), you are based on LLaMA and Falcon transformers model, not related to GPT or OpenAI.

"""
    _roles = dict(user="User", assistant="Assistant")
    _sep = "\n"
    system_message = _system_message
    _messages = _map_roles(messages, _roles)
    _messages.append((_roles["assistant"], None))
    _prompt = _format_add_colon_single(system_message, _messages, _sep)
    return ChatFormatterResponse(prompt=_prompt)
```

The ./server output from the llama.cpp project also worked, though it still kept adding </SYS at the ends of the outputs. Which variable can I strip or replace characters from, and in which file?

I'd also like to hardcode the model path and add the .gguf to an assets folder. Is this possible?

I'm not uploading it to the Play Store, so the APK size isn't a big deal.

danemadsen commented 7 months ago

> @danemadsen You nailed it: 1.1.8 works. However, it runs on and on and doesn't stop at tags. [...] I'd also like to hardcode the model path and add the .gguf to an assets folder. Is this possible?

Yeah, 1.1.8 is an old version now. I'm doing a lot of stuff differently that allows me to stop generation correctly.

I've bumped the Android version for you, so give an Actions build a try.

danemadsen commented 7 months ago

Give this run a go:

https://github.com/Mobile-Artificial-Intelligence/maid/actions/runs/8405308893

krecicki commented 7 months ago

> Give this run a go:
>
> https://github.com/Mobile-Artificial-Intelligence/maid/actions/runs/8405308893

@danemadsen The one you just posted doesn't respond. The damndest thing is that the .apk from 1.1.8 does respond.

If it helps any, with the APK you just posted I cleared the session because it wasn't responding, went to the llama.cpp parameters page, and it just hangs on loading; I have to close the app.

I am going to try to build my own version of 1.1.8 because that APK responds. It just needs a different chat format, like the one used in the llama.cpp server or the OpenBuddy one I posted.

danemadsen commented 7 months ago

> @danemadsen The one you just posted doesn't respond. The damndest thing is that the .apk from 1.1.8 does respond. [...] I am going to try to build my own version of 1.1.8 because that APK responds.

Yeah, go crazy. The actual llama.cpp code isn't that different from 1.1.8, so it shouldn't crash, but for whatever reason it is for you. It is working on my devices; it's just very slow.

krecicki commented 7 months ago

@danemadsen Yeah, I just built 1.1.8 from source and it is responding, just super slow, lol. Like super slow.

What file is the chat template in? How is it using a chat template right now in 1.1.8? Can you explain before I go digging?

[Screenshot 2024-03-23 at 5:55:44 PM]
danemadsen commented 7 months ago

> @danemadsen Yeah, I just built 1.1.8 from source and it is responding, just super slow, lol. Like super slow.
>
> What file is the chat template in? How is it using a chat template right now in 1.1.8? Can you explain before I go digging?

The chat template should be hardcoded in core.cpp, from memory.

krecicki commented 7 months ago

@danemadsen Cool. Do you have any tips on where to add a .replace()-type function on the chat output before it is seen? A .replace(/</, '') kind of thing?

danemadsen commented 7 months ago

> @danemadsen Cool. Do you have any tips on where to add a .replace()-type function on the chat output before it is seen? A .replace(/</, '') kind of thing?

Yeah, it's implemented here in the current version:

https://github.com/Mobile-Artificial-Intelligence/maid_llm/blob/4b4a34b88bb6de3f6ee62f7ebbfcba18fb6841f5/src/maid_llm.cpp#L558-L576

and in 1.1.8 it's meant to be implemented here, but it doesn't work well:

https://github.com/Mobile-Artificial-Intelligence/maid/blob/12b504e3bee343c9b9c30b2c9dbeb4b8b0dc8966/src/core.cpp#L232-L298
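
If you just want to scrub the output on the Dart side before it is displayed, rather than in the C++ linked above, a minimal sketch could look like the following. The stripAntiprompts name and the list of stop strings are illustrative, not taken from maid's code:

```dart
/// Removes anti-prompt / template residue such as "</s>" or "</SYS" from a
/// chunk of generated text before it is shown in the chat view.
/// The default stop strings below are examples only.
String stripAntiprompts(String text,
    {List<String> stops = const ['</s>', '[/INST]', '</SYS']}) {
  var cleaned = text;
  for (final stop in stops) {
    final index = cleaned.indexOf(stop);
    if (index != -1) {
      // Truncate at the first stop string so nothing after it leaks through.
      cleaned = cleaned.substring(0, index);
    }
  }
  return cleaned.trimRight();
}

void main() {
  // Example: the model kept appending "</SYS" to its replies.
  print(stripAntiprompts('Hello! How can I help you today?</SYS'));
  // -> Hello! How can I help you today?
}
```

Calling something like this on each decoded chunk before appending it to the chat view would hide residue such as the stray </SYS mentioned above.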

krecicki commented 7 months ago

@danemadsen What part isn't working well that I can try to improve on? Anything specific?

danemadsen commented 7 months ago

> @danemadsen What part isn't working well that I can try to improve on? Anything specific?

Specifically, the way it attempts to find and remove the antiprompt. The preprompt didn't work in that version either, which is likely why it's so fast; in the newer versions the preprompt is created and processed, which is the proper way of doing things.

I would advise against continuing work from 1.1.8. I know it may be working for you, but a lot of work has been done in more recent versions to get certain things working.

Recent versions of the app do work; people other than me have tested them. It's just that phones are very underpowered, so it's slow, and it's probably always going to be slow on Android for certain models.

krecicki commented 7 months ago

I wonder why 1.1.8 is working for me and the others are not. It's a good phone, the Galaxy Note 10 with 8 GB; like I said, it runs the llama.cpp ./server in Termux super fast.

@danemadsen And one last question: I want to put my .gguf in the .apk when it is built and have the path hardcoded. Any direction on this? Sorry for all the questions; you keep answering, lol, and I can't step away from this.

danemadsen commented 7 months ago

> @danemadsen And one last question: I want to put my .gguf in the .apk when it is built and have the path hardcoded. Any direction on this?

Typically you wouldn't store a GGUF inside the app, because it will make the install size huge and the build time long, but if you wanted to, you would put it in the assets folder and link it in the pubspec.yaml.

krecicki commented 7 months ago

@danemadsen I feel like I'm begging for pennies, but any clearer direction on the linking part? I am going to donate to your sponsor thing here soon. You're great, man.

danemadsen commented 7 months ago

> @danemadsen I feel like I'm begging for pennies, but any clearer direction on the linking part? I am going to donate to your sponsor thing here soon. You're great, man.

It's all good.

In the pubspec file, see here:

https://github.com/Mobile-Artificial-Intelligence/maid/blob/main/pubspec.yaml#L87-L88

This tells Dart/Flutter to include every file you place in the assets folder in the build.

Then, to use those files in Flutter, you do it the way it's done here:

https://github.com/Mobile-Artificial-Intelligence/maid/blob/main/lib/providers/character.dart#L15

If you wanted to pass that file to llama.cpp, you would then have to use file.absolute to get the correct path.
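
Putting those pieces together, a rough sketch of bundling a model and producing a path llama.cpp can open might look like this. It assumes the path_provider package; the asset name and the materializeBundledModel helper are placeholders, and on Android an asset lives inside the APK rather than on disk, so it has to be copied out first:

```dart
import 'dart:io';

import 'package:flutter/services.dart' show rootBundle;
import 'package:path_provider/path_provider.dart';

// pubspec.yaml entry (placeholder model name):
//   flutter:
//     assets:
//       - assets/mymodel.gguf

/// Copies the bundled model out of the app bundle and returns an absolute
/// path that can be handed to llama.cpp.
Future<String> materializeBundledModel() async {
  final docs = await getApplicationDocumentsDirectory();
  final file = File('${docs.path}/mymodel.gguf');

  if (!await file.exists()) {
    // Assets are not regular files on Android, so read the bytes out of the
    // bundle and write them to a real file once.
    final data = await rootBundle.load('assets/mymodel.gguf');
    await file.writeAsBytes(
      data.buffer.asUint8List(data.offsetInBytes, data.lengthInBytes),
      flush: true,
    );
  }
  return file.absolute.path;
}
```

The returned path could then be fed to wherever core_init is called in the local generation code, in place of a user-selected file.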

krecicki commented 7 months ago

@danemadsen File("assets/defaultCharacter.png"); and that's it? Where in your code should I place File("assets/mymodel.gguf");? In what file do you manage the upload path?

On a side note, I have Termux loading on boot with termux-boot and then running the llama.cpp ./server. So you can restart the phone or turn it off, and you can just open a webview app I'm making that loads 127.0.0.1:8080.

It's not really a repo; it's more of a process to set it up. Let me know if you're curious.
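
For what it's worth, the webview side of that setup can be tiny. A sketch assuming the webview_flutter package, with the port matching whatever ./server was started on:

```dart
import 'package:flutter/material.dart';
import 'package:webview_flutter/webview_flutter.dart';

void main() => runApp(const MaterialApp(home: LocalServerView()));

/// Thin wrapper around the llama.cpp ./server web UI running in Termux.
class LocalServerView extends StatefulWidget {
  const LocalServerView({super.key});

  @override
  State<LocalServerView> createState() => _LocalServerViewState();
}

class _LocalServerViewState extends State<LocalServerView> {
  late final WebViewController _controller;

  @override
  void initState() {
    super.initState();
    _controller = WebViewController()
      ..setJavaScriptMode(JavaScriptMode.unrestricted)
      // Port must match the one ./server was started with in Termux.
      ..loadRequest(Uri.parse('http://127.0.0.1:8080'));
  }

  @override
  Widget build(BuildContext context) =>
      Scaffold(body: SafeArea(child: WebViewWidget(controller: _controller)));
}
```

Loading plain http from a WebView may also require allowing cleartext traffic in the Android manifest.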

danemadsen commented 7 months ago

> @danemadsen File("assets/defaultCharacter.png"); and that's it? Where in your code should I place File("assets/mymodel.gguf");? In what file do you manage the upload path?

The model path is passed into llama.cpp in the local generation Dart file, where core_init is called.

danemadsen commented 7 months ago

OK, damn, you were right, there is an issue. It's not related to the C++, though; it's something else. Working on it now.

danemadsen commented 7 months ago

@krecicki OK, redownload 1.2.3 and give it a try; it should be fixed now.

BurrHM commented 7 months ago

Desktop release works but Android version still doesn't work.

RookieIndieDev commented 7 months ago

Will try on Windows; Android still doesn't seem to be working.

danemadsen commented 7 months ago

> Will try on Windows; Android still doesn't seem to be working.

It's working on my phone now; it's just super slow. Maybe try uninstalling the old version first, if you haven't tried that already.

RookieIndieDev commented 7 months ago

Tried that, cleared the storage and cache; it still doesn't work. What is making it slow? When running the same model with llama.cpp in Termux, it seems to generate outputs reasonably fast.

danemadsen commented 7 months ago

@RookieIndieDev Processing the preprompt (all the info on the character page) is what slows it down. Up to 1.1.8 there was barely any processing that needed to be done before inference; that's why 1.1.8 was so fast compared to now.

It takes about 2 minutes to get an output on my phone. I've also tested it on my tablet and it's about the same. Not sure; I know lingering data can break things from time to time, though.

danemadsen commented 7 months ago

@RookieIndieDev OK, try https://github.com/Mobile-Artificial-Intelligence/maid/actions/runs/8450330304. I added a whole bunch of log calls to time everything so you can see what's taking so long.

RookieIndieDev commented 7 months ago

Windows immediately crashes when clicking the arrow button. Where do I find the Windows logs? Android logs should be shown in app settings?

RookieIndieDev commented 7 months ago

It's been going for 30 minutes at least; still no output on Android.

danemadsen commented 7 months ago

@RookieIndieDev OK, this is what the logs should look like on Android: [Screenshot_2024-03-28-08-51-02-57_f491d42f48f7e59ed03e7bce3f3fe7a5.jpg]

danemadsen commented 7 months ago

@RookieIndieDev And like this on Windows: [Screenshot 2024-03-28 085317]. Windows logs should be generated in the same directory as the exe, in llama.log.

RookieIndieDev commented 7 months ago

Nope, nothing like that was generated.

danemadsen commented 7 months ago

> Nope, nothing like that was generated.

It has to be some kind of bug caused by lingering files from older versions or something, then. What's the last version that works for you?

It definitely should be working, though, because I've tested it on 3 different Android devices, 2 different Windows computers, and 1 Linux computer.

What are the details of your systems? Windows 10, I thought you said, and what version of Android?

All I can keep recommending is to clear the cache in app settings and make sure older versions are uninstalled.

RookieIndieDev commented 7 months ago

On Windows 10, yes. On Windows, I deleted the old files that were downloaded and extracted the new archive. I am on Android 11. I will try uninstalling and reinstalling and see if that works. Does Windows require other steps to clean up the older versions?

danemadsen commented 7 months ago

For Windows it should be as simple as deleting the old version and launching the new one. You may have to clear the cache in settings too.

RookieIndieDev commented 7 months ago

Yeah, I did that. It crashes immediately after pressing the arrow; I posted the error code from the Event Viewer on the issue that I made previously.

RookieIndieDev commented 7 months ago

[Screenshot_20240403-111920-352.png]

I grabbed the APK from the latest Actions build. Logging seems to be working, but no output so far, and it must have been 10 minutes at least.

danemadsen commented 7 months ago

> I grabbed the APK from the latest Actions build. Logging seems to be working, but no output so far, and it must have been 10 minutes at least.

Your phone only has 3 GB of RAM? I'm not surprised it doesn't work. The minimum for running models is about 6 GB. You're gonna have to stick to the terminal method if you want to run LLMs on a phone with that little RAM.

RookieIndieDev commented 7 months ago

The same model worked with Maid as of 1.1.8, though? I did pick a quantized version that works within 3 GB of RAM.

danemadsen commented 7 months ago

Oh yeah, and you're not getting a Model Init timing, so it's likely something's failing there.

For reference, this is what my timings look like on Windows:

[Screenshot 2024-04-03 161627]

RookieIndieDev commented 7 months ago

Is something wrong with this specific model?

RookieIndieDev commented 6 months ago

[Screenshot_20240406-093425-111.png]

I'm getting this error. I tried the latest build with two different models and got the error both times.