krecicki closed this issue 7 months ago
I just checked, and it was something to do with llama.cpp that broke maid_llm. I simply rolled back llama.cpp and it's working now.
Republishing 1.2.3 in just a few minutes so you can give it a try.
@danemadsen Hey Dane, I went through the whole setup again and got the app running: no response, no errors. I'm using a TinyLlama model that works with Termux. What was the last working version?
OK, that's strange. Maybe try going into app settings and clearing the cache.
@danemadsen I tried a few different releases, and also just downloaded the APK and skipped building it myself. None of them give any response. I even tried a model you'd shown in the screenshots below.
I am using a Galaxy Note 10 on Android 13 with 8 GB of RAM.
If you want to try one that I know should definitely work, try 1.1.8.
It's been cited as the preferred working version by a few people who have reached out to me.
@danemadsen You nailed it, 1.1.8 works. Good job on that.
However, generation runs on and on and doesn't stop at the stop tags.
I am coming from llama_cpp_python.
A chat format called OpenBuddy worked well over there:
@register_chat_format("openbuddy")
def format_openbuddy(
    messages: List[llama_types.ChatCompletionRequestMessage],
    **kwargs: Any,
) -> ChatFormatterResponse:
    _system_message = """You are a helpful, respectful and honest INTP-T AI Assistant named Buddy. You are talking to a human User.
Always answer as helpfully and logically as possible, while being safe. Your answers should not include any harmful, political, religious, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
You can speak fluently in many languages, for example: English, Chinese.
You cannot access the internet, but you have vast knowledge, cutoff: 2021-09.
You are trained by OpenBuddy team, (https://openbuddy.ai, https://github.com/OpenBuddy/OpenBuddy), you are based on LLaMA and Falcon transformers model, not related to GPT or OpenAI.
"""
    _roles = dict(user="User", assistant="Assistant")
    _sep = "\n"
    system_message = _system_message
    _messages = _map_roles(messages, _roles)
    _messages.append((_roles["assistant"], None))
    _prompt = _format_add_colon_single(system_message, _messages, _sep)
    return ChatFormatterResponse(prompt=_prompt)
Also, the ./server output from the llama.cpp project still kept adding </SYS at the ends of the outputs. Which variable can I strip or replace characters from, and in which file?
I'd also like to hardcode the model path in the file and add the .gguf to an assets folder. Is this possible?
I'm not uploading it to the Play Store, so the APK size isn't a big deal.
Yeah, 1.1.8 is an old version now. I'm doing a lot of stuff differently that allows me to stop generation correctly.
I've bumped the Android version for you, so give an Actions build a try.
Give this run a go:
https://github.com/Mobile-Artificial-Intelligence/maid/actions/runs/8405308893
@danemadsen The one you just posted doesn't respond. The damnedest thing is that the .apk from 1.1.8 does respond.
If it helps any, with this APK you just posted: I cleared the session because it wasn't responding, went to the llama.cpp parameters page, and it just hangs on loading, so I have to close the app.
I am going to try to build my own version of 1.1.8 because that APK responds. It just needs a different chat format, like the one used in the llama.cpp server or the OpenBuddy one I posted.
Yeah, go crazy. The actual llama.cpp code isn't that different from 1.1.8, so it shouldn't crash, but for whatever reason it is for you. It is working on my devices, it's just very slow.
@danemadsen Yeah, I just built 1.1.8 from source and it is responding, just super slow lol. Like super slow.
What file is the chat template in? How is it using a chat template right now in 1.1.8? Can you explain before I go digging?
The chat template should be hardcoded in core.cpp, from memory.
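For illustration, the openbuddy format above boils down to a prompt of the form: system message, then alternating "User:" / "Assistant:" turns separated by newlines, ending with a bare "Assistant:". Below is a rough Dart transliteration, purely as a sketch of the prompt shape if you wanted to port that format; the names (ChatMessage, buildOpenBuddyPrompt) are made up for the example, and this is not maid's actual template, which as noted above is the hardcoded one in core.cpp.

// Illustrative only: mirrors the llama_cpp_python "openbuddy" formatter above.
class ChatMessage {
  final String role; // 'user' or 'assistant'
  final String content;
  const ChatMessage(this.role, this.content);
}

String buildOpenBuddyPrompt(String systemMessage, List<ChatMessage> messages) {
  final buffer = StringBuffer(systemMessage)..write('\n');
  for (final m in messages) {
    final role = m.role == 'user' ? 'User' : 'Assistant';
    buffer.write('$role: ${m.content}\n');
  }
  // A trailing "Assistant:" cues the model to generate the next reply.
  buffer.write('Assistant:');
  return buffer.toString();
}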
@danemadsen Cool, and do you have any tips on where to add a .replace()-type function on the chat output before it is seen? A .replace(/</, '') type of thing?
Yeah, it's implemented here in the current version:
and in 1.1.8 it's meant to be implemented here, but it doesn't work well:
@danemadsen What part isn't working well that I can try to improve on? Anything specific?
Specifically, the way it attempts to find and remove the antiprompt. The preprompt didn't work in that version either, which is likely why it's so fast; in the newer versions the preprompt is created and processed, which is the proper way of doing things.
I would advise against continuing work from 1.1.8. I know it may be working for you, but a lot of work has been done in more recent versions to get certain things working.
Recent versions of the app do work, and people other than me have tested them. It's just that phones are very underpowered, so it's slow, and it's probably always going to be slow on Android for certain models.
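As a rough illustration of the antiprompt handling being described, here is a minimal Dart sketch that truncates the generated text at the first stop marker before it is displayed. The function name and marker list are hypothetical, not taken from the maid source, and in current versions this removal happens in the native core rather than on the Dart side.

// Hypothetical post-processing: cut the output off at the first stop marker
// (antiprompt) so fragments like "</SYS" or a stray "User:" never reach the UI.
String stripAntiprompt(
  String raw, {
  List<String> stopMarkers = const ['</SYS', 'User:'],
}) {
  var text = raw;
  for (final marker in stopMarkers) {
    final idx = text.indexOf(marker);
    if (idx != -1) {
      text = text.substring(0, idx);
    }
  }
  return text.trimRight();
}

With streamed output the same idea gets harder, because a marker can arrive split across chunks, which is part of why reliably finding and removing the antiprompt takes more than a single .replace().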
I wonder why 1.1.8 is working and the others are not for me. It's a good phone, the Galaxy Note 10 with 8 GB; like I said, it runs the llama.cpp ./server under Termux super fast.
@danemadsen And one last question: I want to put my .gguf in the .apk when it is built and have the path hardcoded. Any direction on this? Sorry for all the questions; you keep answering lol and I can't step away from this.
Typically you wouldn't store a .gguf internally within the app, because it will make the install size huge and the build time long, but if you wanted to, you would put it in the assets folder and link it in the pubspec.yaml.
@danemadsen I feel like I'm begging for pennies, but any clearer direction on the linking part? I am going to donate to your sponsor thing here soon. You're great, man.
It's all good.
In the pubspec file, see here:
https://github.com/Mobile-Artificial-Intelligence/maid/blob/main/pubspec.yaml#L87-L88
This tells Dart / Flutter to include every file you place in the assets folder in the build.
Then, to use those files in Flutter, you do it the way it's done here:
https://github.com/Mobile-Artificial-Intelligence/maid/blob/main/lib/providers/character.dart#L15
If you wanted to pass that file to llama.cpp, you would then have to use file.absolute to get the correct path.
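A minimal sketch of that flow, assuming the model has been added as assets/mymodel.gguf and listed under the assets: section of pubspec.yaml, and that path_provider is available as a dependency; the function name and paths are illustrative, not from the maid source. Because a bundled asset isn't guaranteed to exist as a plain file on disk on Android, one common approach is to copy it out with rootBundle first and then hand the resulting absolute path to llama.cpp:

import 'dart:io';

import 'package:flutter/services.dart' show rootBundle;
import 'package:path_provider/path_provider.dart';

// Copies the bundled model out of the asset bundle on first run and returns
// an absolute path that can be passed to llama.cpp. Note that rootBundle.load
// reads the whole file into memory, which is fine for a sketch but heavy for
// multi-gigabyte models.
Future<String> extractBundledModel() async {
  final dir = await getApplicationDocumentsDirectory();
  final file = File('${dir.path}/mymodel.gguf');
  if (!await file.exists()) {
    final data = await rootBundle.load('assets/mymodel.gguf');
    await file.writeAsBytes(
      data.buffer.asUint8List(data.offsetInBytes, data.lengthInBytes),
      flush: true,
    );
  }
  return file.absolute.path;
}

The returned path could then be substituted wherever the app currently reads the user-selected model path before calling into llama.cpp.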
@danemadsen File("assets/defaultCharacter.png"); and that's it? Where in your code should I place File("assets/mymodel.gguf");? In which file do you manage the upload path?
On a side note, I have Termux loading on boot with termux-boot and then running the llama.cpp ./server, so you can restart the phone or turn it off, and then just open a WebView app I'm making that loads 127.0.0.1:8080.
It's not really a repo, it's more of a process to set it up. Let me know if you're curious.
The model path is passed into llama.cpp in the local generation.dart file, where core_init is called.
OK damn, you were right, there is an issue. It's not related to the C++ though, it's something else; working on it now.
@krecicki OK, redownload 1.2.3 and give it a try; it should be fixed now.
Desktop release works but Android version still doesn't work.
Will try on Windows; Android still doesn't seem to be working.
It's working on my phone now, it's just super slow. Maybe try uninstalling the old version first, if you haven't tried that already.
Tried that, cleared the storage and cache, still doesn't work. What is making it slow? When running the same model with llama.cpp under Termux, it generates outputs reasonably fast by comparison.
@RookieIndieDev Processing the preprompt (all the info on the character page) is what slows it down. Prior to 1.1.8 there was barely any processing that needed to be done before inference; that's why 1.1.8 was so fast compared to now.
It takes about 2 minutes to get an output on my phone. I've also tested it on my tablet and it's about the same. Not sure; I know lingering data can break things from time to time, though.
@RookieIndieDev OK, try https://github.com/Mobile-Artificial-Intelligence/maid/actions/runs/8450330304 (I added a whole bunch of log calls to time everything so you can see what's taking so long).
Windows immediately crashes when clicking the arrow button; where do I find the Windows logs? Android logs should be shown in app settings?
It's been going for at least 30 minutes now and still no output on Android.
@RookieIndieDev OK, this is what the logs should look like on Android:
@RookieIndieDev And like this on Windows:
Windows logs should be generated in the same directory as the exe, in llama.log.
Nope, nothing like that was generated.
It has to be some kind of bug caused by lingering files from older versions or something, then. What's the last version that works for you?
It definitely should be working, though, because I've tested it on 3 different Android devices, 2 different Windows computers, and 1 Linux computer.
What are the details of your systems? Windows 10, I thought you said, and what version of Android?
All I can keep recommending is to clear the cache in app settings and make sure older versions are uninstalled.
On Windows 10, yes. On Windows, I deleted the old files that were downloaded and extracted the new archive. I am on Android 11. I will try uninstalling and reinstalling and see if that works. Does Windows need any other steps to clean up the older versions?
For Windows it should be as simple as deleting the old version and launching the new one. You may have to clear the cache in settings too.
Yeah, I did that. It crashes immediately after pressing the arrow; I posted the error code from Event Viewer on the issue that I made previously.
Grabbed the APK from the latest Actions build. Logging seems to be working, but no output so far; it must have been 10 minutes at least.
Your phone only has 3 GB of RAM? I'm not surprised it doesn't work. The minimum for running models is about 6 GB. You're going to have to stick to the terminal method if you want to run LLMs on a phone with that little RAM.
The same model worked with Maid as of 1.1.8, though? I did pick a quantized version that works within 3 GB of RAM.
Oh yeah, and you're not getting a Model Init timing, so it's likely something's failing there:
For reference, this is what my timings look like on Windows:
Something wrong with this specific model?
Getting this error. Tried the latest build with two different models, getting the error both times.
Cool app!
I have it running on my Galaxy Note 10 with 8 GB of RAM.
I uploaded a .gguf of a Llama 7B; I tried calypso-alpha-v2.gguf and a TinyLlama LoRA I made.
It never responds. The dots just keep going.
I've checked logcat. No errors. No bugs. I'm using the latest repo.
If I go back to the page where I loaded the .gguf model, it shows a spinning loading wheel and I have to reset the app.