Vali-98 / ChatterUI

Simple frontend for LLMs built in react-native.
GNU Affero General Public License v3.0

Road to 0.8.0 #89

Open Vali-98 opened 1 week ago

Vali-98 commented 1 week ago

NOTE: THIS IS NOT A FEATURE REQUEST POST, THERE'S MORE THAN ENOUGH WORK HERE TO BE DONE

Hey all!

I'm planning a big 0.8.0 update for the app to commemorate a year of development. The goal is to start yanking out a lot of temporary, undercooked, or poorly maintained features (just look at startupApp, yikes) and replacing them with newer solutions, as I've learned and evolved as a JavaScript/React Native developer over the last year. Here is the gist of the update:

Below is a rough idea of planned changes and additions for 0.8.0.

Feel free to reply to ask questions or raise concerns!

Road to 0.8

• CHAT
• CHARACTERS
• USER
• LOCAL MODE
• REMOTE MODE
• APP MODES
• SAMPLERS
• MINOR UI TWEAKS
• MISC
• STYLE UNIFICATION

REJECTED PILE

Here are changes which were tested but were either too difficult or non-viable.

Chat

Local Mode:

In-app downloading is removed due to download handling being too complex. Could be revisited in the future.

inspir3dArt commented 1 week ago

Hi,

I highly appreciate your hard work on this app, it's a really promising project.

From a user point of view there are three things that prevent me currently from using it frequently.

  1. The character card limitation. Most character cards use multiple fields like character, scenario, system prompt, jailbreak, example dialogues, and first message. I made modified versions of some of my favorites that put everything into the character and first message fields, but most cards you download don't work correctly out of the box without modifying them. KoboldCpp implements this quite simply: it just puts the text from all the fields, except the first message, together in a certain order and keeps it in context, which works quite well.

  2. The rerolls of the last message don't work. It's unfortunately a real deal breaker for roleplay; especially in the first replies, I usually have to reroll a few times to get the roleplay going in the direction I want.

  3. It would be nice if it were possible to link to a model you have on your device's storage, instead of copying it into the internal app storage. I have quite a few models on my device and storage is limited. I know it wouldn't be allowed in a version released on the Play Store, but otherwise it's technically possible.

Please don't see these as some sort of demands; I know it is a hobby project and I highly appreciate that you share your work. I'm quite impressed by how it performs, it's faster than running KoboldCpp locally in Termux, and it will be much more convenient to use. And I fully understand that it can be more interesting to rewrite and improve the more interesting parts with what you are learning along the way, and I hope you have a lot of fun doing it.

I guess from a user's point of view it would be great to have these things fixed before waiting for a big reinvention of the wheel, especially since it seems so close to being a really useful app. But I understand it's your project, and I hope you're enjoying your journey of learning by doing, without letting people's opinions or requests get under your skin or in your way. I know it can sometimes feel like getting pressured for sharing your hard work for free. I hope you manage to maintain your passion and motivation, and I look forward to seeing what you will make out of this interesting project.

Have a nice day.

Vali-98 commented 1 week ago

Hey there @inspir3dArt, thanks for the general feedback, I do have some things to share about your thoughts:

The character card limitation

This is one thing I'm aiming to fix with the Character Card Editor refresh. Currently, only description and examples are considered, but some websites use fields such as personality and scenario in context. These fields actually already exist and are stored in the app. It's really just a matter of implementing them so they respect the tokenizer system and don't break the prompt builder.
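For illustration only, the assembly could end up looking something like this, similar to what KoboldCpp does (the field names follow the usual character card format, and the helper below is just a hypothetical sketch, not the app's actual prompt builder):

```typescript
// Hypothetical sketch: fold the optional card fields into the context
// in a fixed order, skipping anything empty.
interface CharacterCard {
    description: string
    personality?: string
    scenario?: string
    mes_example?: string
    first_mes: string
}

const buildCardContext = (card: CharacterCard): string => {
    const parts = [
        card.description,
        card.personality && `Personality: ${card.personality}`,
        card.scenario && `Scenario: ${card.scenario}`,
        card.mes_example,
    ]
    // first_mes is the greeting shown in chat, not part of the persistent context
    return parts.filter(Boolean).join('\n')
}
```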

For the UI side of things, the current card editor hasn't been changed design-wise in a very long time. It's about time it gets a facelift.

The rerolls of the last message don't work.

This seems like a model issue. What a reroll does is essentially just make the same request but with a different seed value. I'm fairly sure all UIs do this. If possible, perhaps increasing Temperature or using samplers like XTC would improve variability.
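To illustrate, a reroll amounts to something like this (the request shape and helper here are hypothetical, not the app's actual code):

```typescript
// Hypothetical request shape: a reroll re-sends the same request with only the seed changed.
interface GenerationRequest {
    prompt: string
    temperature: number
    seed: number
}

const rerollRequest = (previous: GenerationRequest): GenerationRequest => ({
    ...previous,
    seed: Math.floor(Math.random() * 2 ** 32), // fresh random seed, everything else identical
})
```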

It would be nice if it were possible to link to a model you have on your device's storage

This is something I have actually tried before, but it always ended up crashing. I suppose I could revisit this to find a solution. Personally speaking, it'd also be nice not to copy model files over and over when I'm testing fresh installs.

EDIT: Update, the primary issue with loading models from storage is that Android sandboxes application files for security reasons and disallows reading file paths directly, so this sadly just isn't possible on the JS side, but I'll see if I can squeeze it into the native side.

GeminiX369 commented 5 days ago

I have a small suggestion. Could the app send parts of the text to TTS for synthesis as it generates them? That way, there is no need to wait for a large section of text to be fully generated before performing TTS speech synthesis all at once. This can significantly improve the user experience when the model's output is long. Punctuation such as commas and periods could be used as delimiters between text fragments.

Vali-98 commented 5 days ago

@GeminiX369 I have had this feature request before; however, at the moment it will require a somewhat big rework of the TTS system to actually execute, so no promises there.

@inspir3dArt I have tested a possible solution for external model loading; I think this may be possible to ship.

Issues-maker commented 4 days ago

Would you add a new feature to your roadmap: 'set a specific time to keep the model loaded'? It is already implemented in the competitor project Ollama App (https://github.com/JHubi1/ollama-app) if you need a code example for it.

This feature might be very useful for most users of your app; as for me, I prefer keeping a model in memory constantly so I can communicate with the LLM from my phone within a second.

There is also a big community of fans of your app on Telegram, feel free to join, there are some questions there too: https://t.me/chatterui

Vali-98 commented 4 days ago

Would you add a new feature to your roadmap: 'set a specific time to keep the model loaded'? It is already implemented in the competitor project Ollama App (https://github.com/JHubi1/ollama-app) if you need a code example for it.

The link given is to a UI that connects to Ollama. Ollama itself is the model manager here. Though it isn't impossible to create some kind of LLM background service for Android which loads models as needed, it is out of the scope of this project.

Issues-maker commented 4 days ago

Would you add a new feature to your roadmap: 'set a specific time to keep the model loaded'? It is already implemented in the competitor project Ollama App (https://github.com/JHubi1/ollama-app) if you need a code example for it.

The link given is to a UI that connects to Ollama. Ollama itself is the model manager here. Though it isn't impossible to create some kind of LLM background service for Android which loads models as needed, it is out of the scope of this project.

Android? It changes the time to keep the model loaded on a GPU server, at least that is what I use - a 70B model loaded on my GPU server. Personally, all these attempts to run LLMs inside phone memory are a joke to me. Anyway, for Ollama it needs the parameter "keep_alive = 0" (or a negative value to keep a model in memory permanently); I don't know mobile app programming, which is why I gave you an example from another app.

GeminiX369 commented 4 days ago

@GeminiX369 I have had this feature request before; however, at the moment it will require a somewhat big rework of the TTS system to actually execute, so no promises there.

@inspir3dArt I have tested a possible solution for external model loading; I think this may be possible to ship.

It's a pity that I don't know React Native, otherwise I would be happy to help. This is not difficult to achieve: you only need to remember the end of the current sentence being read aloud and the end of the next sentence to be read (that is, the position of the latest punctuation mark). After the current sentence finishes, the next sentence is sent. At the same time, while content is being received, the end of the next sentence to be read (the position of the latest punctuation mark) is updated in real time.
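Roughly, the idea could look something like this in TypeScript (the speak callback is just a placeholder for whatever TTS call ends up being used, not ChatterUI's actual API):

```typescript
// Rough sketch: buffer streamed text and hand complete fragments to TTS
// as soon as a punctuation boundary closes them.
const makeSentenceStreamer = (speak: (fragment: string) => void) => {
    let buffer = ''
    const boundary = /[.!?,;。！？，；]/

    return {
        // called for every streamed chunk from the model
        push(chunk: string) {
            buffer += chunk
            let index = buffer.search(boundary)
            while (index !== -1) {
                speak(buffer.slice(0, index + 1).trim())
                buffer = buffer.slice(index + 1)
                index = buffer.search(boundary)
            }
        },
        // called once generation finishes, to speak any trailing text
        flush() {
            if (buffer.trim()) speak(buffer.trim())
            buffer = ''
        },
    }
}
```

Each streamed chunk would go through push(), and flush() would speak whatever text remains once generation finishes.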

GeminiX369 commented 4 days ago

Would you add a new feature to your roadmap: 'set a specific time to keep the model loaded'? It is already implemented in the competitor project Ollama App (https://github.com/JHubi1/ollama-app) if you need a code example for it.

The link given is to a UI that connects to Ollama. Ollama itself is the model manager here. Though it isn't impossible to create some kind of LLM background service for Android which loads models as needed, it is out of the scope of this project.

Android? It changes the time to keep the model loaded on a GPU server, at least that is what I use - a 70B model loaded on my GPU server. Personally, all these attempts to run LLMs inside phone memory are a joke to me. Anyway, for Ollama it needs the parameter "keep_alive = 0" (or a negative value to keep a model in memory permanently); I don't know mobile app programming, which is why I gave you an example from another app.

I think it's meaningful. Although the performance of an LLM on a mobile phone is not as good as on a server, it is still capable of performing some simple text processing tasks. For example, you can use it to summarize text, remove some nonsense, or extract some information you want from the text, etc. It would be better if an API could be provided for other applications to use. I know it can be run directly in Termux, but that is not very convenient.

Vali-98 commented 4 days ago

Anyway, for Ollama it needs the parameter "keep_alive = 0" (or a negative value to keep a model in memory permanently)

Ah, I see, I thought this was a request for on-device model management. As for Ollama, it's just a generation parameter field; it can be done pretty easily based on the docs here: https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-completion

That said, this will likely be a part of the API-generalization rework.
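Based on those docs, forwarding it would roughly look like this (the endpoint URL, helper name, and model name below are placeholders for illustration; keep_alive itself is the documented field):

```typescript
// Sketch of forwarding keep_alive to an Ollama backend as one more request field.
// keep_alive controls how long the model stays loaded after the request;
// a negative value keeps it loaded indefinitely, 0 unloads it right away.
const generateWithKeepAlive = async (prompt: string, keepAlive: number | string = '5m') => {
    const res = await fetch('http://localhost:11434/api/generate', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
            model: 'llama3', // placeholder model name
            prompt,
            stream: false,
            keep_alive: keepAlive,
        }),
    })
    const data = await res.json()
    return data.response as string
}
```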

GameOverFlowChart commented 3 days ago

  • A redoing of the app's color system, which IMO is too monochrome, trying to utilize proper neutral colors for light/dark modes.

Just make sure to support true dark mode for AMOLED screens; it saves battery, and fewer active pixels would hopefully help keep screen burn-in issues from happening.
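For what it's worth, that could be as simple as keeping a true-black variant next to the regular dark palette (the color values below are arbitrary placeholders, not the app's actual theme):

```typescript
// Illustrative palettes only: a regular dark theme plus an AMOLED variant
// that swaps in pure black surfaces so the pixels switch off entirely.
const darkTheme = {
    background: '#121212',
    surface: '#1e1e1e',
    text: '#e6e6e6',
}

const amoledTheme = {
    ...darkTheme,
    background: '#000000', // true black
    surface: '#000000',
}
```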