ItzCrazyKns / Perplexica

Perplexica is an AI-powered search engine and an open-source alternative to Perplexity AI.
MIT License
13.34k stars · 1.26k forks

Add output language #315

Open ItzCrazyKns opened 1 month ago

ItzCrazyKns commented 1 month ago
> @ItzCrazyKns I think this request is a bit different from #72. That issue is about system prompts for various tasks, but the preferred output language is a separate thing. Adding a setting for this would be super user-friendly and helpful, as we could support many common languages directly.
>
> Thanks for considering it!

Originally posted by @PeterDaveHello in https://github.com/ItzCrazyKns/Perplexica/issues/296#issuecomment-2282056439

ItzCrazyKns commented 1 month ago

The issue is that some models (actually, most models) don't support all languages. If we add an output-language preference now, people will open more issues complaining that it isn't replying in their language, etc.

MeinDeutschkurs commented 1 month ago

Call it "preferred output language (the model must support it):", followed by an empty text box where users can input whatever they like. If a user-defined value is set (for instance "German"), it could lead to a prompt addition at the final output prompt, like '\n\nPlease write your text in German.'. An empty value would simply mean no addition.
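The proposed behavior could be sketched roughly as follows; `buildFinalPrompt` and `preferredLanguage` are illustrative names, not part of Perplexica's actual codebase:

```typescript
// Hypothetical sketch: append a language instruction to the final output
// prompt only when the user has set a preferred language in the settings.
function buildFinalPrompt(basePrompt: string, preferredLanguage?: string): string {
  const lang = preferredLanguage?.trim();
  if (!lang) return basePrompt; // empty setting: leave the prompt untouched
  return `${basePrompt}\n\nPlease write your text in ${lang}.`;
}
```

Because the addition only touches the final output prompt, the rest of the pipeline (query generation, search, reranking) would stay unaffected.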

PeterDaveHello commented 1 month ago

While supporting every single language might not be feasible, providing support for the most commonly used languages would still substantially improve the current situation where only English is supported. Even if we can't cover all languages, enabling users to interact in a few major languages would significantly enhance the user experience.

Zirgite commented 4 weeks ago

Meanwhile, just add the output language in the prompt, and the model should respond in it if it is able to. I mean not in the system prompt, but just when asking Perplexica. Some people would want to custom-prompt other things as well without touching the code, so it could be implemented as in LM Studio: an empty box in the settings where you can add custom additional prompts, the language being one of them.

MeinDeutschkurs commented 4 weeks ago

> Meanwhile, just add the output language in the prompt, and the model should respond in it if it is able to. I mean not in the system prompt, but just when asking Perplexica.
>
> Some people would want to custom-prompt other things as well without touching the code, so it could be implemented as in LM Studio: an empty box in the settings where you can add custom additional prompts, the language being one of them.

No, you can't. The research is influenced by the language the question is written and requested in; ultimately, the search results depend on a specific language. That's a sensitive thing. And yes, you could edit the system prompt for the final output manually, but this is hacky and has to be redone after each update.

ItzCrazyKns commented 4 weeks ago

I need to take a little bit of time to benchmark the results. The language can affect the quality of the actual output and the search results.

MeinDeutschkurs commented 4 weeks ago

> I need to take a little bit of time to benchmark the results. The language can affect the quality of the actual output and the search results.

And that's why it is important to affect the final output only.

Alternatively, some kind of translation step after the final output might be more practicable in Perplexica's case. This would add an optional further step at the very end.

Prompting is really sensitive; changing the output language in the same step could break the rest. An additional, optional step like "Translate the following markdown to German: '''{final_reply}'''" might work better.
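The translation-as-a-separate-step idea could be sketched like this; `callModel` is an illustrative stand-in for whatever chat-completion helper the app already uses, not an actual Perplexica function:

```typescript
// Hypothetical post-processing step: translate the finished answer in a
// separate LLM call instead of mixing the language instruction into the
// main answering prompt, so the core prompting stays untouched.
async function translateReply(
  finalReply: string,
  targetLanguage: string,
  callModel: (prompt: string) => Promise<string>,
): Promise<string> {
  const prompt =
    `Translate the following markdown to ${targetLanguage}. ` +
    `Preserve all markdown formatting, links, and code blocks.\n\n` +
    `'''${finalReply}'''`;
  return callModel(prompt);
}
```

If no target language is configured, the step would simply be skipped and the original reply returned as-is.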

PeterDaveHello commented 4 weeks ago

Perhaps providing users with two separate options—one for the search language and another for the output language—could be quite helpful. This might allow for more precise search results and flexible output, enhancing the overall user experience while also reducing the need for manual translation.
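The two-setting idea could be sketched as below. The field names are illustrative; SearxNG does accept a `language` query parameter, but how Perplexica would wire these settings through is an assumption here:

```typescript
// Hypothetical settings shape separating the two concerns: the language
// used for searching and the language used for the final answer.
interface LanguageSettings {
  searchLanguage?: string; // e.g. "de" — forwarded to the SearxNG query
  outputLanguage?: string; // e.g. "German" — appended to the answer prompt
}

// Build SearxNG query parameters, setting `language` only when configured.
function searxngParams(query: string, settings: LanguageSettings): URLSearchParams {
  const params = new URLSearchParams({ q: query, format: "json" });
  if (settings.searchLanguage) params.set("language", settings.searchLanguage);
  return params;
}
```

Keeping the two settings independent would let a user search in English (where coverage is best) while reading the answer in their own language.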

RazeBerry commented 4 weeks ago

I think the biggest problem would be neither the SearxNG part nor the Claude 3.5 Sonnet part, but the small embedding model, which is specialized for English and probably collapses on smaller languages such as Albanian. Maybe a long-term goal of this project is to either run more powerful embedding models locally or include an online embedding API as part of the pipeline.

MeinDeutschkurs commented 3 weeks ago

> Perhaps providing users with two separate options—one for the search language and another for the output language—could be quite helpful. This might allow for more precise search results and flexible output, enhancing the overall user experience while also reducing the need for manual translation.

Indeed. But I think this is more complex.

It seems unnecessary if you think in classical (Google) search terms, but imagine asking something like: "Georgian, Armenian, English, and German. Find documents/discussions/news articles about unemployment for each language between 1990 and 2000, 2000 and 2010, as well as 2010 and 2020. Output a table describing the main discourses/discussions as well as the prevailing tone/mindset by time span and language. And please describe the differences between all of this afterwards."

I hope my example was easy to follow. I would have felt more comfortable writing this in German.

papiche commented 6 days ago

I succeeded in making open-webui always answer in French by adding this to the prompt: "Tu es un assistant qui maîtrise de nombreuses langues, mais tu effectueras ta réponse finale toujours en français" ("You are an assistant who masters many languages, but you will always give your final answer in French").

Maybe this could be dynamically adapted to the web browser language or an app language parameter?
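Deriving a default output language from the browser locale could look roughly like this; the function name is hypothetical, but `Intl.DisplayNames` is a standard JavaScript API and `navigator.language` is the usual source of the locale in a browser:

```typescript
// Hypothetical sketch: map a browser locale (e.g. navigator.language,
// "fr-FR") to an English language name suitable for a prompt addition.
function defaultOutputLanguage(locale: string): string {
  const code = locale.split("-")[0]; // "fr-FR" -> "fr"
  const names = new Intl.DisplayNames(["en"], { type: "language" });
  return names.of(code) ?? "English"; // fall back to English if unknown
}
```

A user-set preference in the settings would naturally override this detected default.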

MeinDeutschkurs commented 5 days ago

> I succeeded in making open-webui always answer in French by adding this to the prompt: "Tu es un assistant qui maîtrise de nombreuses langues, mais tu effectueras ta réponse finale toujours en français"
>
> Maybe this could be dynamically adapted to the web browser language or an app language parameter?

I believe getting proper results is more complex than that.