TheBlewish / Automated-AI-Web-Researcher-Ollama

A Python program that turns an LLM running on Ollama into an automated researcher. From a single query it determines focus areas to investigate, runs web searches, scrapes content from relevant websites, and carries out the research on its own, including saving the findings for you.
MIT License
2.09k stars 205 forks

Does not support Chinese for question retrieval. #39

Open Liu8Can opened 2 days ago

Liu8Can commented 2 days ago

I want to ask questions in Chinese, but I can only enter English text.

Liu8Can commented 2 days ago

I tried modifying the get_multiline_input() method to solve this problem, but the content in research_session is still in English, and some of it is garbled. I am still working on a fix.

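import sys
import time
import unicodedata
import msvcrt  # Windows-only console input

# Fore, Style (colorama) and logger are assumed to be defined by the surrounding module.


def get_multiline_input() -> str:
    """Read multi-line input (press Ctrl+Z to submit; supports Chinese input)."""
    print(f"{Fore.GREEN}📝 Enter your message (press Ctrl+Z to submit):{Style.RESET_ALL}")
    lines = []
    current_line = []

    try:
        while True:
            if msvcrt.kbhit():
                char = msvcrt.getwch()  # getwch() returns wide characters, so Chinese input works

                # Special keys (arrows, function keys) arrive as a prefix plus a key code; skip both
                if char in ('\x00', '\xe0'):
                    msvcrt.getwch()
                    continue

                # Ctrl+Z (Windows EOF) submits the input
                if char == '\x1a':
                    sys.stdout.write('\n')  # move to a new line to tidy the display
                    if current_line:
                        lines.append(''.join(current_line))
                    return ' '.join(lines).strip()

                # Enter ends the current line
                elif char in ['\r', '\n']:
                    sys.stdout.write('\n')
                    lines.append(''.join(current_line))
                    current_line = []

                # Backspace removes the last character
                elif char == '\x08':
                    if current_line:
                        removed = current_line.pop()
                        # Wide (e.g. Chinese) characters occupy two console columns
                        width = 2 if unicodedata.east_asian_width(removed) in ('W', 'F') else 1
                        sys.stdout.write('\b \b' * width)  # erase the character on screen

                # Ctrl+C quits
                elif char == '\x03':
                    sys.stdout.write('\n')
                    return 'q'

                # Ordinary characters: accept anything printable, including Chinese text and punctuation
                elif char.isprintable():
                    current_line.append(char)
                    sys.stdout.write(char)

                # Flush the output
                sys.stdout.flush()
            else:
                time.sleep(0.01)  # avoid spinning the CPU while waiting for a key

    except Exception as e:
        logger.error(f"Error during multi-line input: {e}")
        return 'q'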
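
A note on which characters such an input loop lets through: a filter limited to printable ASCII plus the CJK Unified Ideographs block (U+4E00 to U+9FFF) accepts Chinese characters but rejects full-width punctuation such as "，" (U+FF0C) and "。" (U+3002). The sketch below compares that kind of range check with Python's built-in str.isprintable(). The helper names in_narrow_ranges and is_typable are hypothetical and exist only for illustration.

```python
def in_narrow_ranges(ch: str) -> bool:
    """Accept printable ASCII or a CJK Unified Ideograph (U+4E00 to U+9FFF)."""
    return 32 <= ord(ch) <= 126 or 0x4E00 <= ord(ch) <= 0x9FFF


def is_typable(ch: str) -> bool:
    """Accept any character Python considers printable (control characters are excluded)."""
    return ch.isprintable()


if __name__ == "__main__":
    # Compare both filters on an ideograph, two full-width punctuation marks,
    # an ASCII letter, and the backspace control character.
    for ch in ["中", "，", "。", "a", "\x08"]:
        print(f"U+{ord(ch):04X} narrow={in_narrow_ranges(ch)} printable={is_typable(ch)}")
```

With the range check, a question that ends in a Chinese full stop silently loses its punctuation; the isprintable() check keeps it while still filtering control characters such as backspace.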


Liu8Can commented 2 days ago

In short, whenever non-English content appears in the retrieval results, garbled characters are very likely to end up in the session file. Multilingual compatibility still has some problems.
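
One common cause of garbled non-English text in saved files on Windows is opening the file without an explicit encoding, so Python falls back to the system code page. Whether that is what happens in this project has not been confirmed here. The sketch below only shows the general remedy of writing and reading with encoding="utf-8". The function names save_session_text and load_session_text and the file name are hypothetical, not part of the project.

```python
from pathlib import Path


def save_session_text(path: str, text: str) -> None:
    """Write text as UTF-8 regardless of the system's default code page."""
    Path(path).write_text(text, encoding="utf-8")


def load_session_text(path: str) -> str:
    """Read the file back using the same explicit UTF-8 encoding."""
    return Path(path).read_text(encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical file name, used only for this demonstration.
    save_session_text("research_session_demo.txt", "检索结果: 你好，世界")
    print(load_session_text("research_session_demo.txt"))
```

If the text is written with one encoding and read or displayed with another, the same bytes decode to different characters, which matches the symptom described above.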