chatchat-space / Langchain-Chatchat

Langchain-Chatchat (formerly Langchain-ChatGLM): a local-knowledge-based RAG and Agent application built on Langchain and language models such as ChatGLM, Qwen, and Llama
Apache License 2.0
31.18k stars 5.44k forks

Setting the maximum chunk_size for a knowledge base #4877

Closed Smile-L-up closed 2 weeks ago

Smile-L-up commented 2 weeks ago

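Hi, I would like to set a custom maximum number of characters per chunk when knowledge base documents are split, but the change I made in the code below does not take effect. I have already gone through the solutions in other related issues (such as the linked one), and it still does not work. How should this be configured?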

        with st.expander(
            "文件处理配置",
            expanded=True,
        ):
            cols = st.columns(3)
            chunk_size = cols[0].number_input("单段文本最大长度:", min_value=1, max_value=6000, value=Settings.kb_settings.CHUNK_SIZE)
            chunk_overlap = cols[1].number_input(
                "相邻文本重合长度:", 0, chunk_size, Settings.kb_settings.OVERLAP_SIZE
            )
            cols[2].write("")
            cols[2].write("")
            zh_title_enhance = cols[2].checkbox("开启中文标题加强", Settings.kb_settings.ZH_TITLE_ENHANCE)

It raises the following error:

StreamlitAPIException: The default value 5000 must be less than or equal to the max_value 1000
Traceback:
File "path/to/Langchain-Chatchat/libs/chatchat-server/chatchat/webui.py", line 69, in <module>
    knowledge_base_page(api=api, is_lite=is_lite)
File "path/to/Langchain-Chatchat/libs/chatchat-server/chatchat/webui_pages/knowledge_base/knowledge_base.py", line 182, in knowledge_base_page
    chunk_size = cols[0].number_input("单段文本最大长度:", min_value=1, max_value=6000, value=Settings.kb_settings.CHUNK_SIZE)

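Judging from the error message, even though I passed max_value=6000 to the Streamlit number_input, a maximum of 1000 is still being applied from somewhere I cannot find. Looking forward to a reply.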
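To make the failing condition concrete, here is a minimal sketch of the situation the message describes: a default of 5000 checked against an effective maximum of 1000. `clamp_default` is a hypothetical helper written for illustration, not part of Streamlit or Langchain-Chatchat, and it only avoids the exception; it does not explain where the 1000 comes from.

```python
def clamp_default(value, min_value, max_value):
    """Limit value to the closed range [min_value, max_value].

    Hypothetical helper for illustration only; it is not a Streamlit
    or Langchain-Chatchat function.
    """
    if min_value > max_value:
        raise ValueError("min_value must not exceed max_value")
    return max(min_value, min(value, max_value))


# Numbers taken from the error message above:
# configured default 5000, maximum actually applied 1000.
configured_chunk_size = 5000
effective_max = 1000

safe_default = clamp_default(configured_chunk_size, 1, effective_max)
print(safe_default)
```

In the widget call this would look like `value=clamp_default(Settings.kb_settings.CHUNK_SIZE, 1, 6000)`. One thing that may be worth checking, as an unverified guess: Python reads the source line shown in a traceback from the file on disk, so a process that still has an older copy of the module loaded can display the edited line (max_value=6000) while running the old value (1000).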