chatchat-space / Langchain-Chatchat

Langchain-Chatchat (formerly langchain-ChatGLM): a RAG and Agent application built with LangChain on local knowledge and language models such as ChatGLM, Qwen and Llama
Apache License 2.0
30.14k stars 5.28k forks

Preview files in the match results directly in the browser #3220

Closed: qianma819 closed this issue 2 months ago

qianma819 commented 4 months ago


dosubot[bot] commented 4 months ago

To configure the Langchain-Chatchat application to preview PDF files in the browser directly from the match results instead of downloading them, you need to integrate a PDF viewer component into your Streamlit application. The current code snippet you provided shows functionality for downloading selected documents but does not include a PDF viewer integration.

Here's a way to integrate PDF.js, a popular PDF viewer, into your Streamlit application for inline PDF previews:

  1. Download and Host PDF.js: First, download the PDF.js prebuilt distribution from its GitHub repository and host it on a server that the user's browser can reach. The iframe is loaded by the browser, not by the Streamlit server, so the browser must be able to fetch the PDF.js viewer's HTML file (web/viewer.html) and the PDF itself.

  2. Modify the knowledge_base_page Function: In the section where you handle the selection of documents and provide a download button, you can add an additional step to display the selected PDF file in an iframe using PDF.js. You'll use Streamlit's components.v1.html function to achieve this.

Here's an example modification you can make to the knowledge_base_page function to include an inline PDF viewer:

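from urllib.parse import quote

import streamlit as st
from streamlit.components.v1 import html

# Assuming you've hosted PDF.js at ''
PDFJS_VIEWER_URL = ""  # set to the URL of your hosted viewer.html

def knowledge_base_page(api: ApiRequest, is_lite: bool = None):
    # Existing code...

    cols = st.columns(4)
    file_name, file_path = file_exists(kb, selected_rows)
    if file_path:
        # Existing download button code...

        # New: Embed PDF in an iframe using PDF.js.
        # PDF.js fetches `file` from the browser, so it must be an HTTP URL the
        # browser can reach (not a server-side filesystem path), URL-encoded.
        pdf_viewer_url = f"{PDFJS_VIEWER_URL}?file={quote(file_path, safe='')}"
        iframe_code = f'<iframe src="{pdf_viewer_url}" width="100%" height="600"></iframe>'
        # components.v1.html renders raw HTML in its own frame; height avoids the 150 px default
        html(iframe_code, height=620)

    # Rest of the existing code...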

Important Considerations:

This solution provides a basic integration for inline PDF previews. Depending on your specific requirements and setup, you may need to adjust the implementation details.
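For the iframe approach, the `file` parameter handed to PDF.js has to be a URL the browser can fetch, percent-encoded as a single query value. A minimal sketch, assuming the PDF is served by the API server's `/knowledge_base/download_doc` endpoint; the endpoint path, its parameter names, the `preview` flag, and the helper name `pdfjs_preview_url` are all assumptions to check against your version. As far as we know, the stock PDF.js viewer refuses files from a different origin than viewer.html, so host the viewer on the same origin as the API (or adjust the viewer and CORS settings).

```python
from urllib.parse import quote, urlencode


def pdfjs_preview_url(viewer_url: str, api_base: str, kb_name: str, file_name: str) -> str:
    """Build a PDF.js viewer URL for a knowledge-base file (hypothetical helper).

    viewer_url: where viewer.html is hosted, e.g. ".../pdfjs/web/viewer.html".
    api_base:   base URL of the API server as seen from the user's browser.
    """
    # Assumed endpoint and parameter names; `preview` may not exist in every version
    file_url = f"{api_base}/knowledge_base/download_doc?" + urlencode(
        {"knowledge_base_name": kb_name, "file_name": file_name, "preview": "true"}
    )
    # The whole file URL is one query value for PDF.js, so escape every reserved
    # character (safe="" also escapes "/", ":", "?", "&", "=" and "+")
    return f"{viewer_url}?file={quote(file_url, safe='')}"
```

The result can be dropped into the iframe's `src` and rendered with `html(iframe_code, height=620)`.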


HuntZhaozq commented 4 months ago

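Besides this approach, is there another setting that makes the source files cited in a Q&A answer open as an in-browser preview when clicked? @dosu-bot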

qianma819 commented 4 months ago

if os.path.exists(kb_file.filepath):
    print(kb_file.filename)  # debug output
    response = FileResponse(
        path=kb_file.filepath,
        media_type="application/pdf",
        filename=kb_file.filename,
        # ... remaining arguments were cut off in the original comment
    )
    return response


HuntZhaozq commented 4 months ago

@qianma819 I get the error: 'latin-1' codec can't encode characters in position 16-24: ordinal not in range(256). Also, can this setting preview .doc files?

HuntZhaozq commented 4 months ago

After the change, some PDFs now preview directly, but others are still downloaded. Have you run into this, and how can it be fixed? @qianma819

qianma819 commented 4 months ago

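I haven't run into that. What I get is an error like this: { "code": 500, "msg": "【ポスコン】KnowHow共有内容一覧11.pdf 读取文件失败,错误信息是:'latin-1' codec can't encode characters in position 16-21: ordinal not in range(256)", "data": null } (the msg reads "failed to read file, error message: ..."). Yet opening the original file directly works fine. Frustrating.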
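The latin-1 error typically appears when a non-ASCII filename (Japanese or Chinese here) is written verbatim into a Content-Disposition header, because HTTP header values are encoded as latin-1. RFC 6266, using the RFC 5987 encoding, allows a percent-encoded UTF-8 name via `filename*`. A minimal sketch of building such a header value; the helper name `content_disposition` is hypothetical. As far as we can tell, recent Starlette versions do the same thing internally when `FileResponse` is given `filename=`, so passing the name through that argument instead of a hand-written header should also avoid the error.

```python
from urllib.parse import quote


def content_disposition(disposition_type: str, filename: str) -> str:
    """Build a Content-Disposition value that encodes cleanly as latin-1.

    disposition_type: "inline" (preview in the browser) or "attachment" (download).
    """
    quoted = quote(filename)  # percent-encode the UTF-8 bytes of non-ASCII characters
    if quoted == filename:
        # Plain ASCII name with nothing to escape: use the simple quoted form
        return f'{disposition_type}; filename="{filename}"'
    # RFC 5987 extended notation: charset, empty language tag, encoded value
    return f"{disposition_type}; filename*=utf-8''{quoted}"
```

By contrast, `"inline;filename=【ポスコン】KnowHow共有内容一覧11.pdf".encode("latin-1")` raises `UnicodeEncodeError`.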

HuntZhaozq commented 4 months ago


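    # Fragment from download_doc(); needs `import os` and
    # `from fastapi.responses import FileResponse` at module level.
    preview = True  # hard-coded: always try to preview in the browser
    if preview:
        # Determine media type and content disposition based on file extension
        file_extension = os.path.splitext(file_name)[1].lower()
        if file_extension == '.pdf':
            media_type = 'application/pdf'
            content_disposition_type = 'inline'
        elif file_extension in ['.doc', '.docx']:
            # For DOC files, you can choose either of the following media types:
            # media_type = 'application/msword'
            media_type = 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
            content_disposition_type = 'inline'
        else:
            # For other file types, force download
            # (a None media type would break the Content-Type header below)
            media_type = 'application/octet-stream'
            content_disposition_type = 'attachment'
    else:
        media_type = 'application/octet-stream'
        content_disposition_type = 'attachment'

    # knowledge_base_name is assumed to be the endpoint's other query parameter
    # (the rest of this call was cut off in the original comment)
    kb_file = KnowledgeFile(filename=file_name, knowledge_base_name=knowledge_base_name)

    if os.path.exists(kb_file.filepath):
        headers = {
            "Content-Type": media_type,
            "Accept-Ranges": "none",  # Disable range requests
        }
        # No hand-written Content-Disposition: FileResponse builds it from filename
        # and percent-encodes non-ASCII names (filename*=utf-8''...)
        return FileResponse(
            path=kb_file.filepath,
            filename=kb_file.filename,
            headers=headers,
            content_disposition_type=content_disposition_type,
        )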
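The extension checks in the snippet above can also be kept in a lookup table, which makes it harder to miss a branch (for example, leaving `media_type` as `None` for unlisted extensions). A minimal sketch; the names `MEDIA_TYPES`, `INLINE_EXTENSIONS` and `choose_media_type` are hypothetical, and which types count as inline-renderable is an assumption about typical browsers. Word files get their correct MIME types but are served as downloads, since mainstream browsers generally cannot render them inline.

```python
import os
from typing import Tuple

# Extension -> MIME type (extend as needed)
MEDIA_TYPES = {
    ".pdf": "application/pdf",
    ".txt": "text/plain; charset=utf-8",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".doc": "application/msword",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}

# Types browsers can usually display themselves (assumption; Word formats excluded)
INLINE_EXTENSIONS = {".pdf", ".txt", ".png", ".jpg", ".jpeg"}


def choose_media_type(file_name: str, preview: bool = True) -> Tuple[str, str]:
    """Return (media_type, content_disposition_type) for a knowledge-base file."""
    ext = os.path.splitext(file_name)[1].lower()  # ".PDF" and ".pdf" match alike
    media_type = MEDIA_TYPES.get(ext, "application/octet-stream")
    disposition = "inline" if preview and ext in INLINE_EXTENSIONS else "attachment"
    return media_type, disposition
```

The returned pair maps onto `FileResponse`'s `media_type` and `content_disposition_type` arguments (the latter exists in recent Starlette versions).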
qianma819 commented 4 months ago

@HuntZhaozq With this change, that problem is indeed gone.

qianma819 commented 4 months ago

Change DEFAULT_BIND_HOST to the machine's actual IP address, and you no longer have to access it through 127.0.0.1.
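A minimal sketch of that change, assuming the 0.2.x layout where DEFAULT_BIND_HOST lives in configs/server_config.py and the API and WebUI hosts default to it; 192.168.1.100 is a placeholder for your machine's LAN address. As we understand it, the source links in answers are built from the API server's host, and a loopback address such as 127.0.0.1 only resolves on the server machine itself. Note that once the server binds to one specific interface, it is no longer reachable via 127.0.0.1.

```python
# configs/server_config.py (location assumed from the 0.2.x project layout)
# Placeholder LAN address of the machine running Langchain-Chatchat; replace with yours.
DEFAULT_BIND_HOST = "192.168.1.100"
```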

qianma819 commented 4 months ago

@HuntZhaozq With

if file_extension == '.txt':
    media_type = 'text/plain'
    content_disposition_type = 'inline'

Chinese text in previewed .txt files comes out garbled. Do you know how to fix it?
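Two causes seem likely here. First, Starlette appends `; charset=utf-8` to a `text/*` media type only when it builds the Content-Type itself from `media_type`; a Content-Type supplied through `headers`, as in the download_doc snippet in this thread, is sent as-is, and the browser then has to guess the encoding. Second, Chinese .txt files saved on Windows are often GBK-encoded rather than UTF-8. A minimal sketch that declares the charset explicitly after a simple check (try UTF-8, fall back to GB18030, a superset of GBK); the function names are hypothetical and the two-charset check is a heuristic, not a general encoding detector.

```python
def detect_text_charset(raw: bytes) -> str:
    """Return the first charset that decodes raw without errors (UTF-8, then GB18030)."""
    for charset in ("utf-8", "gb18030"):
        try:
            raw.decode(charset)
            return charset
        except UnicodeDecodeError:
            continue
    return "utf-8"  # neither decoded cleanly: fall back to UTF-8


def text_content_type(path: str) -> str:
    """Build a text/plain Content-Type whose charset matches the file's bytes."""
    with open(path, "rb") as f:
        return f"text/plain; charset={detect_text_charset(f.read())}"
```

Passing the result as `headers={"Content-Type": text_content_type(kb_file.filepath)}` sends it unchanged, so the browser no longer has to guess.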

qianma819 commented 4 months ago

elif file_extension in ['.doc', '.docx']:
    # Browsers generally download Word files even with an inline disposition
    if file_extension == '.doc':
        media_type = 'application/msword'
    else:
        media_type = 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'

