Open wuybo opened 3 months ago
方便贴一下命令行终端的log吗?我们看下有没有什么报错
``D:\pytho\Qwen-Agent-main\Scripts\python.exe D:\Backup\Downloads\Qwen-Agent-main\examples\assistant_rag.py Running on local URL: http://127.0.0.1:7861
To create a public link, set share=True
in launch()
.
2024-07-04 16:44:13,588 - simple_doc_parser.py - 324 - INFO - Read parsed C:\Users\Administrator\AppData\Local\Temp\gradio\88c59b31ab04a1b2ddb9c941679bf6e787fc093b\开发者权利声明1.pdf from cache.
2024-07-04 16:44:13,588 - doc_parser.py - 114 - INFO - Start chunking C:\Users\Administrator\AppData\Local\Temp\gradio\88c59b31ab04a1b2ddb9c941679bf6e787fc093b\开发者权利声明1.pdf (开发者权利声明1.pdf)...
2024-07-04 16:44:13,589 - doc_parser.py - 132 - INFO - Finished chunking C:\Users\Administrator\AppData\Local\Temp\gradio\88c59b31ab04a1b2ddb9c941679bf6e787fc093b\开发者权利声明1.pdf (开发者权利声明1.pdf). Time spent: 0.0010001659393310547 seconds.
2024-07-04 16:44:47,523 - utils.py - 69 - ERROR - Traceback (most recent call last):
File "D:\pytho\Qwen-Agent-main\lib\site-packages\qwen_agent\utils\utils.py", line 225, in get_file_type
content = read_text_from_file(path)
File "D:\pytho\Qwen-Agent-main\lib\site-packages\qwen_agent\utils\utils.py", line 186, in read_text_from_file
file_content = file.read()
File "D:\Programs\python3.10\lib\codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 47: invalid start byte
2024-07-04 16:44:47,562 - simple_doc_parser.py - 324 - INFO - Read parsed C:\Users\Administrator\AppData\Local\Temp\gradio\88c59b31ab04a1b2ddb9c941679bf6e787fc093b\开发者权利声明1.pdf from cache. 2024-07-04 16:44:47,563 - doc_parser.py - 114 - INFO - Start chunking C:\Users\Administrator\AppData\Local\Temp\gradio\88c59b31ab04a1b2ddb9c941679bf6e787fc093b\开发者权利声明1.pdf (开发者权利声明1.pdf)... 2024-07-04 16:44:47,563 - doc_parser.py - 132 - INFO - Finished chunking C:\Users\Administrator\AppData\Local\Temp\gradio\88c59b31ab04a1b2ddb9c941679bf6e787fc093b\开发者权利声明1.pdf (开发者权利声明1.pdf). Time spent: 0.0 seconds. 2024-07-04 16:46:35,746 - utils.py - 69 - ERROR - Traceback (most recent call last): File "D:\pytho\Qwen-Agent-main\lib\site-packages\qwen_agent\utils\utils.py", line 225, in get_file_type content = read_text_from_file(path) File "D:\pytho\Qwen-Agent-main\lib\site-packages\qwen_agent\utils\utils.py", line 186, in read_text_from_file file_content = file.read() File "D:\Programs\python3.10\lib\codecs.py", line 322, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 47: invalid start byte
2024-07-04 16:46:35,748 - utils.py - 69 - ERROR - Traceback (most recent call last): File "D:\pytho\Qwen-Agent-main\lib\site-packages\qwen_agent\utils\utils.py", line 225, in get_file_type content = read_text_from_file(path) File "D:\pytho\Qwen-Agent-main\lib\site-packages\qwen_agent\utils\utils.py", line 186, in read_text_from_file file_content = file.read() File "D:\Programs\python3.10\lib\codecs.py", line 322, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 0: invalid continuation byte
2024-07-04 16:46:35,787 - simple_doc_parser.py - 324 - INFO - Read parsed C:\Users\Administrator\AppData\Local\Temp\gradio\88c59b31ab04a1b2ddb9c941679bf6e787fc093b\开发者权利声明1.pdf from cache. 2024-07-04 16:46:35,787 - doc_parser.py - 114 - INFO - Start chunking C:\Users\Administrator\AppData\Local\Temp\gradio\88c59b31ab04a1b2ddb9c941679bf6e787fc093b\开发者权利声明1.pdf (开发者权利声明1.pdf)... 2024-07-04 16:46:35,787 - doc_parser.py - 132 - INFO - Finished chunking C:\Users\Administrator\AppData\Local\Temp\gradio\88c59b31ab04a1b2ddb9c941679bf6e787fc093b\开发者权利声明1.pdf (开发者权利声明1.pdf). Time spent: 0.0 seconds. 2024-07-04 16:50:10,261 - utils.py - 69 - ERROR - Traceback (most recent call last): File "D:\pytho\Qwen-Agent-main\lib\site-packages\qwen_agent\utils\utils.py", line 225, in get_file_type content = read_text_from_file(path) File "D:\pytho\Qwen-Agent-main\lib\site-packages\qwen_agent\utils\utils.py", line 186, in read_text_from_file file_content = file.read() File "D:\Programs\python3.10\lib\codecs.py", line 322, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 47: invalid start byte
2024-07-04 16:50:10,262 - utils.py - 69 - ERROR - Traceback (most recent call last): File "D:\pytho\Qwen-Agent-main\lib\site-packages\qwen_agent\utils\utils.py", line 225, in get_file_type content = read_text_from_file(path) File "D:\pytho\Qwen-Agent-main\lib\site-packages\qwen_agent\utils\utils.py", line 186, in read_text_from_file file_content = file.read() File "D:\Programs\python3.10\lib\codecs.py", line 322, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 0: invalid continuation byte
2024-07-04 16:50:15,078 - split_query.py - 82 - INFO - Extracted info from query: {"information": ["https://www.gov.cn/xinwen/2020-06/01/content_5516649.htm
方便贴一下命令行终端的log吗?我们看下有没有什么报错
2024-07-04 17:23:06,002 - simple_doc_parser.py - 326 - INFO - Start parsing C:\Users\Administrator\AppData\Local\Temp\gradio\d8d0bc75266a5fc0dc442eb81b70bbabe1301cde\民法典.pdf... 2024-07-04 17:23:19,483 - simple_doc_parser.py - 365 - INFO - Finished parsing C:\Users\Administrator\AppData\Local\Temp\gradio\d8d0bc75266a5fc0dc442eb81b70bbabe1301cde\民法典.pdf. Time spent: 13.480265617370605 seconds. 2024-07-04 17:23:19,541 - doc_parser.py - 114 - INFO - Start chunking C:\Users\Administrator\AppData\Local\Temp\gradio\d8d0bc75266a5fc0dc442eb81b70bbabe1301cde\民法典.pdf (民法典.pdf)... 2024-07-04 17:23:19,596 - doc_parser.py - 132 - INFO - Finished chunking C:\Users\Administrator\AppData\Local\Temp\gradio\d8d0bc75266a5fc0dc442eb81b70bbabe1301cde\民法典.pdf (民法典.pdf). Time spent: 0.05436825752258301 seconds.
方便贴一下命令行终端的log吗?我们看下有没有什么报错
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 47: invalid start byte
看起来是因为文件不是utf-8编码,可能是windows平台遇到gbk中文文档了。我试下能不能复现&fix这个问题。
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 47: invalid start byte
看起来是因为文件不是utf-8编码,可能是windows平台遇到gbk中文文档了。我试下能不能复现&fix这个问题。 PDF下载地址: 这个PDF 上传没报错 不过也是没读取成功; 对文件的大小有限制吗 https://jkwwt.acftu.org/jkwwtzcfg/202203/P020220325353963962863.pdf
我的widnows机器不知为何无法复现此问题。
但,我还是在main分支增加了对非utf8(比如gbk)文件的处理,感兴趣的话可以试试拉取并安装最新的main分支,看看是否能工作。
相关commit: https://github.com/QwenLM/Qwen-Agent/commit/d9a37753f6dc86bbc33dd316a86b6fd1e4290e5c
还是 assistant_rag.py 案例; 刚开始 我以为是缓存那边文件的影响,C:\Users\Administrator\AppData\Local\Temp\gradio; 我吧该目录下面的文件删除了,重新上传一个小的PDF 文件依然这个情况;不知道其他老师有没有遇到;我是拉去的最新版的Qwen_agent;
日志: `D:\pytho\Qwen-Agent-main\Scripts\python.exe D:\Backup\Downloads\Qwen-Agent-main\examples\assistant_rag.py Running on local URL: http://127.0.0.1:7860
Thanks for being a Gradio user! If you have questions or feedback, please join our Discord server and chat with us: https://discord.gg/feTf9x3ZSB
To create a public link, set share=True
in launch()
.
2024-07-05 16:11:53,558 - simple_doc_parser.py - 324 - INFO - Read parsed C:\Users\Administrator\AppData\Local\Temp\gradio\88c59b31ab04a1b2ddb9c941679bf6e787fc093b\开发者权利声明1.pdf from cache.
2024-07-05 16:11:53,558 - doc_parser.py - 114 - INFO - Start chunking C:\Users\Administrator\AppData\Local\Temp\gradio\88c59b31ab04a1b2ddb9c941679bf6e787fc093b\开发者权利声明1.pdf (开发者权利声明1.pdf)...
2024-07-05 16:11:53,558 - doc_parser.py - 132 - INFO - Finished chunking C:\Users\Administrator\AppData\Local\Temp\gradio\88c59b31ab04a1b2ddb9c941679bf6e787fc093b\开发者权利声明1.pdf (开发者权利声明1.pdf). Time spent: 0.0 seconds.
`
开发者权利声明(1).pdf
我上传的PDF文件:希望可以复现解决该问题;
还是 assistant_rag.py 案例; 刚开始 我以为是缓存那边文件的影响,C:\Users\Administrator\AppData\Local\Temp\gradio; 我吧该目录下面的文件删除了,重新上传一个小的PDF 文件依然这个情况;不知道其他老师有没有遇到;我是拉去的最新版的Qwen_agent;
日志: `D:\pytho\Qwen-Agent-main\Scripts\python.exe D:\Backup\Downloads\Qwen-Agent-main\examples\assistant_rag.py Running on local URL: http://127.0.0.1:7860
Thanks for being a Gradio user! If you have questions or feedback, please join our Discord server and chat with us: https://discord.gg/feTf9x3ZSB
To create a public link, set
share=True
inlaunch()
. 2024-07-05 16:11:53,558 - simple_doc_parser.py - 324 - INFO - Read parsed C:\Users\Administrator\AppData\Local\Temp\gradio\88c59b31ab04a1b2ddb9c941679bf6e787fc093b\开发者权利声明1.pdf from cache. 2024-07-05 16:11:53,558 - doc_parser.py - 114 - INFO - Start chunking C:\Users\Administrator\AppData\Local\Temp\gradio\88c59b31ab04a1b2ddb9c941679bf6e787fc093b\开发者权利声明1.pdf (开发者权利声明1.pdf)... 2024-07-05 16:11:53,558 - doc_parser.py - 132 - INFO - Finished chunking C:\Users\Administrator\AppData\Local\Temp\gradio\88c59b31ab04a1b2ddb9c941679bf6e787fc093b\开发者权利声明1.pdf (开发者权利声明1.pdf). Time spent: 0.0 seconds. ` 开发者权利声明(1).pdf我上传的PDF文件:希望可以复现解决该问题;
这个似乎是另一个bug(如果用户只上传文件、不用文字发问就会触发)。。我们之前没测试到这种情况(缺少专业的测试)。我正在查为什么
看log截图似乎gbk编码的问题倒是解决了。
还有一个问题老师;每次回答都会引用我上传的全部文件;比如我上传了两个文件,好比我上传了刑法的文件,和民法典的文件,我只需要他根据民法典的内容回答,这种在哪里可以设置下;
还有一个问题老师;每次回答都会引用我上传的全部文件;比如我上传了两个文件,好比我上传了刑法的文件,和民法典的文件,我只需要他根据民法典的内容回答,这种在哪里可以设置下;
这种需要换Agent实现了,思路是在一开始先让llm判断下要读哪个文件(会增加一次llm调用所以没在Assistant里实现)。比如这个例子:https://github.com/QwenLM/Qwen-Agent/blob/main/examples/virtual_memory_qa.py (但是这个不是最高效的实现)
这个似乎是另一个bug(如果用户只上传文件、不用文字发问就会触发)。。我们之前没测试到这种情况(缺少专业的测试)。我正在查为什么
main分支修复了“只传文件不打字时无回答“的bug。