result and overall are empty in the output, and the following error message is shown

IAAR-Shanghai / CRUD_RAG

CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models


result and overall are empty in the output, and the following error message is shown #16

Closed zhaoruchen closed 2 months ago

zhaoruchen commented 3 months ago

The command I ran is:

    python quick_start.py \
        --model_name 'glm4' \
        --temperature 0.0 \
        --max_new_tokens 1280 \
        --data_path 'data/crud_split/split_merged.json' \
        --shuffle True \
        --embedding_dim 1024 \
        --docs_path 'data/80000_docs' \
        --docs_type 'txt' \
        --chunk_size 128 \
        --chunk_overlap 0 \
        --retriever_name 'base' \
        --collection_name 'glm4_docs_80k_chuncksize_128_0' \
        --retrieve_top_k 8 \
        --task 'continuing_writing' \
        --num_threads 20 \
        --show_progress_bar True \
        --construct_index

I also added ChatGLM4_local_path in src/configs/config.py. The run printed the following error message:

And the results are empty:

What could be causing this?

haruhi-sudo commented 3 months ago

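Hi, it looks like the retriever did not return any content, and there are many possible causes. I suggest commenting out the try/except statements in the source code and stepping through with breakpoints in an IDE.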
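
To illustrate why a broad try/except can leave result empty without any visible cause, here is a minimal sketch. The functions `retrieve`, `run_swallowing_errors`, and `run_logging_errors` are hypothetical stand-ins and are not taken from the CRUD_RAG source. It shows an alternative to removing the handler entirely: keep the fallback but log the full traceback first.

```python
import logging
import traceback

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("debug_sketch")


def retrieve(query: str) -> list[str]:
    """Hypothetical stand-in for a retriever call that fails internally."""
    raise AttributeError("embedding object has no attribute 'some_method'")


def run_swallowing_errors(query: str) -> str:
    """Pattern that hides the failure: the caller only sees an empty result."""
    try:
        return " ".join(retrieve(query))
    except Exception:
        return ""


def run_logging_errors(query: str) -> str:
    """Same fallback value, but the full traceback is logged before returning."""
    try:
        return " ".join(retrieve(query))
    except Exception:
        logger.error("retrieval failed for %r:\n%s", query, traceback.format_exc())
        return ""


if __name__ == "__main__":
    print(repr(run_swallowing_errors("test query")))  # empty string, no hint why
    print(repr(run_logging_errors("test query")))     # empty string, traceback in the log
```

Both variants return the same empty value, so downstream code behaves identically; only the second one records where the exception was raised.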

zhaoruchen commented 3 months ago

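Hello, while debugging I got the following error message:

In addition, I indeed could not find get_agg_embedding_from_queries in HuggingfaceEmbeddings in src.embeddings.base.py. Do you have this part of the code? Thank you very much!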
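
One quick way to confirm which methods an embedding object actually exposes is to probe it before handing it to the retriever. The sketch below is only illustrative: `StubEmbeddings` is a hypothetical stand-in for the real embedding class, and the assumption that the retriever calls `get_agg_embedding_from_queries` on it comes from the error described above, not from inspecting the project code.

```python
def missing_methods(obj: object, required: list[str]) -> list[str]:
    """Return the names in `required` that `obj` does not provide as callables."""
    return [name for name in required if not callable(getattr(obj, name, None))]


class StubEmbeddings:
    """Hypothetical stand-in for an embedding wrapper with a partial interface."""

    def embed_query(self, text: str) -> list[float]:
        return [0.0, 0.0, 0.0, 0.0]


if __name__ == "__main__":
    required = ["embed_query", "get_agg_embedding_from_queries"]
    print(missing_methods(StubEmbeddings(), required))
```

If a method is reported missing on the real object, a plausible (but unconfirmed) explanation is that the embedding wrapper and the indexing library come from incompatible packages or versions.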

haruhi-sudo commented 3 months ago

I have not run into a similar error. Have you already built the retrieval database? It looks like you are using the bge large model.

zhaoruchen commented 3 months ago

I used data/80000_docs directly as the retrieval document corpus.

haruhi-sudo commented 3 months ago

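I have not seen this error before. I suggest first getting the demo to run on a small dataset (for example, only 10 test samples and a retrieval corpus consisting of a single file with a few hundred documents), and then rerunning your original command.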
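
A small test setup like the one described could be prepared with a sketch along these lines. It rests on assumptions that are not confirmed in this thread: that the dataset JSON is either a list of samples or a dict whose values are lists of samples, and that each .txt file in the corpus directory holds one document per line. The function names and output paths are hypothetical.

```python
import json
from itertools import islice
from pathlib import Path


def make_small_dataset(src: str, dst: str, limit: int = 10) -> object:
    """Write a copy of the dataset that keeps at most `limit` samples per list."""
    data = json.loads(Path(src).read_text(encoding="utf-8"))
    if isinstance(data, list):
        small = data[:limit]
    elif isinstance(data, dict):
        # Assumed layout: task name -> list of samples; other values are kept as-is.
        small = {k: v[:limit] if isinstance(v, list) else v for k, v in data.items()}
    else:
        raise ValueError("unexpected top-level JSON type")
    Path(dst).write_text(json.dumps(small, ensure_ascii=False, indent=2), encoding="utf-8")
    return small


def make_small_corpus(src_dir: str, dst_dir: str, max_lines: int = 300) -> Path:
    """Copy the first `max_lines` lines of the first .txt file into `dst_dir`."""
    src_files = sorted(Path(src_dir).glob("*.txt"))
    if not src_files:
        raise FileNotFoundError(f"no .txt files found in {src_dir}")
    first = src_files[0]
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    with first.open(encoding="utf-8") as f:
        lines = list(islice(f, max_lines))
    out = Path(dst_dir) / first.name
    out.write_text("".join(lines), encoding="utf-8")
    return out
```

The resulting files can then be passed via --data_path and --docs_path, ideally with a fresh --collection_name so the small index does not mix with an existing one.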

For now, it looks like the vector database may not be running correctly, or the text was not correctly converted into embeddings.
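
To check the second possibility, a few embedding vectors can be inspected before they are written to the vector database. The sketch below is a generic sanity check, not project code: `check_embeddings` is a hypothetical helper, and the expected dimension of 1024 is taken from the --embedding_dim value in the command above.

```python
import math


def check_embeddings(vectors: list[list[float]], expected_dim: int) -> list[str]:
    """Return a list of human-readable problems found in `vectors`."""
    if not vectors:
        return ["no embeddings were produced"]
    problems = []
    for i, vec in enumerate(vectors):
        if len(vec) != expected_dim:
            problems.append(f"vector {i}: dimension {len(vec)} != expected {expected_dim}")
        elif not all(math.isfinite(x) for x in vec):
            problems.append(f"vector {i}: contains NaN or infinite values")
        elif all(x == 0 for x in vec):
            problems.append(f"vector {i}: all zeros")
    return problems


if __name__ == "__main__":
    # Stand-in vectors; in practice these would come from the embedding model.
    sample = [[0.1] * 1024, [0.2] * 768]
    for problem in check_embeddings(sample, expected_dim=1024):
        print(problem)
```

A dimension mismatch between the model output and the collection schema is one common reason for inserts or searches to fail, which would be consistent with an empty retrieval result.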

zhaoruchen commented 3 months ago

OK, thanks for the reply! I will give it a try.
