Closed Shulin-Zhang closed 12 months ago
Text2vector was used in the initial version
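I have a few questions about the retrieval augmentation part. Question 1: what is the value of k for top-k retrieval? Question 2: the retrieval augmentation figure (Fig. 3) shows that the knowledge base contains three kinds of content (Statute, Case, Literature). Will this data be released? And how are cases retrieved? Question 3: are the SFT training and the Retrieval Augmentation of DISC-LawLLM trained as two separate parts? If so, is the DISC-Law-SFT-Triplet data used during SFT training, or only DISC-Law-SFT-Pair? Thanks.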
@SUSTechIR Here are the answers to your questions
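@Charlie-XIAO Thank you for the very quick reply. To express my points clearly I am asking in Chinese; I hope you understand. On my second question, there are still some things I don't understand about Retrieval Augmentation. Your technical report mentions using the LangChain framework. Do you mean that text2vector is specified as the encoder within this framework? And are its parameters updated along with the rest of the training process? The same goes for my third question: the DISC-Law-SFT-Pair data has no reference, right? So DISC-Law-SFT-Pair cannot take part in Retrieval Augmentation, correct? Does that mean you first did SFT training with the DISC-Law-SFT-Pair and DISC-Law-SFT-Triplet data, and then trained Retrieval Augmentation separately? And is the data used for that retrieval augmentation step still DISC-Law-SFT-Triplet? There is also a Question 4: the Subjective Perspective (Figure 4) in your technical report uses ChatGPT-4, while other parts of the paper mention GPT-3.5. Is this a typo? Thanks for your reply.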
@SUSTechIR It's okay to ask in Chinese; I'm replying in English for consistency with other issues.
@Charlie-XIAO OK, understood. Thank you very much for your answers.
Hello, why was the model tested in the experiments without retrieval augmentation? Has any benchmark test been conducted with retrieval augmentation?
The retrieval module is only an experimental feature for now. We will continuously improve it and expand its database, so no formal evaluation has been done with it yet.
@yueshengbin who may know this better
By the way, do you have experimental results for DISC-LawLLM tested w/o retrieval augmentation? Could they be released? And could you share the test data? Thanks!
@SUSTechIR It is only an experimental feature for now, so we will not release an evaluation for it. However, there is indeed an improvement that can be observed directly.
Closing as completed, due to long period of inactivity.
How exactly is the retrieval implemented, and which embedding model is used?