[Feature] Integrate ChatPDF-like functionality

wwy94621 commented 1 year ago

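Could you integrate something like ChatPDF? For example, let the user select a file when creating a new chat, then chat based on that file. OpenAI's GitHub has a basic implementation, but its UI is poor.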

Yidadaa commented 1 year ago

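This kind of feature requires building a vector index, which is hard to do purely in the frontend. The open-source projects you have seen all upload the file to a server for processing.

That said, I am looking into how to run an embedding model in the browser, and will add this feature afterwards.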

wwy94621 commented 1 year ago

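That project is built with plain Next.js and does not connect to a vector database. Self-hosting is certainly fine, so does it not work on Vercel? I thought it would be fairly easy to integrate!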

yaoleifly commented 1 year ago

Many thanks to the developers if this feature gets implemented.

Yidadaa commented 1 year ago

@wwy94621 That project uses the OpenAI API for embedding, which is fairly costly:

https://github.com/openai/openai-cookbook/blob/297c53430cad2d05ba763ab9dca64309cb5091e9/apps/file-q-and-a/nextjs/src/services/openai.ts#L135

wwy94621 commented 1 year ago

Right, indeed! But users provide their own key anyway, so this would be a fairly quick way to implement it. Thanks a lot!

jun0315 commented 1 year ago

Some similar projects that could serve as references: https://github.com/talkingwallace/ChatGPT-Paper-Reader https://github.com/binary-husky/chatgpt_academic https://github.com/mukulpatnaik/researchgpt

reonokiy commented 1 year ago

LangChain has Python and JS versions and integrates with many data sources: https://github.com/hwchase17/langchain https://github.com/hwchase17/langchainjs

It can run in the browser: https://js.langchain.com/docs/getting-started/install#browser

Yidadaa commented 1 year ago

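@sperjar I will not use LangChain; the library is too heavy.

To repeat: this feature is not hard to implement. You only need to parse the PDF content, call the OpenAI API to embed it, and then run retrieval over the vectors.

The reason it has not been done is that its priority is fairly low. I am preparing the development of version 2.0, whose headline feature is preset roles. The ChatPDF functionality will be handled as part of the external knowledge base requirement, possibly in v2.5 or v3.0. What is certain is that it will not be implemented in the near term.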
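
The parse, embed, retrieve steps named in this comment can be sketched roughly as follows. This is only an illustration and not code from ChatGPT-Next-Web: it assumes the PDF text has already been extracted, and every name in it (`chunkText`, `cosineSimilarity`, `topK`, `toyEmbed`) is hypothetical. In particular, `toyEmbed` is a letter-count stand-in so the example runs offline; a real setup would replace it with vectors returned by an embedding endpoint.

```typescript
// Illustrative sketch; all names here are hypothetical.
interface IndexedChunk {
  text: string;
  vector: number[];
}

interface ScoredChunk {
  text: string;
  score: number;
}

// Split extracted text into fixed-size character windows that overlap slightly,
// so a sentence cut at a boundary still appears whole in one chunk.
function chunkText(text: string, size = 500, overlap = 50): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // the last window reached the end
  }
  return chunks;
}

// Cosine similarity between two equal-length vectors; 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector lengths differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Return the k chunks whose vectors are most similar to the query vector.
function topK(queryVector: number[], index: IndexedChunk[], k = 3): ScoredChunk[] {
  return index
    .map((c) => ({ text: c.text, score: cosineSimilarity(queryVector, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// ASSUMPTION: stand-in for a real embedding model. Counts the letters a-z.
function toyEmbed(text: string): number[] {
  const v: number[] = [];
  for (let i = 0; i < 26; i++) v.push(0);
  const lower = text.toLowerCase();
  for (let i = 0; i < lower.length; i++) {
    const code = lower.charCodeAt(i) - 97;
    if (code >= 0 && code < 26) v[code] += 1;
  }
  return v;
}
```

The retrieved chunks would then be placed into the chat prompt as context. A brute-force scan like `topK` is usually adequate for a single document; a dedicated vector index only becomes relevant for much larger collections.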

RudRho commented 1 year ago

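> @sperjar I will not use LangChain; the library is too heavy.
>
> To repeat: this feature is not hard to implement. You only need to parse the PDF content, call the OpenAI API to embed it, and then run retrieval over the vectors.
>
> The reason it has not been done is that its priority is fairly low. I am preparing the development of version 2.0, whose headline feature is preset roles. The ChatPDF functionality will be handled as part of the external knowledge base requirement, possibly in v2.5 or v3.0. What is certain is that it will not be implemented in the near term.

Thanks, @Yidadaa. A few thoughts:

llama-index (13k stars) is worth a look; it is much lighter than LangChain and supports local embedding.
"The headline feature is preset roles": everyone is working on role presets now, but I cannot see what benefit presetting a role brings. Could you briefly explain your thinking? Thanks!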

Yidadaa commented 1 year ago

What preset roles are for: #138

https://www.allabtai.com/prompt-engineering-tips-zero-one-and-few-shot-prompting/

Other products' preset roles merely preset a prompt. You could list a few competitors; their features are probably not as good as mine.

JiangYain commented 1 year ago

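@Yidadaa Hello, here is the reference link for LlamaIndex: https://gpt-index.readthedocs.io/en/latest/index.html

I have used LlamaIndex 0.6.9 to build a sidebar plugin (the side panel only runs on Chrome 114 Beta, and it is local only) that works alongside your project, where the mask feature matters most. LlamaIndex offers many index types, such as keyword, tree, and vector indexes, and indexes can now also be nested. Beyond indexes, many other parts need in-depth development. So in my view, merely using it is simple, but using it well is quite difficult. I support your plan, namely that "the ChatPDF functionality will be handled as part of the external knowledge base requirement, possibly in v2.5 or v3.0. What is certain is that it will not be implemented in the near term." The project is very active, with an update every few days. Give LlamaIndex a little more time and let the bullets fly for a while.

I hope the link above helps. As for someone stealing the official-account article and reposting it without permission, I hope you will not take it to heart. If you need additional financial support, I am willing to contribute what little I can. Wishing you well.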

pptt121212 commented 1 year ago

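> This kind of feature requires building a vector index, which is hard to do purely in the frontend. The open-source projects you have seen all upload the file to a server for processing.
>
> That said, I am looking into how to run an embedding model in the browser, and will add this feature afterwards.

PDF summarization should summarize the PDF section by section, hold the partial summaries temporarily in memory, and then output a final summary. That is a different approach from vector retrieval over the PDF's paragraphs.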
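
The section-by-section approach described here could look roughly like the sketch below. It is an illustration under assumptions, not anything from this project: `Summarizer` is a hypothetical placeholder for a model call (kept synchronous for brevity, whereas a real call would be asynchronous), and `splitIntoSections` and `summarizeDocument` are made-up names.

```typescript
// Illustrative sketch; all names here are hypothetical.
// ASSUMPTION: a Summarizer wraps whatever model call produces a summary.
type Summarizer = (text: string) => string;

// Pack blank-line-separated paragraphs into sections of at most maxChars.
// A single paragraph longer than maxChars becomes its own oversized section.
function splitIntoSections(text: string, maxChars: number): string[] {
  const sections: string[] = [];
  let current = "";
  const paragraphs = text.split(/\n\s*\n/);
  for (let i = 0; i < paragraphs.length; i++) {
    const p = paragraphs[i].trim();
    if (p === "") continue;
    if (current !== "" && current.length + p.length + 2 > maxChars) {
      sections.push(current);
      current = "";
    }
    current = current === "" ? p : current + "\n\n" + p;
  }
  if (current !== "") sections.push(current);
  return sections;
}

// Summarize each section, keep the partial summaries in memory,
// then summarize the joined partials to get the final result.
function summarizeDocument(text: string, summarize: Summarizer, maxChars = 2000): string {
  const partials = splitIntoSections(text, maxChars).map((s) => summarize(s));
  if (partials.length === 0) return "";
  if (partials.length === 1) return partials[0];
  return summarize(partials.join("\n\n"));
}
```

Unlike vector retrieval, this reads the entire document once and needs no index, which suits "summarize this file" requests but not targeted questions about a specific passage.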

Yidadaa commented 1 year ago

Technology selection:

Yidadaa commented 1 year ago

This feature will be added in v2.9.

daiaji commented 1 year ago

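> This kind of feature requires building a vector index, which is hard to do purely in the frontend. The open-source projects you have seen all upload the file to a server for processing.
>
> That said, I am looking into how to run an embedding model in the browser, and will add this feature afterwards.

Running an embedding model in JS? Doing it purely in the frontend sounds cool.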

daiaji commented 1 year ago

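> This feature will be added in v2.9.

Combined with the existing history summary feature, would it be possible to embed each generated history summary into a vector database, giving GPT long-term memory of the whole conversation rather than only the current context and the most recent history summary?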

daiaji commented 1 year ago

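> This feature will be added in v2.9.
>
> Combined with the existing history summary feature, would it be possible to embed each generated history summary into a vector database, giving GPT long-term memory of the whole conversation rather than only the current context and the most recent history summary?

Judging from the last example in this project, it seems feasible. https://github.com/TomLBZ/koishi-plugin-openai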

alanwu4321 commented 1 year ago

bump

vual commented 1 year ago

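> @Yidadaa Hello, here is the reference link for LlamaIndex: https://gpt-index.readthedocs.io/en/latest/index.html
>
> I have used LlamaIndex 0.6.9 to build a sidebar plugin (the side panel only runs on Chrome 114 Beta, and it is local only) that works alongside your project, where the mask feature matters most. LlamaIndex offers many index types, such as keyword, tree, and vector indexes, and indexes can now also be nested. Beyond indexes, many other parts need in-depth development. So in my view, merely using it is simple, but using it well is quite difficult. I support your plan, namely that "the ChatPDF functionality will be handled as part of the external knowledge base requirement, possibly in v2.5 or v3.0. What is certain is that it will not be implemented in the near term." The project is very active, with an update every few days. Give LlamaIndex a little more time and let the bullets fly for a while.
>
> I hope the link above helps. As for someone stealing the official-account article and reposting it without permission, I hope you will not take it to heart. If you need additional financial support, I am willing to contribute what little I can. Wishing you well.

Is there a demo of the plugin you built? Could we see how it works?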

johnfelipe commented 1 year ago

Is uploading or linking editable PDF files on the roadmap?

maristeslk commented 11 months ago

Will Azure embedding models be supported in the future?

iccyuan commented 11 months ago

I saw a website that relies on https://qdrant.tech to implement this.

whl1207 commented 10 months ago

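Doing it purely in the frontend, I used nlp.js to score the relevance between the question and the knowledge base, then put the matches into the prompt, but the parameters are somewhat hard to tune.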
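
To make the tuning problem concrete, here is a minimal sketch of frontend-only matching followed by prompt construction. It does not use nlp.js or reproduce its API; it swaps in plain token-overlap (Jaccard) scoring, and all names plus the `minScore` and `maxSnippets` parameters are hypothetical. It also assumes space-delimited, Latin-script text; Chinese text would need a proper word segmenter.

```typescript
// Illustrative sketch; not nlp.js. All names here are hypothetical.
interface Snippet {
  text: string;
  score: number;
}

// Lowercase the text and collect its distinct alphanumeric tokens.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) || []);
}

// Jaccard similarity: shared tokens divided by total distinct tokens.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  a.forEach((t) => {
    if (b.has(t)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// Keep knowledge-base entries scoring at least minScore, best first,
// capped at maxSnippets. These two values are the knobs that need tuning.
function selectSnippets(
  question: string,
  knowledgeBase: string[],
  minScore = 0.1,
  maxSnippets = 3,
): Snippet[] {
  const q = tokenize(question);
  return knowledgeBase
    .map((text) => ({ text, score: jaccard(q, tokenize(text)) }))
    .filter((s) => s.score >= minScore)
    .sort((x, y) => y.score - x.score)
    .slice(0, maxSnippets);
}

// Prepend the selected snippets to the question; fall back to the bare
// question when nothing passed the threshold.
function buildPrompt(question: string, snippets: Snippet[]): string {
  if (snippets.length === 0) return question;
  const context = snippets.map((s, i) => `[${i + 1}] ${s.text}`).join("\n");
  return `Answer using the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}
```

A threshold that is too low floods the prompt with loosely related text, while one that is too high drops relevant entries that merely use different wording, which is one plausible reason such parameters are awkward to tune.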

johnfelipe commented 10 months ago

What percentage of this feature is complete?

tatakof commented 1 week ago

any updates on this?

PS: superb project!

ChatGPTNextWeb / ChatGPT-Next-Web

[Feature] Integrate ChatPDF-like functionality #960