andrewnguonly / Lumos

A RAG LLM co-pilot for browsing the web, powered by local LLMs
MIT License
1.32k stars · 93 forks

Add LangChain `YoutubeLoader` #154

Open andrewnguonly opened 3 months ago

andrewnguonly commented 3 months ago

https://js.langchain.com/docs/integrations/document_loaders/web_loaders/youtube

Draculabo commented 3 months ago

I will finish it in a few days.

Draculabo commented 3 months ago

js.langchain.com/docs/integrations/document_loaders/web_loaders/youtube

https://github.com/langchain-ai/langchainjs/blob/d6e25af137873493d30bdf5732d46b842e421ffa/langchain/src/document_loaders/web/youtube.ts

I ran into some issues while developing `YoutubeLoader` support:

  1. YouTube recently changed the response fields of its API, which broke the original youtubei.js library. The details are in the linked GitHub issue, and I have applied the fix suggested there. Even with that fix, some videos still do not return subtitles.
  2. When a Chrome extension sends a fetch request, the browser automatically attaches the current origin, and this cannot be modified, so our requests may be blocked. See these Stack Overflow questions:
     - "javascript - Overridding XMLHttpRequest Prototype For Chrome Extension"
     - "javascript - Chrome Extension: how to change origin in AJAX request header?"

Perhaps we could avoid `YoutubeLoader` altogether and use a search API instead, such as Serper API ("Serper - The World's Fastest and Cheapest Google Search API").
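
To illustrate item 2: the Fetch standard defines a set of "forbidden" request headers, `Origin` among them, that script code cannot set on a request. The sketch below is a simplified check against that list, not code from Lumos or LangChain. The names `isForbiddenRequestHeader` and `partitionHeaders` are hypothetical, and the standard's conditional cases (such as method-override headers) are left out.

```typescript
// Simplified list of forbidden request-header names from the Fetch standard.
// Names are stored in lowercase because header names are case-insensitive.
const FORBIDDEN_REQUEST_HEADERS = new Set<string>([
  "accept-charset", "accept-encoding", "access-control-request-headers",
  "access-control-request-method", "connection", "content-length", "cookie",
  "cookie2", "date", "dnt", "expect", "host", "keep-alive", "origin",
  "referer", "set-cookie", "te", "trailer", "transfer-encoding", "upgrade",
  "via",
]);

// Hypothetical helper: true if script code cannot set this header via fetch().
function isForbiddenRequestHeader(name: string): boolean {
  const lower = name.trim().toLowerCase();
  return (
    FORBIDDEN_REQUEST_HEADERS.has(lower) ||
    lower.startsWith("proxy-") ||
    lower.startsWith("sec-")
  );
}

// Hypothetical helper: split requested headers into those fetch() will send
// as given and those the browser controls itself.
function partitionHeaders(headers: Record<string, string>): {
  settable: Record<string, string>;
  dropped: string[];
} {
  const settable: Record<string, string> = {};
  const dropped: string[] = [];
  for (const name of Object.keys(headers)) {
    if (isForbiddenRequestHeader(name)) {
      dropped.push(name);
    } else {
      settable[name] = headers[name];
    }
  }
  return { settable, dropped };
}
```

Calling `partitionHeaders({ Origin: "https://www.youtube.com", "Accept-Language": "en" })` would report `Origin` as dropped, which matches the behaviour described in the Stack Overflow questions above.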

andrewnguonly commented 3 months ago

Let's pause this feature for now. It looks like the issue from item 1 was also reported in LangChainJS's repo: https://github.com/langchain-ai/langchainjs/issues/4994. Maybe we can push a fix to LangChainJS. It looks like several other people have implemented workarounds/solutions.
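
Until an upstream fix lands, one defensive option is to treat a missing transcript as an expected outcome rather than a crash. The sketch below assumes only that the loader exposes an async `load()` returning documents with a `pageContent` string, which is the shape LangChain document loaders use. `loadTranscriptSafely` and the result type are hypothetical names, not part of LangChain or Lumos.

```typescript
// Minimal shapes assumed for illustration; they mirror a LangChain-style loader.
interface TranscriptDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

interface TranscriptLoader {
  load(): Promise<TranscriptDoc[]>;
}

interface TranscriptResult {
  docs: TranscriptDoc[];
  error: string | null; // null when at least one non-empty document was loaded
}

// Hypothetical wrapper: never throws; reports an error string when the loader
// fails or when the video yields no subtitle text.
async function loadTranscriptSafely(
  loader: TranscriptLoader,
): Promise<TranscriptResult> {
  try {
    const docs = await loader.load();
    const nonEmpty = docs.filter((d) => d.pageContent.trim().length > 0);
    if (nonEmpty.length === 0) {
      return { docs: [], error: "no transcript returned" };
    }
    return { docs: nonEmpty, error: null };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { docs: [], error: `loader failed: ${message}` };
  }
}
```

A caller could then fall back to the page's visible text, or skip the video, whenever `error` is not `null`.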

Regarding item 2, were you running the document loader from the background script or from the extension popup?

Draculabo commented 3 months ago

> Let's pause this feature for now. It looks like the issue from item 1 was also reported in LangChainJS's repo: langchain-ai/langchainjs#4994. Maybe we can push a fix to LangChainJS. It looks like several other people have implemented workarounds/solutions.
>
> Regarding item 2, were you running the document loader from the background script or from the extension popup?

I run the document loader from the extension popup.
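
For comparison, running the loader from the background script would mean the popup sends a message and the background script does the loading. Below is a minimal sketch of such a listener, written so the loading function is injected. The message type `LOAD_YOUTUBE`, the response shape, and `makeYoutubeListener` are all hypothetical, and nothing in this thread establishes that moving the request to the background script changes how YouTube treats it.

```typescript
// Hypothetical message and response shapes for popup <-> background traffic.
interface LoadYoutubeRequest {
  type: "LOAD_YOUTUBE";
  url: string;
}

interface LoadYoutubeResponse {
  text: string | null;
  error: string | null;
}

type SendResponse = (response: LoadYoutubeResponse) => void;

// Builds a listener with the (message, sender, sendResponse) signature that
// extension message listeners receive. `loadTranscript` is injected so the
// listener does not depend on any particular loader.
function makeYoutubeListener(loadTranscript: (url: string) => Promise<string>) {
  return (
    message: unknown,
    _sender: unknown,
    sendResponse: SendResponse,
  ): boolean => {
    if (typeof message !== "object" || message === null) return false;
    const req = message as Partial<LoadYoutubeRequest>;
    if (req.type !== "LOAD_YOUTUBE" || typeof req.url !== "string") {
      return false; // not ours; let other listeners handle it
    }
    loadTranscript(req.url)
      .then((text) => sendResponse({ text, error: null }))
      .catch((err: unknown) =>
        sendResponse({
          text: null,
          error: err instanceof Error ? err.message : String(err),
        }),
      );
    return true; // keep the message channel open for the async response
  };
}
```

In the background script this would be registered with `chrome.runtime.onMessage.addListener(makeYoutubeListener(...))`, and the popup would call `chrome.runtime.sendMessage({ type: "LOAD_YOUTUBE", url })`.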