assafelovic / gpt-researcher

GPT based autonomous agent that does online comprehensive research on any given topic
https://gptr.dev
MIT License
13.02k stars 1.61k forks

HuggingFace Open Source Models Integration #510

Open Huertas97 opened 1 month ago

Huertas97 commented 1 month ago

Hi!

I am using gpt-researcher as of 18/05/2024.

Although the documentation states that LangChain adapters can be used, the only LLM providers that gpt-researcher seems to support are Azure, OpenAI, and Google (the llm_provider folder shows this, for example, but the assumption is present throughout the code).

How can HuggingFace models be used as the LLM and embeddings provider? Is there any intention to integrate this? Is there any alternative way to use gpt-researcher with HuggingFace models?

Thank you in advance!

assafelovic commented 1 month ago

Hey @Huertas97 thanks for raising this. We're working on supporting more LLMs right now. Stay tuned and follow the Discord channel for updates!

nafisalawalidris commented 1 month ago

@Huertas97 I would suggest extending the functionality by creating new provider classes that integrate HuggingFace's transformers and sentence-transformers libraries: a HuggingFaceProvider for language generation and a HuggingFaceEmbeddingsProvider for embeddings. Then update the configuration and initialisation code in GPT-Researcher to recognise these new providers, so that HuggingFace models can be specified in the configuration file.

Although GPT-Researcher currently supports only Azure, OpenAI, and Google, adding HuggingFace support mainly involves modifying the configuration files and extending the initialisation logic. This could be a valuable contribution to the project, making it more flexible and versatile for different NLP tasks. If there is no official support or plan for this integration, you might consider contributing these changes back to the project or requesting the feature from the maintainers.
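A minimal sketch of what such provider classes might look like. The class and method names mirror the pattern of the existing providers but are assumptions, not the project's actual API; the heavy dependencies are imported lazily so the module loads even when they are not installed.

```python
# Hypothetical sketch of HuggingFace providers for gpt-researcher's
# llm_provider folder. Names and signatures are assumptions.

class HuggingFaceProvider:
    def __init__(self, model="HuggingFaceH4/zephyr-7b-beta",
                 temperature=0.4, max_tokens=2000):
        self.model = model
        self.temperature = temperature
        self.max_tokens = max_tokens
        self._llm = None  # built lazily on first use

    def _get_llm(self):
        if self._llm is None:
            # langchain-huggingface package (langchain_community in older setups)
            from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint
            endpoint = HuggingFaceEndpoint(
                repo_id=self.model,
                temperature=self.temperature,
                max_new_tokens=self.max_tokens,
            )
            # ChatHuggingFace must receive a concrete llm instance
            self._llm = ChatHuggingFace(llm=endpoint)
        return self._llm

    async def get_chat_response(self, messages, stream=False):
        # messages: list of {"role": ..., "content": ...} dicts
        output = await self._get_llm().ainvoke(messages)
        return output.content


class HuggingFaceEmbeddingsProvider:
    def __init__(self, model="sentence-transformers/all-MiniLM-L6-v2"):
        self.model = model
        self._embeddings = None

    def _get_embeddings(self):
        if self._embeddings is None:
            from langchain_huggingface import HuggingFaceEmbeddings
            self._embeddings = HuggingFaceEmbeddings(model_name=self.model)
        return self._embeddings

    def embed_documents(self, texts):
        return self._get_embeddings().embed_documents(texts)
```

The lazy construction also means the config layer can instantiate the provider cheaply and only pay the model-loading cost when a request is actually made.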

assafelovic commented 1 month ago

Definitely agree and looking forward for contributions. We can follow something like this: https://github.com/langchain-ai/langchain/pull/22039/files

crti commented 3 weeks ago

I tried GPT Researcher today. As I have no subscription to an LLM and no powerful PC with a GPU, I tried HuggingFace by editing this in config.py:

```python
self.llm_provider = os.getenv('LLM_PROVIDER', "huggingface")
self.fast_llm_model = os.getenv('FAST_LLM_MODEL', "Qwen2-7B-Instruct")
self.smart_llm_model = os.getenv('SMART_LLM_MODEL', "Qwen2-72B-Instruct")
```

I got this error:

```
raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for ChatHuggingFace
__root__
  Expected llm to be one of HuggingFaceTextGenInference, HuggingFaceEndpoint, HuggingFaceHub, HuggingFacePipeline received <class 'NoneType'> (type=type_error)
```

Can I fix this with additional editing in config.py or is this a requirements issue?
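The ValidationError indicates that ChatHuggingFace was constructed with llm=None: setting the provider and model name strings in config.py is not enough, because an actual LLM wrapper (one of the four classes named in the error) has to be instantiated and passed in. A hedged sketch of the wiring the traceback expects, using HuggingFaceEndpoint for hosted inference (which needs the HUGGINGFACEHUB_API_TOKEN environment variable); where exactly this would go in gpt-researcher's provider code is an assumption.

```python
# Sketch of the object graph ChatHuggingFace validates; the function name
# build_chat_model is hypothetical. Imports are kept inside the function
# so this module loads without langchain installed.

def build_chat_model(repo_id="Qwen/Qwen2-7B-Instruct",
                     temperature=0.4, max_new_tokens=1024):
    from langchain_community.llms import HuggingFaceEndpoint
    from langchain_community.chat_models.huggingface import ChatHuggingFace

    # ChatHuggingFace checks that `llm` is one of the four wrapper classes
    # listed in the error; omitting it (None) reproduces the ValidationError.
    llm = HuggingFaceEndpoint(
        repo_id=repo_id,  # the full Hub id, not just "Qwen2-7B-Instruct"
        temperature=temperature,
        max_new_tokens=max_new_tokens,
    )
    return ChatHuggingFace(llm=llm)
```

Note also that Hub model ids are namespaced ("Qwen/Qwen2-7B-Instruct" rather than "Qwen2-7B-Instruct"), so even with the wiring fixed the model names in config.py would likely need the full path. In other words, this looks like a code-path issue in the provider initialisation rather than a requirements issue.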