QwenLM / Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
Apache License 2.0

Is it possible to deploy and run inference on a Qualcomm NPU? #1240

Closed caramel678 closed 1 month ago

caramel678 commented 2 months ago

As the title says.

jklj077 commented 1 month ago

Hello there! Getting a model running on a specialized hardware backend can be complex, especially when customization is involved. Typically, hardware vendors such as Qualcomm provide dedicated support channels or open-source resources to ease integration and development.

To assist you, here are a few avenues to consider:

  1. Qualcomm AI Hub: While not explicitly open-source, Qualcomm does maintain a repository of AI models which could serve as a starting point or offer valuable insights. You may explore it at https://aihub.qualcomm.com/models.

  2. Hugging Face Hub: Qualcomm maintains an active presence on the Hugging Face Hub and has said they will share how-to guides for LLMs; see https://huggingface.co/qualcomm/Llama-v2-7B-Chat/discussions/1.

  3. Community Efforts: There is also a promising initiative in the llama.cpp community to add support for Qualcomm backends. The work in progress at https://www.github.com/ggerganov/llama.cpp/pull/6869 shows that compatibility is feasible, even though no official open-source solution exists yet.
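As a concrete starting point while the Qualcomm backend work is in progress, the standard llama.cpp build-and-run flow is sketched below. This builds the default (CPU) backend; the QNN/Hexagon backend in the linked PR is not merged and would need that PR's branch plus PR-specific CMake flags. The GGUF filename is a placeholder for whatever quantized Qwen model you download:

```shell
# Clone and build llama.cpp (CPU backend; see the linked PR for Qualcomm work)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# Run a quantized Qwen GGUF model (path is a placeholder you supply yourself)
./build/bin/llama-cli -m /path/to/qwen-chat.Q4_K_M.gguf -p "Hello"
```

For on-device deployment on a Snapdragon phone, the same build steps can be cross-compiled with the Android NDK toolchain, as described in llama.cpp's own Android build docs.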

In summary, there is no ready-made open-source solution tailored for Qualcomm hardware at the moment, but the resources above, and the communities around them, are the best places to start. Keeping an eye on these platforms is also the fastest way to learn of new developments.