SJTU-IPADS / PowerInfer

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
MIT License

How to understand the codes of llama.cpp? #130

Open BHbean opened 8 months ago

BHbean commented 8 months ago

PowerInfer is amazing work that achieves great performance! Inspired by your brilliant ideas, I am thinking about developing new features based on llama.cpp myself.

However, it is a bit hard for me to fully understand the structure of llama.cpp. Since you have experience developing PowerInfer, I'm sincerely asking for your help:

  1. Are there any docs or videos suitable for a beginner to understand the overall structure of llama.cpp? (Even your own understanding would be helpful!)
  2. Could you share some tips for developing on top of llama.cpp?

I would be really grateful if you can give me a helping hand. Thanks in advance!

hodlen commented 8 months ago

Thank you for your interest in PowerInfer and we are more than happy to inspire more people!

The code structure of PowerInfer is consistent with that of llama.cpp, including aspects such as organizing the computation graph, external I/O (in the llama.cpp source file), different operator implementations (ggml.c, ggml-cuda.cu, etc.), specific sub-function implementations (ggml-alloc.c), and high-level applications (under examples/). Therefore, I recommend focusing on understanding the architecture of llama.cpp.
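To make "organizing the computation graph" a bit more concrete, here is a minimal toy sketch of the general build-then-execute pattern: first the operations are recorded as graph nodes, and only afterwards are they evaluated by per-operator code. Everything in it (Op, Node, Graph, input, add, mul, compute) is a hypothetical name invented for illustration and is not the ggml or llama.cpp API; the real code is considerably more involved (memory allocation, backends, many more operators).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy illustration only: these types are NOT ggml's API.
enum class Op { Input, Add, Mul };

struct Node {
    Op op;
    std::vector<float> data;  // result buffer; filled in by Graph::compute()
    const Node* a;            // first operand (nullptr for inputs)
    const Node* b;            // second operand (nullptr for inputs)
};

// Owns all nodes. Nodes are stored in creation order, which is already a
// valid evaluation order because operands must exist before their consumer.
class Graph {
public:
    // Phase 1: build. These calls only record nodes; nothing is computed.
    Node* input(std::vector<float> values) {
        return push(Op::Input, std::move(values), nullptr, nullptr);
    }
    Node* add(const Node* a, const Node* b) { return binary(Op::Add, a, b); }
    Node* mul(const Node* a, const Node* b) { return binary(Op::Mul, a, b); }

    // Phase 2: execute. Walk the nodes in order and run each operator.
    void compute() {
        for (auto& n : nodes_) {
            if (n->op == Op::Input) continue;  // inputs already hold data
            for (std::size_t i = 0; i < n->data.size(); ++i) {
                const float x = n->a->data[i];
                const float y = n->b->data[i];
                n->data[i] = (n->op == Op::Add) ? x + y : x * y;
            }
        }
    }

private:
    Node* binary(Op op, const Node* a, const Node* b) {
        // Element-wise operators require operands of equal length.
        assert(a && b && a->data.size() == b->data.size());
        return push(op, std::vector<float>(a->data.size(), 0.0f), a, b);
    }
    Node* push(Op op, std::vector<float> data, const Node* a, const Node* b) {
        nodes_.push_back(
            std::unique_ptr<Node>(new Node{op, std::move(data), a, b}));
        return nodes_.back().get();
    }
    std::vector<std::unique_ptr<Node>> nodes_;
};
```

For example, building y = x * w + b with x = {1, 2}, w = {3, 4}, b = {0.5, 0.5} via g.add(g.mul(x, w), b) leaves y->data zero-filled until g.compute() is called, after which it holds {3.5, 8.5}. Keeping the two phases separate is what lets a runtime inspect the whole graph before running it, for instance to plan buffers or choose a per-operator backend.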

Unfortunately, llama.cpp itself doesn't have extensive documentation, let alone textual or video tutorials. If you are keen to learn, you might find this community discussion helpful. This is similar to how we onboard new collaborators in our team, through collaborative learning and discussions.

BHbean commented 7 months ago

Sorry for the late reply, and thanks for your comprehensive explanation! I will check out the discussion to learn more as well.

Huge thanks again! I hope to stay in touch and keep learning from you guys!