HKUNLP / ChunkLlama

[ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"
Apache License 2.0

Thanks for your research. Can you provide a CUDA version of the code? #15

Closed: nanmi closed this issue 1 week ago

nanmi commented 4 months ago

Exciting work, and I am very interested in it. My coding ability is limited, though, so could you provide a CUDA implementation of DCA? It would be greatly appreciated.

nanmi commented 4 months ago

Alternatively, can you recommend and share any other repositories that apply DCA and could serve as reference code?

ChenxinAn-fdu commented 4 months ago

Hi, thank you for your attention! DCA can be used for almost all LLMs released on Hugging Face. If you find it challenging for a specific model, please feel free to open an issue.

We are actively working on CUDA optimization. In the meantime, the code in this repo should not have obvious GPU memory or inference-time issues compared with the original inference code.

nanmi commented 4 months ago

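"Hi, thank you for your attention! DCA can be used for almost all LLMs released on Hugging Face. If you find it challenging for a specific model, please feel free to open an issue.

For CUDA optimization, we are actively working on this, but the code in this repo should not have obvious GPU memory or inference time issues compared with the original inference code."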

Thanks for the reply. I want to use DCA, a very useful technique, within a CUDA framework. When do you expect to open-source the CUDA implementation?

ChenxinAn-fdu commented 4 months ago

I do not have a strong background in CUDA programming, so it might take a long time...