Closed: thomasgauthier closed this issue 2 months ago
Thank you for bringing up this issue! We are actively testing DCA with Llama3 8B/70B. The results will be updated next week!
Hi! The language modeling results for Llama3 8B/70B have been updated! DCA remains very effective: ChunkLlama3 shows a clear improvement over ChunkLlama2 in PPL.
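For readers unfamiliar with the metric: PPL (perplexity) is the exponential of the negative mean per-token log-likelihood, so a lower value means the model predicts the text better. A minimal sketch, using hypothetical log-probability values rather than real ChunkLlama outputs:

```python
import math

def perplexity(token_logprobs):
    # Perplexity is the exponential of the negative mean
    # per-token log-likelihood; lower is better.
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical per-token log-probs for the same text under two models;
# the model with lower perplexity models the text better.
ppl_a = perplexity([-1.2, -0.9, -1.1, -0.8])
ppl_b = perplexity([-1.6, -1.4, -1.5, -1.3])
print(ppl_a < ppl_b)  # the first (stronger) model has lower PPL
```

In practice the log-probs come from running the model over a held-out corpus; comparing two models requires scoring the identical token sequence with both.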
Updates for the needle-in-a-haystack, few-shot, and zero-shot results are expected in two days; these runs are a bit slow.
All results for ChunkLlama3 8B/70B have been updated (see the results).
Generally, ChunkLlama3-8B achieves 100% retrieval accuracy across all document depths. Results on real-world tasks show that ChunkLlama3-70B performs on par with GPT-4 (2023/06/13) and Llama2 Long 70B!
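For context on how the needle-in-a-haystack numbers are produced: a "needle" sentence (often a passkey) is inserted at varying fractional depths of a long filler context, the model is asked to retrieve it, and accuracy is the fraction of depths answered correctly. A minimal sketch of the harness setup, with hypothetical helper names and filler text (the model call itself is omitted):

```python
def insert_needle(haystack: str, needle: str, depth: float) -> str:
    # Place the needle at a fractional depth of the context
    # (0.0 = beginning, 1.0 = end).
    pos = int(len(haystack) * depth)
    return haystack[:pos] + needle + haystack[pos:]

def retrieval_accuracy(answers, passkey):
    # Fraction of depths at which the model's answer contains the passkey.
    return sum(passkey in a for a in answers) / len(answers)

filler = "The grass is green. " * 200
needle = "The secret passkey is 74219. "
prompts = [insert_needle(filler, needle, d) for d in (0.0, 0.25, 0.5, 0.75, 1.0)]
# In a real evaluation each prompt would be sent to the model together with
# a question like "What is the secret passkey?"; here we only build the inputs.
print(all(needle in p for p in prompts))
```

"100% retrieval accuracy across all document depths" then means the passkey is recovered at every tested depth (and typically at every tested context length).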
Question in the title.
Big thanks for making this accessible to the community!