dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T

Cannot Reproduce the Published Benchmarking Result on Jetson AI Lab Webpage #532

Closed: ramyadhadidi closed this issue 1 month ago

ramyadhadidi commented 1 month ago

Hello,

I've been trying to reproduce the benchmarking results published on the Jetson AI Lab webpage (https://www.jetson-ai-lab.com/benchmarks.html).

First of all, it is unclear which Jetson Orin models were used, i.e., AGX Orin (32GB, 64GB, or Industrial) and Orin Nano (4GB or 8GB).

Second, I'm having a hard time matching the reported numbers: I get 3-4x lower performance. I followed the MLC guide (https://github.com/dusty-nv/jetson-containers/tree/master/packages/llm/mlc) with some fixes (see https://github.com/dusty-nv/jetson-containers/issues/529). For instance:

I have an AGX Orin 32GB, so part of the gap might come from the lower memory capacity and TOPS of this variant compared to the 64GB (~1.4x). However, I don't think that is the main reason: none of the models above are bottlenecked by the smaller memory of the AGX 32GB, and the TOPS difference should not matter much, since attention is a memory-bound operation and the memory bandwidth of both variants is the same.
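To make the memory-bound argument concrete, here is a rough back-of-envelope sketch (my own estimate, not from the published benchmarks), assuming ~204.8 GB/s peak DRAM bandwidth on both AGX Orin variants, 4-bit quantized weights for Llama-2-7B, and that each decoded token streams the full weight set from DRAM once:

```python
# Rough roofline estimate for decode throughput on AGX Orin.
# Assumptions (not from the issue): ~204.8 GB/s peak DRAM bandwidth on both
# AGX Orin 32GB and 64GB, 7B parameters, 4-bit (0.5 byte/weight) quantization,
# and that generating one token reads all weights from DRAM once.

PEAK_BW_GBPS = 204.8          # GB/s, same spec for AGX Orin 32GB and 64GB
PARAMS = 7e9                  # Llama-2-7B
BYTES_PER_WEIGHT = 0.5        # 4-bit quantization

weight_bytes = PARAMS * BYTES_PER_WEIGHT           # ~3.5 GB of weights
ceiling_tok_s = PEAK_BW_GBPS * 1e9 / weight_bytes  # bandwidth-bound ceiling

print(f"weights: {weight_bytes / 1e9:.1f} GB")
print(f"decode ceiling: ~{ceiling_tok_s:.0f} tokens/s")
# Prints roughly 59 tokens/s. The ~47 tokens/s reported for AGX Orin 64GB in
# MAX-N mode is plausibly bandwidth-limited, so a TOPS gap alone would not
# explain a 3-4x shortfall, but a reduced power mode easily can.
```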

dusty-nv commented 1 month ago

Hi Ramyad, this was with the AGX Orin 64GB in MAX-N power mode. Other folks typically have to switch to this power mode (using the nvpmodel tool), and then they get the same 47 tokens/sec with Llama-2-7B, for example. IIRC the AGX 32GB also has fewer cores, not just less memory capacity.
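For reference, a minimal sketch of querying and switching the power mode from Python, assuming the standard nvpmodel and jetson_clocks tools that ship with JetPack, sudo privileges, and that mode 0 maps to MAXN on AGX Orin (mode IDs vary by board, and some mode switches prompt for a reboot):

```python
# Minimal sketch: query the current power mode and switch to MAXN.
# Assumes the JetPack nvpmodel / jetson_clocks CLIs are on PATH and the
# script is run with sudo. Mode IDs are board-specific; on AGX Orin,
# mode 0 corresponds to MAXN. Some mode changes may prompt for a reboot.
import subprocess

def current_mode() -> str:
    # 'nvpmodel -q' prints the active power mode name and ID
    return subprocess.run(["nvpmodel", "-q"],
                          capture_output=True, text=True, check=True).stdout

def set_maxn(mode_id: int = 0) -> None:
    # Switch the power mode, then lock clocks to their maximums
    subprocess.run(["nvpmodel", "-m", str(mode_id)], check=True)
    subprocess.run(["jetson_clocks"], check=True)

if __name__ == "__main__":
    print(current_mode())
    set_maxn()          # re-run the benchmark after this
    print(current_mode())
```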


ramyadhadidi commented 1 month ago

Hi Dustin, thanks. You are right, I am in 30W mode. I will close this issue and reopen it if I cannot reproduce results close to those of the AGX 64GB.