Closed — Louym closed this issue 1 month ago
Hi @Louym, firstly I'm glad that you were able to get MLC running and benchmarked. I haven't used nsys inside container before and am not sure if there is special setup or tools needed installed inside the container. You might just want to start by profiling a simple CUDA app in a simple CUDA container if you haven't already. Otherwise, you can find the wheels from here (which you can install outside container): http://jetson.webredirect.org/jp5/cu114
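For the "profile a simple CUDA app" starting point, a minimal sketch might look like the following. The binary name `./vectorAdd` and the report name are placeholders, and the script is guarded so it is a no-op where `nsys` or the binary is absent:

```shell
# Sketch: profile a simple CUDA app with GPU tracing enabled.
# `./vectorAdd` is a placeholder for any compiled CUDA sample.
if command -v nsys >/dev/null 2>&1 && [ -x ./vectorAdd ]; then
  # --trace=cuda,nvtx,osrt records CUDA API calls/kernels, NVTX ranges, and OS calls;
  # --stats=true also prints a kernel summary to the console.
  nsys profile --trace=cuda,nvtx,osrt --stats=true -o vectoradd_report ./vectorAdd
else
  echo "nsys or ./vectorAdd not available here"
fi
```

The resulting `vectoradd_report.nsys-rep` (`.qdrep` on older versions) can then be opened in the Nsight Systems GUI on a host machine to inspect the GPU timeline.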
Thank you very much for your prompt response. I will try this website later, and I will get back to you if I have any issues.
Hello! @dusty-nv,
I've installed TVM on my server and successfully run the ResNet-50 example from the TVM tutorial. However, I'm encountering an issue when trying to run `from tvm.runtime import disco`, as required by MLC. The following check failure occurs:
Even with this line commented out, another error occurs at `from . import base`:
Even if I comment all of these lines out, I run into more issues when using MLC's `ChatModule`.
I'm seeking guidance on the correct installation process for `mlc_chat` or `mlc_llm`, since I installed the wheels directly with pip. Could you please advise?
Thank you!
Hmm, I haven't done this outside of a container, but it seems the version of the MLC wheel does not correspond to the version of the TVM wheel. In the containers I lock the package versions to make sure that matching ones get installed; you can see those versions in its config.py:
In general, I would try to replicate the install exactly as I have done it in the container. It looks like you are using conda; I don't think that should matter, but I'm not sure.
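One quick way to spot a wheel mismatch is to print the installed versions and compare them against the pins in the container's config.py. A minimal sketch — the package names `tvm`, `mlc-llm`, and `mlc-chat` are assumptions, so substitute whatever your wheels are actually named:

```python
# Sketch: list installed wheel versions so they can be compared
# against the versions pinned in the container's config.py.
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg):
    """Return the installed version string for pkg, or None if absent."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None

for pkg in ("tvm", "mlc-llm", "mlc-chat"):
    print(pkg, "->", installed_version(pkg) or "not installed")
```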
I tried again and found that I can use `nsys profile --stats=true` to see a summary of the GPU kernels, although the timeline is still not visible. I also run into some issues when building tvm-unity (relax) from source, even though I have set:

```cmake
set(USE_CUDA ON)
set(USE_FLASHINFER ON)
set(FLASHINFER_CUDA_ARCHITECTURES 87)
set(CMAKE_CUDA_ARCHITECTURES 87)
```

It would be great if you could help me build TVM outside the container.
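For reference, the usual tvm-unity source-build sequence looks roughly like this. It is a sketch, not a verified Jetson recipe: the repo URL and paths are assumptions, and the dry-run wrapper just prints each step unless `RUN_BUILD=1` is set.

```shell
# Dry-run sketch of a typical tvm-unity build; set RUN_BUILD=1 to execute for real.
run() { if [ "${RUN_BUILD:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi; }

run git clone --recursive https://github.com/mlc-ai/relax tvm-unity
run cd tvm-unity
run mkdir -p build
run cp cmake/config.cmake build/   # then append the set(...) flags shown above
run cmake -S . -B build
run make -C build -j"$(nproc)"
```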
I managed to see the nsys timeline results by compiling the models inside the container and then benchmarking them outside the container, in an environment built from the wheels at http://jetson.webredirect.org/jp6/cu122 (JetPack 6, CUDA 12.2).
Awesome @Louym , glad you managed to get it working 👍
I'm using a pre-built MLC container on my ARM64 (aarch64) platform with an Orin GPU. I've successfully benchmarked LLM inference speed per the provided documentation. However, when analyzing with nsys I can only trace CPU activity. Since GPU information is crucial for our analysis, I need to resolve this. How can I configure nsys within the container to include GPU tracing?

![Screenshot 2024-05-17 212508](https://github.com/dusty-nv/jetson-containers/assets/108724712/fe231760-5d7e-4fcd-b926-bbc265793073)
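Two things worth checking here (assumptions on my part, not verified on Jetson): that `cuda` is actually in the nsys trace list, and that the container was started with the privileges nsys wants, e.g. `docker run --runtime nvidia --cap-add=SYS_ADMIN ...`. Inside the container, a sketch of an explicit GPU-tracing invocation — `benchmark.py` is a placeholder for the actual MLC benchmark entry point, and the script is guarded so it is a no-op where `nsys` is absent:

```shell
# Sketch: request CUDA (and cuDNN/cuBLAS) tracing explicitly inside the container.
# `benchmark.py` is a placeholder for the real benchmark script.
if command -v nsys >/dev/null 2>&1 && [ -f benchmark.py ]; then
  nsys profile --trace=cuda,cudnn,cublas,osrt -o llm_report python3 benchmark.py
else
  echo "nsys or benchmark.py not available here"
fi
```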