Closed nandeeka closed 1 month ago
This particular issue occurs because NKI-codegen (currently an experimental feature in NKI) doesn't print the engine parameter correctly. We are opening a ticket to track this internally and will post here once we have a fix. As a temporary workaround, you can remove engine=0 from the generated NKI file.
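If the generated file is long, the workaround could be scripted rather than done by hand. The following is only a minimal sketch: it assumes the argument appears literally as the keyword argument engine=0 inside a call, and the helper name strip_engine_arg and the call some_op are hypothetical placeholders, not part of NKI.

```python
import re

# Assumption: the bad argument is printed literally as `engine=0`.
# The lookahead keeps values such as `engine=0.5` or `engine=01` untouched.
_ENGINE = r"engine\s*=\s*0(?![\w.])"

# Three cases: the argument follows another one (drop the leading comma),
# it comes first (drop the trailing comma), or it is the only argument.
_ENGINE_ARG = re.compile(rf",\s*{_ENGINE}|\b{_ENGINE}\s*,\s*|\b{_ENGINE}")


def strip_engine_arg(source: str) -> str:
    """Return `source` with every `engine=0` keyword argument removed."""
    return _ENGINE_ARG.sub("", source)


if __name__ == "__main__":
    # `some_op` is a placeholder, not a real NKI instruction.
    generated = "out = some_op(x, y, engine=0)\n"
    print(strip_engine_arg(generated), end="")
```

To apply it to a file, read the generated kernel source as text, pass it through strip_engine_arg, and write the result back. Reviewing the diff before running the kernel is advisable, since a regular expression cannot tell a real call from a matching string or comment.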
If the goal here is to benchmark a PyTorch model, do the tools in Performance and Benchmark Tools (https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/index.html) help?
This worked. Thank you so much!
I am trying to benchmark NKI code generated from the corresponding PyTorch code. My workflow is as follows:
.pb file
<inputs/outputs> are the same as the parameters to nki.simulate_kernel in the automatically generated kernel:
I think the first question is: is there a better way to get this fine-grained benchmarking information?
Sometimes (e.g., for matrix multiplication), this flow works great. But other times, I see the error:
The original PyTorch code is:
This creates the following NKI code:
To benchmark this kernel, I add the following to the bottom:
Environment: I started with the Neuron 2.20 DLAMI and installed the Allocation API using the .deb and .whl files @aws-serina-tan sent me.