BobMcDear / attorch

A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
MIT License
483 stars 22 forks

The model inference speed #4

Closed LHW-CLOUD closed 3 months ago

LHW-CLOUD commented 3 months ago

Why does the model's inference speed become very slow after converting its operators into Triton form? I don't know if it's due to caching.

BobMcDear commented 3 months ago

Hello,

Could you please share a small reproducible example so I can diagnose the issue? Quite often, users mistakenly include compilation time when benchmarking their code, but there are indeed edge cases where Triton is slower than PyTorch.
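
As a rough sketch, this is how I'd benchmark without counting compilation: the warm-up calls trigger Triton's JIT, and only the subsequent runs are timed (`triton.testing.do_bench` handles this for you as well). The helper below is just an illustration, not part of attorch:

```python
import torch

def bench(fn, warmup=10, iters=100):
    # Warm-up calls trigger Triton's JIT compilation and autotuning,
    # so they must run before the timed region.
    for _ in range(warmup):
        fn()
    torch.cuda.synchronize()

    # CUDA events measure GPU time without host-side overhead.
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # Average milliseconds per call.
```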

LHW-CLOUD commented 3 months ago

Thanks a lot! It turns out the earlier issue was caused by a logical error in my code. But I have a new question for you. I now want to save the Triton IR, and I found that the Triton compiler by default saves all intermediate files under /home/.triton/dump. It seems that each folder corresponds to the intermediate representation of one operator (I don't know if my understanding is correct). Is there a way to merge all the IR files into a single IR file and run the model's inference with that complete IR file? Thank you!
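
For context, a minimal sketch of my understanding: each `@triton.jit` function is compiled separately, which seems to be why every kernel gets its own dump folder. The `TRITON_KERNEL_DUMP` and `TRITON_DUMP_DIR` environment variables here are assumptions based on recent Triton releases and may not exist in older versions:

```python
import os

# Assumption: recent Triton releases honor these variables; older
# versions may always dump under ~/.triton instead.
os.environ["TRITON_KERNEL_DUMP"] = "1"
os.environ["TRITON_DUMP_DIR"] = "./triton_ir"  # Hypothetical output directory.

import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    # Each @triton.jit function is compiled (and dumped) independently,
    # so every kernel produces its own set of IR files.
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)

x = torch.randn(1024, device="cuda")
y = torch.randn(1024, device="cuda")
out = torch.empty_like(x)
add_kernel[(triton.cdiv(1024, 256),)](x, y, out, 1024, BLOCK=256)
```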

BobMcDear commented 3 months ago

I am unfortunately ill-equipped to answer your question, as I have not worked with Triton IR before. I suggest you open an issue on the Triton repository, since this question does not directly concern attorch.

LHW-CLOUD commented 3 months ago

Thank you!