NVlabs / nvbitfi

Architecture-level Fault Injection Tool for GPU Application Resilience Evaluation

High execution time for Pytorch models #16

Closed: fernandoFernandeSantos closed this issue 2 years ago

fernandoFernandeSantos commented 2 years ago

Hello

I'm trying to perform fault injection on a PyTorch model (fasterrcnn_resnet50_fpn). However, a single fault injection takes too long: 155 s for a single inference.

Is this expected? Is there a way to speed up fault injection for PyTorch?

Thanks for the help

sivahari commented 2 years ago

We discussed this over email, and the overhead appears to come from a large number of event calls. Selectively disabling event callbacks is one way to address this in the future.

More details: https://github.com/NVlabs/NVBit/issues/79.
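
For anyone who wants to quantify the overhead discussed in this thread, here is a minimal timing sketch using only the Python standard library. It is not part of nvbitfi or NVBit. The names `time_call`, `slowdown`, and `dummy_inference` are hypothetical helpers introduced for illustration, and `dummy_inference` is a CPU-only stand-in for the real model call.

```python
import time
from statistics import median


def time_call(fn, repeats=3):
    """Return the median wall-clock time (seconds) of calling fn() `repeats` times."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return median(samples)


def slowdown(instrumented_s, baseline_s):
    """Ratio of the instrumented run time to the uninstrumented (baseline) run time."""
    if baseline_s <= 0:
        raise ValueError("baseline time must be positive")
    return instrumented_s / baseline_s


def dummy_inference():
    # Hypothetical stand-in for one inference; replace with the real model call.
    # For GPU work, synchronize the device inside this function before returning,
    # because CUDA kernel launches are asynchronous and would otherwise be under-timed.
    sum(i * i for i in range(100_000))


if __name__ == "__main__":
    elapsed = time_call(dummy_inference)
    print(f"median time per call: {elapsed:.4f} s")
```

Running the same script once without the injector and once with it, then passing the two medians to `slowdown`, gives a rough overhead factor to compare against the 155 s reported above.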