fffasttime / MRFI

https://fffasttime.github.io/MRFI/
MIT License

Use during training time #2

Open MALONSO-ARC opened 6 months ago

MALONSO-ARC commented 6 months ago

Hi.

Thank you for this great contribution. I would like to know whether MRFI can be used at training time, e.g. to perform fault-tolerant training. As I understand it, faults are injected through PyTorch hooks, but will the gradient propagate correctly during training once faults are injected (that is, will the outputs affected by an injection be decoupled from the input)?

Looking at the code, I see the forward hooks added in mrfi.MRFI.__add_hooks(), but I don't see any backward hooks implemented. So I assume that MRFI can't currently be used at training time. Please correct me if I'm wrong.
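To make the question concrete, here is a minimal sketch in plain PyTorch, not MRFI code. The `make_stuck_at_hook` helper, the mask, and the stuck-at-zero fault model are hypothetical names and assumptions chosen for illustration. It shows the decoupling I have in mind: if a forward hook returns a modified output built with differentiable ops, autograd should give zero gradient to the elements that were overwritten.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_stuck_at_hook(mask, stuck_value=0.0):
    """Hypothetical helper: build a forward hook that overwrites masked outputs."""
    def hook(module, inputs, output):
        # Returning a tensor from a forward hook replaces the module output.
        # torch.where passes gradient to `output` only where mask is False,
        # so the faulty elements are decoupled from the layer input.
        return torch.where(mask, torch.full_like(output, stuck_value), output)
    return hook

layer = nn.Linear(4, 3)
mask = torch.tensor([False, True, False])  # assumed fault: output unit 1 is stuck
handle = layer.register_forward_hook(make_stuck_at_hook(mask))

x = torch.randn(2, 4)
y = layer(x)            # hook runs here; y[:, 1] is forced to stuck_value
y.sum().backward()

weight_grad = layer.weight.grad  # row 1 receives no gradient
bias_grad = layer.bias.grad
handle.remove()
```

Whether MRFI's own hooks behave this way depends on how they modify the output (for example in place, or outside the autograd graph), which is what I am unsure about.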

If that's the case, are there any plans to implement this feature? It would be very useful for building fault-tolerant training pipelines.

Thanks

fffasttime commented 4 weeks ago

Hello

Yes, you are right. MRFI currently does not simulate backpropagation. It was originally designed for fault injection during forward inference, extending existing forward-propagation simulation tools so that different edge inference devices can be simulated in safety-critical scenarios. We have also noticed that fault injection during training is important, but backpropagation works very differently from forward propagation, and it is uncertain whether it can be incorporated correctly into the current version of MRFI. I may continue to look into this.
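As an illustration of what fault injection in the backward pass could look like outside MRFI, the sketch below uses PyTorch's `Tensor.register_hook` to corrupt a gradient as it flows back. This is not an MRFI feature. The model, the `zero_first_grad_column` function, and the "lost gradient" fault model are assumptions made up for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))

def zero_first_grad_column(grad):
    # Assumed fault model: the gradient of hidden unit 0 is lost.
    # Return a new tensor rather than modifying `grad` in place.
    faulty = grad.clone()
    faulty[:, 0] = 0.0
    return faulty

x = torch.randn(2, 4)
hidden = model[0](x)
# The hook fires when the gradient w.r.t. `hidden` is computed in backward.
hidden.register_hook(zero_first_grad_column)
out = model[2](model[1](hidden))
out.sum().backward()

first_layer_grad = model[0].weight.grad  # row 0 is zero because of the fault
```

The forward result is untouched here; only the gradient seen by the first layer changes, which is one reason the backward case needs a separate design from the forward hooks.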

Thank you for your attention.