"Interpretable Machine Learning" has a much longer tradition in "Interpretable Software", where a machine learning program can be thought of well simply as a program.
The problem then becomes: What does the program mean, does it perform what it is supposed to do? This discipline is static analysis.
"Interpretable Machine Learning" has a much longer tradition in "Interpretable Software", where a machine learning program can be thought of well simply as a program.
The problem then becomes: What does the program mean, does it perform what it is supposed to do? This discipline is static analysis.
A leading paper on interpreting and proving properties of neural networks is: https://www.sri.inf.ethz.ch/publications/singh2019domain
It uses an abstract domain (in the zonotope/polyhedra family) to prove specific properties, such as robustness, of neural networks.
Conceptually, it is an advanced form of interval analysis applied to neural networks.
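The basic idea of interval analysis can be sketched concretely. The following is a minimal, simplified illustration (not the method from the paper above, which uses a much more precise domain): sound interval bounds are propagated through an affine layer and a ReLU, so that every possible output for inputs in the given box is guaranteed to lie inside the computed output box. The function names and the tiny example weights are made up for illustration.

```python
import numpy as np

def interval_affine(W, b, lo, hi):
    # Propagate the box [lo, hi] through x -> W @ x + b.
    # Splitting W into its positive and negative parts keeps the
    # bounds sound: positive weights attain their extremes at the
    # matching bound, negative weights at the opposite one.
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    new_lo = W_pos @ lo + W_neg @ hi + b
    new_hi = W_pos @ hi + W_neg @ lo + b
    return new_lo, new_hi

def interval_relu(lo, hi):
    # ReLU is monotone, so applying it to both bounds is exact
    # for the interval domain.
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

# Toy one-layer network: bounds for all inputs in [0, 1] x [0, 1].
W = np.array([[1.0, -1.0], [0.5, 2.0]])
b = np.array([0.0, -1.0])
lo, hi = np.array([0.0, 0.0]), np.array([1.0, 1.0])
l1, h1 = interval_affine(W, b, lo, hi)
out_lo, out_hi = interval_relu(l1, h1)
```

This illustrates why such analyses are sound but imprecise: intervals ignore correlations between neurons, which is exactly what richer domains like zonotopes and the domain in the paper above recover.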