NicolaBernini opened this issue 4 years ago
Intelligence as the ability to generalize
Intelligence as the ability to act in an imagined space (definition of thinking, according to Konrad Lorenz)
Data Driven ML consists of learning models from data
How is data generated, or, more explicitly, what are the assumptions on the data?
Typically ML methods rely on the assumption that data samples are Independent and Identically Distributed (IID), which means each sample is drawn from the same underlying distribution, independently of all the other samples.
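A minimal sketch of what the IID assumption buys us, using assumed Gaussian data: when train and test samples come from the same distribution, their sample statistics agree; a shifted test distribution (the domain gap discussed below) shows up as a large disagreement. All distributions and numbers here are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data: 10,000 IID draws from N(0, 1) (assumed for illustration)
train = rng.normal(loc=0.0, scale=1.0, size=10_000)
# An IID test set: same distribution, drawn independently
test_iid = rng.normal(loc=0.0, scale=1.0, size=10_000)
# A shifted test set: "Identically Distributed" is violated (domain gap)
test_shifted = rng.normal(loc=3.0, scale=1.0, size=10_000)

# Under IID, sample means of train and test agree closely...
print(abs(train.mean() - test_iid.mean()))      # small (order 0.01)
# ...while a distribution shift appears as a large gap
print(abs(train.mean() - test_shifted.mean()))  # close to 3
```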
What happens when these assumptions do not hold?
Typically performance drops, but sometimes the drop can be very sharp; as an example, consider Adversarial Attacks.
Adversarial Attacks can be seen as the result of a violation of the "Identically Distributed" assumption: the probability density function (PDF) the adversarial samples come from is too distant from the training one (domain gap).
They can also be seen as a failure of the model to generalize properly (not intelligent enough).
They can also be seen as the result of model instability in certain regions of the input space, where a small variation in the input causes a huge variation in the output.
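The instability view can be made concrete with a linear toy model (my own illustration, not from the paper): for a score f(x) = w·x, the worst-case bounded perturbation x' = x + eps·sign(w) (the direction used by gradient-sign attacks like FGSM) shifts the output by eps·||w||₁, which grows with the input dimension even when eps is tiny per coordinate.

```python
import numpy as np

d = 10_000                   # assumed input dimension (illustrative)
w = np.ones(d)               # assumed weights; ||w||_1 = d
x = np.zeros(d)              # a clean input

eps = 0.01                   # per-coordinate perturbation: visually negligible
x_adv = x + eps * np.sign(w) # worst-case direction for a linear score

# Tiny change per input coordinate, huge change in the output:
delta_out = w @ x_adv - w @ x
print(delta_out)             # eps * d = 100.0
```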
But adding the temporal dimension, Adversarial Attacks can also be seen as a violation of the "Independent" assumption, since an attacker can resubmit the same sample over and over again.
Furthermore, consider training: since it is an iterative process, the weights at a given iteration depend both on the data observed at that iteration and on the weights from the previous iterations, so failing to properly shuffle the samples in the dataset could make the NN learn spurious correlations in its weights.
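The standard remedy for this order dependence is to reshuffle the dataset at every epoch, so that no fixed sample ordering (e.g. all samples of one class first) can imprint itself on the weight trajectory. A minimal sketch with NumPy, where the data and the SGD loop body are assumed placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # assumed toy dataset
y = rng.integers(0, 2, size=100)       # assumed binary labels

for epoch in range(3):
    # Fresh permutation each epoch: breaks any fixed ordering of samples
    perm = rng.permutation(len(X))
    X_shuf, y_shuf = X[perm], y[perm]
    # ... run SGD minibatches over X_shuf, y_shuf here ...
```

A fixed order (sorted by label, say) makes consecutive SGD updates statistically dependent, which is exactly the kind of "false correlation" described above.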
Overview
Causality for Machine Learning
arXiv: https://arxiv.org/abs/1911.10500