sctyner opened this issue 6 years ago
I stumbled across this paper http://www.jmlr.org/papers/volume11/ralaivola10a/ralaivola10a.pdf, which provides probabilistic (PAC-Bayes) bounds on the risk of a classifier learned from non-iid data. I haven't read it closely yet, but it may be worth looking into. One potential issue with this direction: I'm fairly sure the theory requires a specific type of loss function and assumes that decisions are made so as to minimize the expected loss. That said, the framework this paper builds on is a very popular one for understanding the theoretical accuracy properties of machine learning models.
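From a skim, the core idea seems to be: build a dependence graph over the training examples (an edge means two examples are dependent) and partition them into independent sets; the bounds then pay for dependence through the graph's fractional chromatic number. Here's a minimal sketch of that construction, assuming pairwise shoe comparisons (like the example below) as the data and using `networkx`; the variable names are mine, not the paper's:

```python
# Sketch (not the paper's code) of the dependence-graph idea behind
# chromatic PAC-Bayes bounds: pairwise comparisons that share an item are
# dependent, so connect them in a graph and split the comparisons into
# independent sets via a proper coloring.
from itertools import combinations
import networkx as nx

n_shoes = 6
comparisons = list(combinations(range(n_shoes), 2))  # all pairs (i, j)

# Dependence graph: one node per comparison; edge when two comparisons
# share a shoe, since they then involve a common underlying object.
G = nx.Graph()
G.add_nodes_from(comparisons)
for a, b in combinations(comparisons, 2):
    if set(a) & set(b):
        G.add_edge(a, b)

# A proper coloring partitions the comparisons into independent sets;
# within each color class, no two comparisons share a shoe.
coloring = nx.coloring.greedy_color(G, strategy="largest_first")
n_colors = len(set(coloring.values()))
print(f"{len(comparisons)} comparisons split into {n_colors} independent sets")

# The bounds essentially replace the sample size m with m / chi*(G), where
# chi*(G) is the fractional chromatic number; the greedy coloring above
# only gives an upper bound on it.
```

So the price of dependence is a shrunken effective sample size, which at least makes the cost of the pairwise design quantifiable.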
Context: Today in S&T, @Carriquiry brought up the fact that many of the problems CSAFE works on involve learning from non-iid data, e.g. pairwise comparisons: compare shoe 1 to shoe 2, then shoe 1 to shoe 3, then shoe 1 to shoe 4, and so on. Any two comparisons that share a shoe are not independent.
Problem: Most learning methods, and their error guarantees, assume the data are iid. So what can we do to deal with that fact? One pragmatic option is sketched below.
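Independent of the theory above, one thing we could do right away is make evaluation respect the dependence: hold out every comparison involving a given shoe rather than splitting comparisons at random. A minimal sketch with fake data and a hypothetical same-source classifier (it doesn't remove all dependence from the training folds, but it avoids the optimistic bias of a plain random split):

```python
# Sketch of leave-one-shoe-out evaluation on hypothetical comparison data.
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_shoes = 8
pairs = list(combinations(range(n_shoes), 2))
X = rng.normal(size=(len(pairs), 3))        # fake comparison features
y = rng.integers(0, 2, size=len(pairs))     # fake same-source labels

accs = []
for held_out in range(n_shoes):
    # Test fold: every comparison that involves the held-out shoe.
    test = [k for k, (i, j) in enumerate(pairs) if held_out in (i, j)]
    train = sorted(set(range(len(pairs))) - set(test))
    model = LogisticRegression().fit(X[train], y[train])
    accs.append(model.score(X[test], y[test]))

print(f"leave-one-shoe-out accuracy: {np.mean(accs):.2f}")
```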
So what? This is key to the mission of CSAFE: establishing statistical foundations for forensics.