dashaasienga / Statistics-Senior-Honors-Thesis


Week 5 Summary and Questions -- QSA (Tutorial #1) #10

Closed dashaasienga closed 7 months ago

dashaasienga commented 11 months ago

@katcorr

Overview

This week, I completed the tutorial on the Seldonian algorithm in the Jupyter notebook. I was able to implement all the functions and obtain a solution for the simple regression problem. At a high level, we are simply partitioning the data into train and test sets, getting a candidate solution, and running the safety test.
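The high-level loop described above can be sketched as follows. This is not the notebook's actual code; `get_candidate` and `safety_test` are hypothetical stand-ins for the tutorial's functions, and the 60/40 split is an assumed choice.

```python
import numpy as np

def quasi_seldonian(X, y, get_candidate, safety_test, frac_safety=0.4, seed=0):
    """Sketch of the QSA loop: partition the data, search for a candidate
    solution on one partition, then verify it on the held-out safety set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))

    # Partition into candidate-selection and safety-test sets
    n_safety = int(frac_safety * len(X))
    safety_idx, cand_idx = idx[:n_safety], idx[n_safety:]

    # Search for a candidate solution on the candidate partition
    theta = get_candidate(X[cand_idx], y[cand_idx])

    # Run the safety test on the held-out partition
    if safety_test(theta, X[safety_idx], y[safety_idx]):
        return theta  # candidate passes the safety test
    return None       # "No Solution Found" (NSF)
```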

P.S. I typed up the LaTeX below in the notebook, so that will be easy to transfer to the final paper :)

Safety Test

[Screenshot: safety test formulation (2023-10-16)]

Candidate Solution

[Screenshot: candidate solution formulation (2023-10-16)]

The black-box algorithm used to search for a candidate solution is Powell's method, an algorithm for finding a local minimum of a function using a bi-directional line search along a set of search directions, without requiring derivatives. There are many other algorithms we could use, but this is the one the researchers chose to converge to a solution. I wonder what changes, if any, we would observe if we employed different minimization/maximization algorithms?
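For reference, Powell's method is available in SciPy via `scipy.optimize.minimize`, which I believe is what the tutorial uses under the hood. A minimal example on a simple quadratic:

```python
import numpy as np
from scipy.optimize import minimize

# A simple quadratic with minimum at (3, -1). Powell's method is
# derivative-free, so it works even when gradients are unavailable.
def f(theta):
    return (theta[0] - 3.0) ** 2 + (theta[1] + 1.0) ** 2

res = minimize(f, x0=np.zeros(2), method="Powell")
print(res.x)  # approximately [3, -1]
```

Swapping `method="Powell"` for, e.g., `"Nelder-Mead"` or `"BFGS"` would be one easy way to explore the question above.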

Powell's method, however, is not a constrained algorithm. One way of addressing this limitation is to incorporate the constraint into the objective function as a barrier function. In constrained optimization, a field of mathematics, barrier functions replace inequality constraints with a penalizing term in the objective function that is easier to handle.

[Screenshot: barrier function definition (2023-10-16)]

In this case, solutions that are predicted not to pass the safety test will not be selected by the optimization algorithm because we assign a large negative performance to them. This barrier function encourages Powell to tend towards solutions that will pass the safety test.

Solution

After implementing all the necessary functions, our Quasi-Seldonian algorithm found a solution that minimizes the sample mean squared error, while ensuring (with high probability) that all behavioral constraints are satisfied!

[Screenshot: QSA solution output (2023-10-16)]

Personal Experimentation

I further observed that the ordinary least squares solution was:

[Screenshot: ordinary least squares solution (2023-10-16)]
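For the comparison above, the OLS fit is a one-liner with NumPy; the data here are synthetic stand-ins, not the tutorial's dataset.

```python
import numpy as np

# Synthetic stand-in data for illustration (not the tutorial's dataset)
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, 200)
y = X + rng.normal(0.0, 0.1, 200)

# np.polyfit with degree 1 returns [slope, intercept] -- the unconstrained
# least-squares line, which minimizes sample MSE with no safety guarantee
slope, intercept = np.polyfit(X, y, 1)
ols_mse = np.mean((intercept + slope * X - y) ** 2)
```

Comparing this unconstrained MSE against the QSA solution's MSE quantifies the cost of enforcing the behavioral constraints.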

I generated this visual as well to help dissect this further:

[Screenshot: visualization of the fitted line against the data (2023-10-16)]

It seems that the line is a good fit, but not the best fit. It does, however, satisfy the two behavioral constraints we set!

Questions