dashaasienga / Statistics-Senior-Honors-Thesis


Week 5 Summary and Questions -- QSA (Tutorial #1) #10

Closed dashaasienga closed 7 months ago

dashaasienga commented 11 months ago

@katcorr

Overview

This week, I completed the tutorial on the Seldonian algorithm in the Jupyter notebook. I was able to implement all the functions and obtain a solution for the simple regression problem. At a high level, we are simply partitioning the data into train and test sets, getting a candidate solution, and running the safety test.
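The high-level loop described above can be sketched as follows. This is not the notebook's actual code; `get_candidate` and `safety_test` are hypothetical stand-ins for the tutorial's functions, and the 60/40 split is an assumed choice.

```python
import numpy as np

def quasi_seldonian(X, y, get_candidate, safety_test, frac_safety=0.4, seed=0):
    """Sketch of the QSA loop: partition the data, search for a candidate
    solution on one partition, then verify it on the held-out safety set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))

    # Partition into candidate-selection and safety-test sets
    n_safety = int(frac_safety * len(X))
    safety_idx, cand_idx = idx[:n_safety], idx[n_safety:]

    # Search for a candidate solution on the candidate partition
    theta = get_candidate(X[cand_idx], y[cand_idx])

    # Run the safety test on the held-out partition
    if safety_test(theta, X[safety_idx], y[safety_idx]):
        return theta  # candidate passes the safety test
    return None       # "No Solution Found" (NSF)
```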

P.S. I typed up the LaTeX below in the notebook, so that will be easy to transfer to the final paper :)

Safety Test

[Screenshot: safety test formulation (2023-10-16)]

Candidate Solution

[Screenshot: candidate solution formulation (2023-10-16)]

The black-box algorithm used to search for a candidate solution is Powell's method, an algorithm for finding a local minimum of a function using a bi-directional line search along a set of search directions, without requiring derivatives. There are many other algorithms we could use, but this is the one the researchers chose to converge to a solution. I wonder what changes, if any, we would observe if we employed different minimization/maximization algorithms?
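For reference, Powell's method is available in SciPy via `scipy.optimize.minimize`, which I believe is what the tutorial uses under the hood. A minimal example on a simple quadratic:

```python
import numpy as np
from scipy.optimize import minimize

# A simple quadratic with minimum at (3, -1). Powell's method is
# derivative-free, so it works even when gradients are unavailable.
def f(theta):
    return (theta[0] - 3.0) ** 2 + (theta[1] + 1.0) ** 2

res = minimize(f, x0=np.zeros(2), method="Powell")
print(res.x)  # approximately [3, -1]
```

Swapping `method="Powell"` for, e.g., `"Nelder-Mead"` or `"BFGS"` would be one easy way to explore the question above.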

Powell's method, however, is not a constrained algorithm. One way of addressing this limitation is to incorporate the constraint into the objective function as a barrier function. In constrained optimization, a field of mathematics, barrier functions replace inequality constraints with a penalizing term in the objective function that is easier to handle.

[Screenshot: barrier function definition (2023-10-16)]

In this case, solutions that are predicted not to pass the safety test will not be selected by the optimization algorithm because we assign a large negative performance to them. This barrier function encourages Powell to tend towards solutions that will pass the safety test.

Solution

After implementing all the necessary functions, our Quasi-Seldonian algorithm found a solution that minimizes the sample mean squared error, while ensuring (with high probability) that all behavioral constraints are satisfied!

[Screenshot: QSA solution output (2023-10-16)]

Personal Experimentation

I further observed that the ordinary least squares solution was:

[Screenshot: ordinary least squares solution (2023-10-16)]
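For the comparison above, the OLS fit is a one-liner with NumPy; the data here are synthetic stand-ins, not the tutorial's dataset.

```python
import numpy as np

# Synthetic stand-in data for illustration (not the tutorial's dataset)
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, 200)
y = X + rng.normal(0.0, 0.1, 200)

# np.polyfit with degree 1 returns [slope, intercept] -- the unconstrained
# least-squares line, which minimizes sample MSE with no safety guarantee
slope, intercept = np.polyfit(X, y, 1)
ols_mse = np.mean((intercept + slope * X - y) ** 2)
```

Comparing this unconstrained MSE against the QSA solution's MSE quantifies the cost of enforcing the behavioral constraints.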

I generated this visual as well to help dissect this further:

[Screenshot: visualization of the fitted line against the data (2023-10-16)]

It seems that the line is a good fit, but not the best fit. It does, however, satisfy the two behavioral constraints we set!

Questions