g-simmons / persona-research-internship


Task: Set up a toy contrastive optimization problem #33

Open g-simmons opened 1 year ago

g-simmons commented 1 year ago

Background

I think instead of focusing on MiCE this week, we might set up an even simpler optimization problem, no model needed.

Addition of two numbers, for example:

Input:

a string containing two numbers, filled into a template like "{digit_1} + {digit_2} ="

Label (a very simple "behavior" score):

the sum of the two numbers

The data might look like:

| Feature 1 | Feature 2 | Prompt    | Score |
|-----------|-----------|-----------|-------|
| 00        | 16        | 00 + 16 = | 16    |
| 02        | 01        | 02 + 01 = | 03    |
| 09        | 09        | 09 + 09 = | 18    |

And so on.
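
A minimal sketch of what the data generation could look like (the function and column names here are placeholders, not existing code in the repo):

```python
# Sketch: generate the two-number addition dataset from the prompt template.
import random

import pandas as pd

TEMPLATE = "{digit_1} + {digit_2} ="

def make_addition_dataset(n_rows: int = 100, seed: int = 0) -> pd.DataFrame:
    """Sample pairs of two-digit numbers and score each prompt by their sum."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        a, b = rng.randint(0, 99), rng.randint(0, 99)
        rows.append({
            "feature_1": f"{a:02d}",
            "feature_2": f"{b:02d}",
            "prompt": TEMPLATE.format(digit_1=f"{a:02d}", digit_2=f"{b:02d}"),
            "score": a + b,  # the "behavior" score is just the sum
        })
    return pd.DataFrame(rows)

print(make_addition_dataset(3))
```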

This gives us a natural way to put the data in a 2D feature space for visualization. However, the behavior score can be increased arbitrarily; there aren't any local minima to speak of.

Let's look for a simple function with a known number of local minima.

Something like this, for example:


https://pubs.acs.org/doi/10.1021/acs.jpca.2c00647

How to:

AbhiramBorra commented 1 year ago

I'm thinking we could use a sin[(feature1+feature2)x] function to generate the data. Both features would increment at random but I will keep them between 0 and 5pi (2 local minima). How many data points do I need to generate? Also, what are you referring to when you say "optimization literature"?
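
Roughly what I have in mind (placeholder names; this assumes the score is just sin(feature_1 + feature_2), leaving the extra x factor aside for now):

```python
# Rough sketch: sample features in [0, 5*pi] and score with a sine function.
import numpy as np

rng = np.random.default_rng(0)
n_points = 500  # arbitrary default, easy to change later

feature_1 = rng.uniform(0, 5 * np.pi, size=n_points)
feature_2 = rng.uniform(0, 5 * np.pi, size=n_points)
score = np.sin(feature_1 + feature_2)  # assumes the "x" factor is dropped

data = np.column_stack([feature_1, feature_2, score])
np.savetxt("toy_sin_data.csv", data, delimiter=",",
           header="feature_1,feature_2,score", comments="")
```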

g-simmons commented 1 year ago

Also, what are you referring to when you say "optimization literature"?

"The literature" generally refers to the collection of scientific writing on a topic. In this case we are talking about the scientific papers that study optimization.

In particular, I'm asking you to look for functions commonly used to compare the behavior of different optimization algorithms.

There are some functions like Beale's function, Simionescu's function, etc. that are often used to see how an optimization algorithm behaves in a loss landscape with certain characteristics. Here's Himmelblau's function, for example.
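
It is f(x, y) = (x^2 + y - 11)^2 + (x + y^2 - 7)^2, with four identical minima, all at f = 0. A quick sketch for evaluating it on the usual [-5, 5] x [-5, 5] window:

```python
# Himmelblau's function: f(x, y) = (x^2 + y - 11)^2 + (x + y^2 - 7)^2
import numpy as np

def himmelblau(x, y):
    return (x**2 + y - 11) ** 2 + (x + y**2 - 7) ** 2

# Evaluate on a grid over the usual [-5, 5] x [-5, 5] window.
xs, ys = np.meshgrid(np.linspace(-5, 5, 200), np.linspace(-5, 5, 200))
zs = himmelblau(xs, ys)
print(zs.min())  # close to 0: all four minima have f = 0
```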

GPT-4 listed the following functions:

  1. Beale's Function: This is a famous test case for optimization algorithms, known for its difficult-to-find global minimum. The function is usually evaluated on the square x1, x2 in the range [-4.5, 4.5].

  2. Rosenbrock Function: Often called the "Banana Function" due to its shape, this function is interesting because it has a shallow, curved valley that converges to the global minimum. It's designed to test an algorithm's ability to converge to a minimum that isn't surrounded by steep gradients.

  3. Ackley Function: This is a widely used function for testing optimization algorithms. It has many local minima and a global minimum, making it ideal for testing the robustness of algorithms against local minimum convergence.

  4. Rastrigin Function: This is a non-convex function with a large search space and a lot of local minima, making it particularly challenging for optimization algorithms. It's useful for testing the global search capability of algorithms.

  5. Sphere Function: One of the simplest benchmark functions, with a d-dimensional input space. It’s a convex function, and all paths lead to the minimum, making it easier to optimize than others.

  6. Levi Function N.13: This function is continuous, differentiable, and has several local minima, making it a good challenge for optimization algorithms trying to find the global minimum.

  7. Easom Function: This function presents a challenge because it has just one global minimum and the rest of the search space is relatively flat. This characteristic tests an algorithm's precision in locating minima.

  8. Himmelblau's Function: This function is multimodal, with four identical local minima. It's useful for testing how well optimization algorithms can distinguish between multiple possible solutions.

  9. Eggholder Function: Known for its complex landscape with numerous local minima and maxima, it is particularly tough for optimization algorithms and is often used in robustness testing.

  10. Goldstein–Price Function: This function has a complex landscape, despite its deceivingly simple mathematical form, and is designed to test the performance of optimization algorithms in terms of escaping local minima.

I'd suggest starting with one function, then generalizing the script to allow a choice between multiple functions. You might look around to see if there is a Python library implementing many of these optimization benchmarks. I would not be surprised if sklearn, scipy, or numpy had implementations for many of these, though I'm not 100% sure.
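
(For example, scipy does expose the Rosenbrock function as scipy.optimize.rosen; most of the others are only a few lines to write by hand.) A rough sketch of a "choose your benchmark" setup, with names and structure that are purely illustrative:

```python
# Sketch of a pluggable benchmark-function registry (placeholder names).
import numpy as np
from scipy.optimize import rosen  # Rosenbrock is built into scipy

def rastrigin(x: np.ndarray) -> float:
    # Standard Rastrigin: 10*d + sum(x_i^2 - 10*cos(2*pi*x_i))
    return 10 * x.size + float(np.sum(x**2 - 10 * np.cos(2 * np.pi * x)))

def beale(x: np.ndarray) -> float:
    return float((1.5 - x[0] + x[0] * x[1]) ** 2
                 + (2.25 - x[0] + x[0] * x[1] ** 2) ** 2
                 + (2.625 - x[0] + x[0] * x[1] ** 3) ** 2)

BENCHMARKS = {
    "rosenbrock": rosen,
    "rastrigin": rastrigin,
    "beale": beale,
}

point = np.array([1.0, 2.0])
for name, fn in BENCHMARKS.items():
    print(name, fn(point))
```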

How many data points do I need to generate?

This is a question about parameters for the experimental methods. There are several possible answers:

Often the best way to go is to choose a reasonable default value, and make it easy to change the value later (make it an argument that the user can specify from the command line, for example). "If you can't make it perfect, make it adjustable"
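
A minimal sketch of the "make it adjustable" approach with argparse (flag names are only examples):

```python
# Minimal CLI sketch: expose the dataset size as an adjustable argument.
import argparse

parser = argparse.ArgumentParser(description="Generate toy optimization data")
parser.add_argument("--n-points", type=int, default=1000,
                    help="number of data points to generate (reasonable default)")
parser.add_argument("--seed", type=int, default=0, help="random seed")
args = parser.parse_args()

print(f"Generating {args.n_points} points with seed {args.seed}")
```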

g-simmons commented 1 year ago

@ria-bhandari @ArcherX0X Your input here would be appreciated.

Do any well-known functions come to mind?

Is anything from the "Dual Active Set Methods" paper relevant here?

ria-bhandari commented 1 year ago

I found a great resource to help us find the best function to use: https://www.sfu.ca/~ssurjano/optimization.html

At first glance, sin x and cos x came to mind. Maybe even x^2. However, the link I provided has a lot more options specifically for optimization.