brunorigal / autograd-minimize

A wrapper around scipy minimize which uses the autograd capabilities of tensorflow to compute the gradient and hessian.
MIT License

Sum of Optimized Parameters Matches Initial Guess Sum in Autograd_Minimize #10

Open ant1ni1 opened 1 month ago

ant1ni1 commented 1 month ago

I'm encountering an unexpected behavior while using the autograd_minimize function in Python for optimization. Here's a summary of the issue:

Objective: I'm using autograd_minimize to minimize an objective function. The objective function computes the error between calculated values and expected values based on the input parameters.

Observation: After running the optimization, I've noticed that while the individual optimized parameters differ from the initial guess, the sum of the optimized parameters matches the sum of the initial guess. This is unexpected, as I don't have any constraint that enforces such a match.

Expected Outcome: I would expect the optimization process to converge to parameter values that minimize the error, without necessarily matching the sum of the initial guess parameters.

I've reviewed the implementation of the objective function and the optimization setup, but I haven't been able to identify the source of this unexpected behavior.

Are there any insights into what might be causing the sum of the optimized parameters to match the sum of the initial guess parameters? Any suggestions or advice on how to resolve this issue would be greatly appreciated.

Thank you for your assistance!

I have tried all the methods available in the framework, and the behavior appears only with methods that use gradient computation.
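
A quick check that is consistent with this: if the gradient of the objective at the initial guess sums to zero, a pure gradient step cannot change the sum of the parameters. A minimal sketch of such a check (gradient_sum is a hypothetical helper, not part of autograd-minimize):

import torch

# Hypothetical diagnostic: a zero gradient sum at the starting point means a
# gradient step moves the parameters without changing their total.
def gradient_sum(objective, x0):
    x = torch.tensor(x0, dtype=torch.float64, requires_grad=True)
    objective(x).backward()       # fill x.grad via autograd
    return x.grad.sum().item()    # 0.0 => the parameter sum is preserved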

brunorigal commented 1 month ago

I see no reason for this, and I have not seen anything like it in other gradient-based optimizations. For example, the first example in the readme does not follow this pattern. Could it be a specific property of the function you are optimizing? Do you have a minimal example showing this effect?
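
For reference, a minimal sketch in the spirit of that readme example (rosen_torch is the standard Rosenbrock function, written here with the torch backend and the same call pattern used later in this thread; it is not necessarily the readme's exact code), where the optimized sum clearly differs from the initial sum:

import numpy as np
import torch
from autograd_minimize import minimize

# Rosenbrock function; its minimum is at (1, 1).
def rosen_torch(x):
    return torch.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1 - x[:-1]) ** 2)

res = minimize(rosen_torch, np.array([0., 0.]), backend="torch", precision="float64")
print(res.x)          # approximately [1., 1.]
print(res.x.sum())    # approximately 2.0, not the initial sum of 0.0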

ant1ni1 commented 1 month ago

Thank you for the response @brunorigal. Please find below a simplified script that reproduces the described issue:

### IMPORTS ###
import numpy as np
import torch
from autograd_minimize import minimize

# Linear interpolation function using PyTorch
def torch_interp(x, xp, fp):
    # Sort xp and corresponding fp
    xp_sorted, indices = torch.sort(xp)
    fp_sorted = fp[indices]

    # Move x and fp to the device of xp_sorted
    x = x.to(xp_sorted.device)
    fp_sorted = fp_sorted.to(xp_sorted.device)

    # Perform interpolation
    inds = torch.searchsorted(xp_sorted, x)
    inds = torch.clamp(inds, 1, len(xp_sorted) - 1)
    x_lo = xp_sorted[inds - 1]
    x_hi = xp_sorted[inds]
    y_lo = fp_sorted[inds - 1]
    y_hi = fp_sorted[inds]
    slope = (y_hi - y_lo) / (x_hi - x_lo)
    y = slope * (x - x_lo) + y_lo

    return y

# Define the objective function
def objective_torch_issue(params):
    global x1, x2, x3

    # Ensure params is a PyTorch tensor with requires_grad=True
    if isinstance(params, np.ndarray):
        params = torch.tensor(params, dtype=torch.float64, requires_grad=True)
    elif not params.requires_grad:
        params.requires_grad_(True)

    # Accumulators for the interpolated values from each list
    agg_data1 = []
    agg_data2 = []

    start = 0
    end = 0
    # Iterate over the separate input lists
    for i in range(len(x1)):
        # Start and end index for each separate list
        start = end
        end += len(x1[i])

        # Convert lists to tensors
        x1_tensor = torch.tensor(x1[i], dtype=torch.float64)
        x2_tensor = torch.tensor(x2[i], dtype=torch.float64)
        x3_tensor = torch.tensor(x3[i], dtype=torch.float64)

        # Interpolation
        x2_interp = torch_interp(x2_tensor, x1_tensor, params[start:end])
        agg_data1.append(x2_interp)

        y_interp_close = torch_interp(x3_tensor, x1_tensor, params[start:end])
        agg_data2.append(y_interp_close)

    # Aggregate the interpolated series across lists
    agg_data1_total = torch.sum(torch.stack(agg_data1), dim=0)
    agg_data2_total = torch.sum(torch.stack(agg_data2), dim=0)

    # Calculate the error using vectorized operations
    error = torch.sum(torch.abs(agg_data1_total - agg_data2_total))

    return error

# Data shared with the objective function via module-level globals

# Initial guess
params = [0,10,20,30,40,50]

x1 = [[1,5,6,7,8,9]]

x2 = [[1.1, 1.2, 1.3, 1.6, 2.7, 3.2]]
x3 = [[1.2, 1.3, 1.6, 2.7, 3.2, 3.4]]

# Run the optimization; the torch backend computes the gradient automatically
result = minimize(objective_torch_issue, np.array(params), backend="torch", precision="float64", method="SLSQP", hvp_type='vhp', tol=1e-10)

# Extract optimized values
optimal_values = list(result.x)

print(params)
print(optimal_values)

print(sum(params))
print(sum(optimal_values))

ant1ni1 commented 3 weeks ago

Any updates on the issue?

brunorigal commented 3 weeks ago

I think the problem is with your example and the function being optimized: the values of x2 and x3 all fall into the first interval of x1 (between 1 and 5). As a consequence, torch.searchsorted(xp_sorted, x) returns only ones, and only the first two parameters are updated. Within that single segment the interpolation is linear in params[0] and params[1], so the error depends only on the difference params[1] - params[0]; the gradients with respect to those two parameters are therefore equal and opposite, which is why every gradient step leaves the sum of the parameters unchanged.
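
A minimal sketch checking this with the data from the script above (interp below re-implements the same indexing and interpolation as torch_interp, assuming xp is already sorted):

import torch

xp = torch.tensor([1., 5., 6., 7., 8., 9.], dtype=torch.float64)
x2 = torch.tensor([1.1, 1.2, 1.3, 1.6, 2.7, 3.2], dtype=torch.float64)
x3 = torch.tensor([1.2, 1.3, 1.6, 2.7, 3.2, 3.4], dtype=torch.float64)

# Every query point falls between xp[0] = 1 and xp[1] = 5, so searchsorted
# returns 1 for all of them.
print(torch.searchsorted(xp, x2))  # tensor([1, 1, 1, 1, 1, 1])

params = torch.tensor([0., 10., 20., 30., 40., 50.],
                      dtype=torch.float64, requires_grad=True)

def interp(x):
    i = torch.clamp(torch.searchsorted(xp, x), 1, len(xp) - 1)
    slope = (params[i] - params[i - 1]) / (xp[i] - xp[i - 1])
    return params[i - 1] + slope * (x - xp[i - 1])

error = torch.sum(torch.abs(interp(x2) - interp(x3)))
error.backward()
print(params.grad)        # nonzero only in the first two components, equal and opposite
print(params.grad.sum())  # 0: a gradient step cannot change the parameter sum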