KindXiaoming / pykan

Kolmogorov Arnold Networks
MIT License

Problems with training multiple models with different random seeds #351

Open 911569318 opened 1 month ago

911569318 commented 1 month ago

I'm using pykan version 0.2.1 and here are the issues I ran into:

I want to train several models, each with a different random seed, and then select the best one. However, my code produced two different kinds of errors at runtime. Here is my code:

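import random

import numpy as np
import torch
from kan import KAN
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from tqdm import tqdm


def evaluate_model(model, dataset):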
    """Evaluate the model and return MSE, RMSE, MAE, and R2 scores."""
    model.eval()
    with torch.no_grad():
        predictions = model(dataset['test_input'])
        labels = dataset['test_label']

    mse = mean_squared_error(labels.cpu(), predictions.cpu())
    rmse = np.sqrt(mse)
    mae = mean_absolute_error(labels.cpu(), predictions.cpu())
    r2 = r2_score(labels.cpu(), predictions.cpu())

    return mse, rmse, mae, r2


def train_multiple_models(dataset, num_iterations=100):
    results = []

    for i in tqdm(range(num_iterations), desc="Training models"):
        seed = random.randint(0, 1000000)

        # Set seed for reproducibility
        torch.manual_seed(seed)
        np.random.seed(seed)
        random.seed(seed)

        # Initialize the model
        model = KAN(width=[6,13,1], grid=5, k=3, seed=seed, auto_save=True)
        print(f"Training with seed: {seed}")

        # Forward pass to initialize the model with input dimensions
        model(dataset['train_input'])

        # First phase of training
        # The disable parameter is added to prevent the progress bar from being displayed
        result = model.fit(dataset, opt="LBFGS", steps=5, lamb=0.01, lamb_entropy=10., disable=True)

        # Pruning step
        model = model.prune()

        # Second phase of training
        result = model.fit(dataset, opt="LBFGS", steps=10, disable=True)

        # Evaluate the model
        mse, rmse, mae, r2 = evaluate_model(model, dataset)

        # Save the results
        results.append({
            'model': model,
            'mse': mse,
            'rmse': rmse,
            'mae': mae,
            'r2': r2,
            'seed': seed
        })

    # Find the model with the highest r2 score
    best_model_info = max(results, key=lambda x: x['r2'])

    # Output the best model information
    print(f"Best model seed: {best_model_info['seed']}")
    print(f"Best model MSE: {best_model_info['mse']}")
    print(f"Best model RMSE: {best_model_info['rmse']}")
    print(f"Best model MAE: {best_model_info['mae']}")
    print(f"Best model R2: {best_model_info['r2']}")

    return best_model_info['model'], best_model_info

When I ran train_multiple_models, I ran into two different problems.

I am not sure what causes them. Could you tell me how to solve this? Thank you.

lfrommelt commented 2 weeks ago

Hi 911569318,

Regarding your first problem:

I am not entirely sure what happened. My first guess is that the pruning removed too much of the network, so that everything fell apart. Calling

import matplotlib.pyplot as plt

model(dataset['train_input'])
model.plot()
plt.show()  # only needed outside an interactive matplotlib backend such as a Jupyter notebook

after pruning could give additional information. Or a bit more explicit:

print(model.mask_up)
print(model.mask_down)
for layer in model.act_fun:
    print(layer.mask, "\n")

Or use a debugger to avoid the prints.

Regarding your second problem:

The current implementation of MultKAN uses the seed passed at initialization as the global seed for numpy, torch and random, and thereby resets the RNGs each time a KAN is initialized. Therefore, if you call random.randint(0, 1000000) after initializing a KAN and only deterministic steps have run in between, the result will always be the same for a given seed.
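To illustrate the effect without pykan, here is a minimal sketch using only the standard library. The helper reseed_like_kan_init is hypothetical: it only mimics the global reseeding described above and is not part of pykan.

```python
import random


def reseed_like_kan_init(seed):
    """Hypothetical stand-in for KAN(..., seed=seed): reseeds the global RNG."""
    random.seed(seed)


def next_seed_after_init(seed):
    reseed_like_kan_init(seed)           # global RNG state now depends only on `seed`
    # ... deterministic training steps would run here and draw nothing ...
    return random.randint(0, 1000000)    # so this draw is fully determined by `seed`


first = next_seed_after_init(123)
random.random()                          # unrelated draws in between make no difference
second = next_seed_after_init(123)
print(first == second)                   # True
```

In the training loop from the question this means that each "random" seed is a fixed function of the previous one, so the seeds are not independent draws.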

This is actually a small oversight in the KAN implementation; each KAN instance should use its own RNG instead. I will open a separate issue about that and link it here.

In the meantime you can use the following workaround:

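import numpy as np

# at the beginning of your script
initial_seed = 0  # placeholder value, pick any fixed integer
rng = np.random.default_rng(initial_seed)
# whenever you need a random number, as a seed itself or for any other purpose
seed = int(rng.integers(0, 1000000))  # a Generator provides integers(), not randint()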
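The same idea can be sketched with only the standard library, which also makes it easy to check that the seed sequence survives a global reseed. This is a sketch under assumptions: init_model_stub and draw_run_seeds are hypothetical names, and the stub only imitates the global reseeding of a KAN initialization.

```python
import random


def init_model_stub(seed):
    """Hypothetical stand-in for KAN(..., seed=seed): reseeds the global RNG."""
    random.seed(seed)


def draw_run_seeds(initial_seed, num_runs):
    """Draw one seed per run from a private generator."""
    seed_rng = random.Random(initial_seed)   # independent of random.seed()
    seeds = []
    for _ in range(num_runs):
        seed = seed_rng.randint(0, 1000000)
        init_model_stub(seed)                # global reseed; seed_rng is unaffected
        seeds.append(seed)
    return seeds


seeds = draw_run_seeds(initial_seed=0, num_runs=5)
print(seeds)
```

Because the private generator is never touched by the global reseed, the list of seeds depends only on initial_seed, which keeps the whole sweep reproducible.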

Hope that helps, Leonard