Problems with training multiple models with different random seeds

I'm using pykan version 0.2.1 and here are the issues I ran into:

I want to train multiple models with different random seeds so that I can select the best one. However, my code produces two different kinds of problems at runtime. Here is my code:

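import random

import numpy as np
import torch
from kan import KAN
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from tqdm import tqdm

def evaluate_model(model, dataset):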
    """Evaluate the model and return MSE, RMSE, MAE, and R2 scores."""
    model.eval()
    with torch.no_grad():
        predictions = model(dataset['test_input'])
        labels = dataset['test_label']

    mse = mean_squared_error(labels.cpu(), predictions.cpu())
    rmse = np.sqrt(mse)
    mae = mean_absolute_error(labels.cpu(), predictions.cpu())
    r2 = r2_score(labels.cpu(), predictions.cpu())

    return mse, rmse, mae, r2

def train_multiple_models(dataset, num_iterations=100):
    results = []

    for i in tqdm(range(num_iterations), desc="Training models"):
        seed = random.randint(0, 1000000)

        # Set seed for reproducibility
        torch.manual_seed(seed)
        np.random.seed(seed)
        random.seed(seed)

        # Initialize the model
        model = KAN(width=[6,13,1], grid=5, k=3, seed=seed, auto_save=True)
        print(f"Training with seed: {seed}")

        # Forward pass to initialize the model with input dimensions
        model(dataset['train_input'])

        # First phase of training
        # The disable parameter is added to prevent the progress bar from being displayed
        result = model.fit(dataset, opt="LBFGS", steps=5, lamb=0.01, lamb_entropy=10., disable=True)

        # Pruning step
        model = model.prune()

        # Second phase of training
        result = model.fit(dataset, opt="LBFGS", steps=10, disable=True)

        # Evaluate the model
        mse, rmse, mae, r2 = evaluate_model(model, dataset)

        # Save the results
        results.append({
            'model': model,
            'mse': mse,
            'rmse': rmse,
            'mae': mae,
            'r2': r2,
            'seed': seed
        })

    # Find the model with the highest r2 score
    best_model_info = max(results, key=lambda x: x['r2'])

    # Output the best model information
    print(f"Best model seed: {best_model_info['seed']}")
    print(f"Best model MSE: {best_model_info['mse']}")
    print(f"Best model RMSE: {best_model_info['rmse']}")
    print(f"Best model MAE: {best_model_info['mae']}")
    print(f"Best model R2: {best_model_info['r2']}")

    return best_model_info['model'], best_model_info

When I ran train_multiple_models, two different things happened.

Case 1: After pruning, the model cannot continue training and an error is reported. For this run I changed the network width to [6,13,6,1], because I did not know how to reproduce the error without changing the width.
Case 2: Training continues, but the seed is not updated (see screenshot). By the way, 885440 comes from a model I trained earlier. I deleted the model folder and still had the same problem.

This is causing me some trouble. Could you please tell me how to solve it? Thank you.

Hi 911569318,

Regarding your first problem:

I am not entirely sure what happened. My first guess is that the pruning removed too much of the network, so that nothing usable was left. Calling

import matplotlib.pyplot as plt

model(dataset['train_input'])  # forward pass so the plot has activations to draw
model.plot()
plt.show()  # in case you run outside an interactive matplotlib backend such as a Jupyter notebook

after pruning could give additional information. Or, a bit more explicitly:

print(model.mask_up)
print(model.mask_down)
for layer in model.act_fun:
    print(layer.mask, "\n")

Or use a debugger to avoid the prints.

Regarding your second problem:

The current implementation of MultKAN sets the seed passed at initialization as the global seed for numpy, torch and random, and thereby resets the RNG each time a KAN is initialized. Therefore, when you call random.randint(0, 1000000) after initializing a KAN and running only deterministic steps afterwards, the result will always be the same.

This is actually a small oversight in the KAN implementation; each KAN instance should use its own RNG instead. I will open a separate issue about that and link it here.
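The effect can be reproduced without pykan. The sketch below uses a hypothetical stand-in class, `FakeModel`, whose constructor reseeds Python's global `random` module in the way described above for MultKAN. It only illustrates the mechanism and is not pykan's actual code; the helper `draw_seeds` is also made up for this example.

```python
import random


class FakeModel:
    """Hypothetical stand-in: reseeds the global RNG on construction."""

    def __init__(self, seed):
        # Assumption: mimics the global reseeding described above.
        random.seed(seed)


def draw_seeds(n):
    """Draw n seeds from the global RNG, constructing a model after each draw."""
    seeds = []
    for _ in range(n):
        seed = random.randint(0, 1000000)
        FakeModel(seed)  # resets the global RNG state to `seed`
        seeds.append(seed)
    return seeds


# Two runs start from different global states, but each constructs a model
# with the same seed first. From then on the "random" seeds are identical.
random.seed(1)
FakeModel(885440)
first = draw_seeds(5)

random.seed(2)
FakeModel(885440)
second = draw_seeds(5)

print(first == second)  # True
```

Once a model has been constructed, every later seed is a deterministic function of the previous one, so the sequence no longer depends on how the script itself was seeded.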

In the meantime you can use the following workaround:

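# at the beginning of your script
import numpy as np
rng = np.random.default_rng(initial_seed)  # initial_seed: any fixed integer you choose
# whenever you need a random number, as a seed itself or for any other purpose
# (a numpy Generator has no randint method; integers is the equivalent)
seed = int(rng.integers(0, 1000000))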
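The workaround relies on a generator whose state is separate from the global one. As a minimal sketch of the same idea using only the standard library, a private `random.Random` instance can play that role. The helper name `make_seeds`, the initial seed `0`, and the value `12345` are placeholders chosen for this example.

```python
import random


def make_seeds(initial_seed, n, upper=1000000):
    """Return n seeds drawn from a private generator."""
    rng = random.Random(initial_seed)  # independent of the global `random` state
    return [rng.randint(0, upper) for _ in range(n)]


# Draw all seeds up front, before any model is constructed.
seeds = make_seeds(initial_seed=0, n=5)

# Reseeding the global module in between (as a model constructor might)
# does not change what the private generator produces.
rng = random.Random(0)
interleaved = []
for _ in range(5):
    interleaved.append(rng.randint(0, 1000000))
    random.seed(12345)  # simulates a constructor resetting the global RNG

print(seeds == interleaved)  # True
```

In the training loop, `seed = seeds[i]` would then replace the call to `random.randint(0, 1000000)`.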

Hope that helps, Leonard
