Open nikospappas1987 opened 2 years ago
Could you please share your code?
I already tested a similar example where I print the parents of the last generation using the ga_instance.last_generation_parents attribute, both after saving and after loading the instance. The two lists of parents are identical.
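For illustration, a minimal sketch of such a check (toy fitness function and placeholder filename; not the actual experiment):
import pygad
import numpy

def fitness_func(solution, solution_idx):
    # toy fitness: just maximize the sum of the genes
    return numpy.sum(solution)

ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=2,
                       fitness_func=fitness_func,
                       sol_per_pop=10,
                       num_genes=5)
ga_instance.run()

# parents selected in the last generation, before saving
print(ga_instance.last_generation_parents)

ga_instance.save(filename='test_ga')              # placeholder filename
loaded_instance = pygad.load(filename='test_ga')

# parents reported by the loaded instance; expected to match the list above
print(loaded_instance.last_generation_parents)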
Sorry for the delayed answer but I was sick.
After paying more attention to it, there is no problem. The algorithm was very close to convergence, so after one more generation the two best solutions became identical. It does load the two different parents, but after one generation the two best solutions are the same. Thanks for the advice on using ga_instance.last_generation_parents; I wouldn't have found it without that. And thanks a lot for developing this great library.
Something different I noticed while trying to understand the above issue is that ga_instance.best_solution() sometimes doesn't return the best solution. For example, the code:
solution, solution_fitness, solution_idx = ga_instance.best_solution()
print("Fitness of the best solution :", solution_fitness)
print(f'Fitness of the last solution :{ga_instance.last_generation_fitness}')
printed
Fitness of the best solution : 0.792773668893014
Fitness of the last solution :[0.82573473 0.82573473 0.79277367 0.70507075 0.72833979 0.82573473 0.73806056 0.7600569 0.73776799 0.71602506]
which is obviously wrong. But after one generation it again prints the right solution:
Fitness of the best solution : 0.8257347260699635
Fitness of the last solution :[0.82573473 0.82573473 0.7620138 0.77750815 0.79653262 0.77248549 0.76682924 0.73932479 0.76655292 0.77736544]
This happens randomly and rarely, and from what I understand it doesn't seem to have any effect on the algorithm.
Are you using multithreading by any chance? Just a hunch/question.
Keith
No issue @nikospappas1987. I hope you have recovered now!
I will check why the best_solution() method did not return the best fitness reported in last_generation_fitness. Thanks for reporting that!
Not yet!
No multithreading in the callback function where the best_solution() method is called.
Based on my tests, with a deterministic fitness function I did not find any difference between the fitness values calculated by these 2 lines:
_, solution_fitness, _ = ga_instance.best_solution()
max_last_generation_fitness = max(ga_instance.last_generation_fitness)
For a non-deterministic fitness function, these 2 values may differ. I ran an experiment where the fitness value depends on a random number, which makes the value calculated by ga_instance.best_solution() differ.
You are welcome to report a case to test where the 2 values are different.
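For illustration, here is a minimal sketch of such an experiment with a made-up noisy fitness function. As I understand it, best_solution() re-evaluates the population when no fitness list is passed to it, so with a non-deterministic fitness its value can differ from max(last_generation_fitness); the pop_fitness parameter used at the end is assumed to be available in the installed PyGAD version:
import pygad
import numpy

def noisy_fitness(solution, solution_idx):
    # non-deterministic fitness: a random term is added on every evaluation
    return numpy.sum(solution) + numpy.random.uniform(0.0, 1.0)

ga_instance = pygad.GA(num_generations=20,
                       num_parents_mating=2,
                       fitness_func=noisy_fitness,
                       sol_per_pop=10,
                       num_genes=5)
ga_instance.run()

_, solution_fitness, _ = ga_instance.best_solution()
print(solution_fitness)                              # recomputed, so it may not match
print(max(ga_instance.last_generation_fitness))      # fitness stored during the run

# passing the stored fitness avoids the re-evaluation (assumes pop_fitness is supported)
_, solution_fitness, _ = ga_instance.best_solution(
    pop_fitness=ga_instance.last_generation_fitness)
print(solution_fitness)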
Yes, I noticed this issue with a non-deterministic fitness function, but now I wonder why the two values match most of the time. If the fitness is computed separately in each case, I would expect the two values to differ most of the time with a non-deterministic fitness function.
I agree. But things look normal and we need a test case which causes the 2 values to be different.
OK, here's the code that reproduces it. I use the genetic algorithm to select features used by a LightGBM binary classifier. I'm sorry, but I can't share the data I work on, as they are health care data covered by a strict data sharing agreement.
import pygad
import pandas as pd
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.model_selection import RepeatedStratifiedKFold
from lightgbm import LGBMClassifier
from sklearn.model_selection import cross_val_score
import os
import time
import random

s1 = 1
s2 = 1
filename = 'genetic_runs/boris_00_stemi_macce_t0_simple_data'

# X_train and y_train are loaded elsewhere; the data cannot be shared.

def compute_fitness_roc(roc, s):
    if roc < 0.5:
        return 0
    return ((roc - 0.5) / 0.5) ** s

def compute_fitness_sparse(num_selected, s):
    if num_selected > 120:
        return 0
    elif num_selected < 10:
        return 1
    return ((num_selected - 120) / -110) ** s

def fitness_func(solution, solution_idx):
    selected = np.array(solution).astype(bool)
    X = X_train[X_train.columns[selected]]
    pipe = Pipeline([
        ('clf', LGBMClassifier(is_unbalance=True,
                               metric='auc',
                               verbosity=-1,
                               subsample=0.35,
                               subsample_freq=4,
                               extra_trees=True,
                               colsample_bytree=1.0,
                               feature_fraction_bynode=1.0,
                               reg_alpha=6e-7,
                               reg_lambda=0.03,
                               learning_rate=0.001,
                               linear_lambda=6.6e-07,
                               max_depth=34,
                               min_child_samples=1,
                               n_estimators=80,
                               objective='cross_entropy'))])
    scores = cross_val_score(pipe,
                             X, y_train,
                             scoring='roc_auc',
                             cv=RepeatedStratifiedKFold(n_splits=4, n_repeats=10),
                             n_jobs=-1)
    roc = np.mean(scores)
    fitness_roc = compute_fitness_roc(roc, s=s1)
    fitness_sparse = compute_fitness_sparse(np.sum(solution), s=s2)
    fitness = (fitness_roc * fitness_sparse) ** 0.5
    return fitness

def callback_gen(ga_instance):
    if ga_instance.generations_completed % 50 == 0:
        ga_instance.save(filename=filename)
    solution, solution_fitness, solution_idx = ga_instance.best_solution()
    print(time.ctime(time.time()))
    print("Generation : ", ga_instance.generations_completed)
    print("Fitness of the best solution :", solution_fitness)
    print(f'Number of features of best solution: {np.sum(solution)}')
    print(f'Fitness of the last solution :{ga_instance.last_generation_fitness}')
    print()

if os.path.isfile(filename + '.pkl'):
    print('loading from file')
    ga_instance = pygad.load(filename=filename)
else:
    ga_instance = pygad.GA(num_generations=5000,
                           num_parents_mating=2,
                           fitness_func=fitness_func,
                           sol_per_pop=10,
                           num_genes=X_train.columns.shape[0],
                           gene_type=int,
                           parent_selection_type="sss",
                           keep_parents=-1,
                           crossover_type="single_point",
                           crossover_probability=None,
                           mutation_type='adaptive',
                           mutation_probability=(0.04, 0.02),
                           gene_space=[0, 1],
                           stop_criteria='saturate_1000',
                           save_solutions=True,
                           on_generation=callback_gen)
ga_instance.run()
ga_instance.save(filename=filename)
Thank you.
When I load a previously saved instance of the genetic algorithm with
ga_instance = pygad.load(filename=filename)
the loaded instance has only the best solution as a parent, not the selected number of parents from the saved instance. To be specific, for num_parents_mating=2 and keep_parents=-1, the loaded instance has two identical parents (two copies of the best solution of the saved instance) instead of the two parents of the saved instance.
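For what it's worth, a minimal sketch of a check for this report (toy fitness function and placeholder filename; the settings mirror the ones mentioned above):
import pygad
import numpy

def fitness_func(solution, solution_idx):
    # toy fitness used only to drive the run
    return numpy.sum(solution)

ga_instance = pygad.GA(num_generations=50,
                       num_parents_mating=2,
                       keep_parents=-1,
                       fitness_func=fitness_func,
                       sol_per_pop=10,
                       num_genes=5)
ga_instance.run()

parents_before = numpy.array(ga_instance.last_generation_parents)
ga_instance.save(filename='test_ga')              # placeholder filename
loaded_instance = pygad.load(filename='test_ga')
parents_after = numpy.array(loaded_instance.last_generation_parents)

# True if the loaded instance keeps the same two parents, False if they were replaced
print(numpy.array_equal(parents_before, parents_after))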