Closed bml1g12 closed 6 years ago
Might be related to #3
@bml1g12 I understand your problem. I do a lot of work with class imbalanced data sets as well. I can look into this for you soon, but @mikkokotila knows the code much better and might have quicker insight.
Also as an aside, glad to know that supplying x_val and y_val independently worked for you!
Sorry @bml1g12, just to clarify: are you saying the feature in which you input x_val and y_val does not work properly, and you have to make that change to y_pred?
Thank you. Yes, having the x_val, y_val functionality is a great addition, as without it I would not be able to use Talos. Indeed the values for _val printed by Keras seem to be working fine. (The issue is with the data that Talos chooses to save at the end of each permutation.)
Sorry yes I was not very clear, I will clarify:
With the source code unedited, when I ran:
h = ta.Scan(X_train, Y_train, x_val=X_dev, y_val=Y_dev, params=p, dataset_name="debug", experiment_no="1", model=keras_nn_model_talos, grid_downsample=0.002, talos_log_name="talos.log", reduction_method="spear", reduction_metric="fbeta_score")
I obtained the following stack trace:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-9-b4cbea7ca6f1> in <module>()
8 'second_GRU_layer':[True, False]}
9 h = ta.Scan(X_train, Y_train, x_val=X_dev, y_val=Y_dev, params=p, dataset_name="debug", experiment_no="1",
---> 10 model=keras_nn_model_talos, grid_downsample=0.002, talos_log_name="talos.log", reduction_method="spear", reduction_metric="fbeta_score")
11
12 ## I had to edit a line of ~/anaconda3/envs/tfgpu-keras/lib/python3.6/site-packages/talos/metrics/score_model.py
~/anaconda3/envs/tfgpu-keras/lib/python3.6/site-packages/talos/scan/Scan.py in __init__(self, x, y, params, dataset_name, experiment_no, model, x_val, y_val, val_split, shuffle, search_method, reduction_method, reduction_interval, reduction_window, grid_downsample, reduction_threshold, reduction_metric, round_limit, talos_log_name, debug, seed, clear_tf_session, disable_progress_bar)
140 # input parameters section ends
141
--> 142 self._null = self.runtime()
143
144 def runtime(self):
~/anaconda3/envs/tfgpu-keras/lib/python3.6/site-packages/talos/scan/Scan.py in runtime(self)
145
146 self = scan_prepare(self)
--> 147 self = scan_run(self)
~/anaconda3/envs/tfgpu-keras/lib/python3.6/site-packages/talos/scan/scan_run.py in scan_run(self)
27 disable=self.disable_progress_bar)
28 while len(self.param_log) != 0:
---> 29 self = rounds_run(self)
30 self.pbar.update(1)
31 self.pbar.close()
~/anaconda3/envs/tfgpu-keras/lib/python3.6/site-packages/talos/scan/scan_run.py in rounds_run(self)
59
60 _hr_out = run_round_results(self, _hr_out)
---> 61 self._val_score = get_score(self)
62 write_log(self)
63 self.result.append(_hr_out)
~/anaconda3/envs/tfgpu-keras/lib/python3.6/site-packages/talos/metrics/score_model.py in get_score(self)
15
16 try:
---> 17 y_pred = self.keras_model.predict_classes(self.x_val)
18 # y_pred = self.keras_model.predict(self.x_val)
19 return Performance(y_pred, self.y_val, self.shape, self.y_max).result
AttributeError: 'Model' object has no attribute 'predict_classes'
This issue is unrelated to x_val, y_val, as the following code still produces it:
h = ta.Scan(X_train, Y_train, params=p, dataset_name="debug", experiment_no="1",
model=keras_nn_model_talos, grid_downsample=0.002, talos_log_name="talos.log")
I only mentioned that because it is the reason I am using the development branch, as that feature is not yet in the main branch. Sorry for the confusion. I fixed that code by changing predict_classes to predict, based on a forum post I read somewhere.
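From what I understand, predict_classes is only defined on Sequential models, not on functional-API Model objects, which would explain the AttributeError. As a rough sketch (assuming model is the fitted functional model and X_dev the validation inputs; the 0.5 cutoff is just an illustrative choice for a sigmoid output), class labels can be recovered from predict like this:

```python
import numpy as np

# Predicted probabilities from the functional-API model (sigmoid output)
probs = model.predict(X_dev)

# Binary labels via thresholding (0.5 is an illustrative cutoff)
y_pred = (probs > 0.5).astype("int32")

# A softmax/multi-class output would instead use:
# y_pred = np.argmax(probs, axis=-1)
```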
My model is as follows:
from keras.models import Model
from keras.layers import Input, Conv1D, BatchNormalization, Activation, Dropout, GRU, TimeDistributed, Dense
from keras.optimizers import Adam

def keras_nn_model_talos(x_train, y_train, x_val, y_val, params):
    X_input = Input(shape=x_train.shape[1:])

    # Step 1: CONV layer
    X = Conv1D(filters=int(params["num_filters"]), kernel_size=15, strides=4)(X_input)  # CONV1D
    X = BatchNormalization()(X)                  # Batch normalization
    X = Activation('relu')(X)                    # ReLu activation
    X = Dropout(rate=params["dropout_rate"])(X)  # dropout (use 0.8)

    if params["second_GRU_layer"]:
        # Step 2: First GRU layer
        X = GRU(units=int(params["gru_hidden_units"]), return_sequences=True)(X)  # GRU (use 128 units and return the sequences)
        X = Dropout(rate=params["dropout_rate"])(X)  # dropout (use 0.8)
        X = BatchNormalization()(X)                  # Batch normalization

    # Step 3: Second GRU layer
    X = GRU(units=int(params["gru_hidden_units"]), return_sequences=True)(X)  # GRU (use 128 units and return the sequences)
    X = Dropout(rate=params["dropout_rate"])(X)  # dropout (use 0.8)
    X = BatchNormalization()(X)                  # Batch normalization
    X = Dropout(rate=params["dropout_rate"])(X)  # dropout (use 0.8)

    # Step 4: Time-distributed dense layer
    X = TimeDistributed(Dense(1, activation="sigmoid"))(X)  # time distributed (sigmoid)

    model = Model(inputs=X_input, outputs=X)
    opt = Adam(lr=params["adam_learning_rate"], beta_1=0.9, beta_2=0.999, decay=0.01)

    from talos.metrics.keras_metrics import fbeta_score
    model.compile(loss='binary_crossentropy', optimizer=opt, metrics=[fbeta_score])  # "acc", my_recall, my_precision, f1

    history = model.fit(x_train, y_train, batch_size=int(params["batch_size"]),
                        validation_data=(x_val, y_val),
                        epochs=int(params["epochs"]))
    return history, model
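(For reference, inside this function one can check which keys Keras actually records for the compiled metric; these history keys, together with their val_ prefixed copies, are what Talos later reads:)

```python
# Printed inside keras_nn_model_talos, after model.fit:
print(history.history.keys())
# e.g. dict_keys(['val_loss', 'val_fbeta_score', 'loss', 'fbeta_score'])
```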
With regard to issue #3, I also find it odd that Keras implemented only a batch-wise F1 score, and that their solution to implementing it on a per-epoch level was to throw it in the garbage can entirely. I would have thought about 50% of Keras users need an F1 score at some point.
Would I be right in saying that, in theory at least, setting reduction_metric="fbeta_score" should produce a final .csv file with each row showing a parameter combination and the respective scores for the epoch which had the highest fbeta_score?
@bml1g12 To answer the question about reduction_metric first: to do this, you have to pass fbeta_score as a metric in your model.compile (which I can see you are doing). reduction_metric is for the purposes of the optimization algorithm (anything other than random search). This should yield what you are looking for.
That said, are you saying you have verified that this is not actually the case, and that in your experiment .csv you instead get the fbeta_score of the first epoch of each permutation? If so, it should be a very simple fix.
ps. I'm also baffled by the decision to just give up on F1 score as opposed to dealing with it.
I see, so I have at least provided the correct argument (albeit with reduction_metric not being strictly necessary).
Exactly: it seems I get only the first epoch's value for each permutation (and definitely not the highest fbeta_score). A long-shot, but if Talos is set to report the minimum fbeta_score then maybe that is the reason for this bug.
If I understand correctly, the intended behavior is that Talos saves the "best" value of the metric across all epochs within the permutation to a CSV, usually "accuracy". But if you supply several metrics, how does Talos decide which metric it should be using for this purpose?
I would actually like to select based on val_fbeta_score (validation result). Here is an example output for:
p = {'adam_learning_rate': [0.01, 0.001, 0.0001],
'num_filters': [12, 32, 64, 196],
'gru_hidden_units':[32, 64, 128, 196],
'dropout_rate':[0.2,0.5,0.8],
'batch_size': [64, 128, 256],
'epochs': [10],
'second_GRU_layer':[True, False]}
h = ta.Scan(X_train, Y_train, params=p, dataset_name="debug", experiment_no="1",
model=keras_nn_model_talos, grid_downsample=0.002, talos_log_name="talos.log", reduction_method="spear", reduction_metric="val_fbeta_score")
Train on 1260 samples, validate on 540 samples
Epoch 1/10
1260/1260 [==============================] - 3s 2ms/step - loss: 0.8298 - fbeta_score: 0.2810 - val_loss: 0.8771 - val_fbeta_score: 0.3295
Epoch 2/10
1260/1260 [==============================] - 1s 644us/step - loss: 0.7233 - fbeta_score: 0.3322 - val_loss: 0.7126 - val_fbeta_score: 0.3814
Epoch 3/10
1260/1260 [==============================] - 1s 646us/step - loss: 0.6889 - fbeta_score: 0.3599 - val_loss: 0.7488 - val_fbeta_score: 0.3750
Epoch 4/10
1260/1260 [==============================] - 1s 646us/step - loss: 0.6602 - fbeta_score: 0.3924 - val_loss: 0.8280 - val_fbeta_score: 0.3586
Epoch 5/10
1260/1260 [==============================] - 1s 673us/step - loss: 0.6361 - fbeta_score: 0.4247 - val_loss: 0.7142 - val_fbeta_score: 0.4258
Epoch 6/10
1260/1260 [==============================] - 1s 672us/step - loss: 0.6119 - fbeta_score: 0.4499 - val_loss: 0.6764 - val_fbeta_score: 0.4617
Epoch 7/10
1260/1260 [==============================] - 1s 703us/step - loss: 0.5913 - fbeta_score: 0.4703 - val_loss: 0.6412 - val_fbeta_score: 0.4842
Epoch 8/10
1260/1260 [==============================] - 1s 654us/step - loss: 0.5705 - fbeta_score: 0.4892 - val_loss: 0.5530 - val_fbeta_score: 0.5617
Epoch 9/10
1260/1260 [==============================] - 1s 635us/step - loss: 0.5515 - fbeta_score: 0.5144 - val_loss: 0.4873 - val_fbeta_score: 0.6037
Epoch 10/10
1260/1260 [==============================] - 1s 638us/step - loss: 0.5339 - fbeta_score: 0.5344 - val_loss: 0.5228 - val_fbeta_score: 0.5852
100%|██████████| 1/1 [00:13<00:00, 13.56s/it]
Scan Finished!
round_epochs | val_loss | val_fbeta_score | loss | fbeta_score | adam_learning_rate | num_filters | gru_hidden_units | dropout_rate | batch_size | epochs | second_GRU_layer |
---|---|---|---|---|---|---|---|---|---|---|---|
10 | 0.5227934398033 | 0.329458241992527 | 0.533916110462613 | 0.281042905270107 | 0.001 | 196 | 32 | 0.2 | 256 | 10 | 1 |
Ah ok I understand now. Glad to know the manual input of validation sets was working properly.
Regarding your last question, this is related to #54 (I think) and I am going to post an easy fix for it. I presume you just want to essentially sort the output dataframe by the val_fbeta_score, correct?
Posted the answer to what I think your question was in #54. Let me know if that was helpful!
I presume you just want to essentially sort the output dataframe by the val_fbeta_score, correct?
Yes, that's right.
I think your explanation in #54 relates to how to sort the resulting table by a chosen metric, but as I understand it, what values actually end up in a row currently depends on which metric is selected as "the best" for that permutation. I may be misunderstanding: I am currently assuming the object produced by ta.Reporting does not save every epoch's metric results but instead stores the "best" across all epochs?
i.e. if we do a 2-epoch run where each run showed this output in Keras:
Parameter 1: epoch 1: accuracy: 0.2, val_fbeta_score: 0.2; epoch 2: accuracy: 0.3, val_fbeta_score: 0.1
If I understand correctly, it will currently output in the final CSV either a row like:
A) Parameter 1, accuracy: 0.3, val_fbeta_score: 0.1, OR
B) Parameter 1, accuracy: 0.2, val_fbeta_score: 0.2
So my question is, how do I tell it to produce B) and not A)? Regardless of how the output itself is column-sorted.
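To put the same A/B question in code terms, here is a toy sketch against a dict shaped like Keras's history.history (the numbers mirror the 2-epoch example above; this is not Talos code):

```python
# Toy history.history for the 2-epoch example
hist = {'acc': [0.2, 0.3], 'val_fbeta_score': [0.2, 0.1]}

# A) report every metric from the last epoch
row_a = {k: v[-1] for k, v in hist.items()}
# -> {'acc': 0.3, 'val_fbeta_score': 0.1}

# B) pick the epoch where val_fbeta_score peaks and report all metrics from that epoch
best = hist['val_fbeta_score'].index(max(hist['val_fbeta_score']))
row_b = {k: v[best] for k, v in hist.items()}
# -> {'acc': 0.2, 'val_fbeta_score': 0.2}
```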
@bml1g12 Ah I understand. I believe the output you see is the final epoch's result, so you will not see the result of the first epoch, only the second. Currently to my knowledge we do not save every epoch's results. That said, you can implement early stopping criteria if you want, but I don't recommend it (neither does Andrew Ng) since you have no way of knowing how that will impact your overall end (testing set) result.
In principle, we should report a statistical average over many executions evaluated on the validation set. This is a planned feature (discussed in #40 and #18) that adds an extra order of computational complexity to the calculation. It is an expensive, albeit necessary, part of the plan for Talos I would say.
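To give a sense of what that would involve, here is a rough sketch of the idea (not the planned Talos API; it simply reuses the signature of your model function above and averages a chosen validation metric over repeated fits):

```python
import numpy as np

def average_val_metric(build_fn, x_train, y_train, x_val, y_val, params,
                       metric='val_fbeta_score', n_runs=5):
    """Fit the same permutation n_runs times and summarise one validation metric."""
    scores = []
    for _ in range(n_runs):
        history, _model = build_fn(x_train, y_train, x_val, y_val, params)
        scores.append(history.history[metric][-1])  # last-epoch value of the chosen metric
    return np.mean(scores), np.std(scores)
```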
So are you looking for the history so to speak? That might be new-issue-worthy.
I see, thank you, I misunderstood. I see now that Talos currently selects what to store based on the last epoch (i.e. chronology), not "best loss / best accuracy" etc.
So this issue (#56) then relates to the fact that, in my test, it does not produce the last epoch's val_fbeta_score (shown in my post above with the output pasted). Do you know of any solution for this?
(With regards to the statistical average, that seems like a great feature in case you get an anomalous change in model performance the last epoch. I agree with Andrew Ng that early stopping should generally be avoided: equally a history would allow users to explore the effect of the number of epochs on model performance without needing to run Talos with extra permutations for Epochs. )
Ah. Yeah that's a problem. I will try to look into this when I can. If you have any suggestions as well feel free to comment!
The problem is here I think. This is precisely what I think @mikkokotila was talking about in #3. The problem it seems is that the F1 score is not implemented at the epoch level. I still need to dig some more to figure out why this is returning the first value for the F1 score and not any other one...
Thank you.
I'm not sure it is a problem with the F1 score itself, as it produces sensible per-epoch results in the Keras output; it seems to me that Talos just isn't saving the right value. It would be painful to do, but I could work around this issue by printing all Keras output to a file and using grep to obtain the last result of each epoch, then stitching it back together with the parameters reported by Talos.
But I am unfamiliar with the source code so maybe it is something more fundamental.
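(Alternatively, I suppose the per-epoch metrics could be written straight to a file with Keras's CSVLogger callback instead of grep-ing stdout; a rough, untested sketch with an arbitrary filename, e.g. replacing the model.fit call above with:)

```python
from keras.callbacks import CSVLogger

# Append each epoch's metrics for every fit to one CSV file
epoch_log = CSVLogger("epoch_metrics.csv", append=True)

history = model.fit(x_train, y_train,
                    batch_size=int(params["batch_size"]),
                    validation_data=(x_val, y_val),
                    epochs=int(params["epochs"]),
                    callbacks=[epoch_log])
```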
Ah I'm sorry. You're totally right. The lack of sleep is getting to me! I'll get on this sometime soon when I have more energy.
I'm not sure if Talos is actually using the Keras metric to generate the F1 score you see in the pandas output or if it's using its own. This needs to be made consistent in the future.
Anyway, I will look into this at some point. I appreciate you bringing this to our attention since it is a very real problem that needs fixing.
Thank you!
@bml1g12 regarding "A long-shot, but if Talos is set to report the minimum fbeta_score then maybe that is the reason for this bug."
You got it. That's it. There is one hard-coded remnant (I hope the last one) from the beginning of the package where it takes the minimum value unless the word 'acc' is in the metric name in the history object. I will fix that ASAP as it's kind of silly. Creating a new issue for it.
This is now resolved in daily-dev with more info on the closed #62.
Sorry for causing doubt with the bad decision I had made previously. Unfortunately the resolved situation is not much more intuitive, i.e. we have to include the string 'acc' in any custom metrics that are added to Talos, and the user has to do the same for their own custom metrics. This should be OK though, as it is in accord with the Keras convention of using the _acc postfix (at least in Keras 2 this seems to be the case).
I will leave this open for a bit in case I missed something.
OK I'll give it a try, so I should append "_acc" to any custom metric name, got it. To clarify, what would be the consequence of not appending _acc? Simply that Talos takes the lowest value I guess?
As long as the documentation explains this, it isn't too unintuitive at least.
@bml1g12 Great :) You are right, the consequence is that it will be treated as something to be minimized, i.e. the lowest value will be reported instead. I also considered the possibility of showing min, peak, and max, but that would mean three times more columns, of which in most cases one or two would be noise.
OK I understand.
I think #62 may still be unresolved, because x94carbone said above "I believe the output you see is the final epoch's result", which I took to mean that the intended behavior was to select the last epoch's value.
I just tested it and it now seems to produce the highest value of each _acc metric across all epochs, as opposed to the last epoch's value.
Selecting the best value across all epochs can be a little confusing if you have more than one metric, because if one column of the table comes from a different epoch than another column, then you are essentially not comparing apples with apples. Also, it is not clear how many epochs it took to obtain the value displayed, as it is different for each column.
So I need to ask, what is the intended behavior?
a) For each metric, store the result of the last epoch
b) For each metric, store its best across all epochs <--- seemingly the current behavior
c) Given a metric of interest, find the epoch that performed best on that metric, and store all metric values for that particular epoch (along with the epoch number)
I think (b) is useful if you are doing early stopping, but given that the epoch which produced the result can't currently be obtained, it would need that information for each metric. (a) is simple to interpret at least, and is what x94carbone seemed to think it should be. And (c) is what I would personally find most useful, I think.
In an ideal world, the user could select between methods (a), (b) or (c), but I can appreciate it might not be worth coding that. If the current behavior is the intended behavior, then I think it is crucial that a history is kept so the user can figure out which epoch produced the result listed (and thereby reproduce it).
What happens is that for each metric, the best across all epochs is stored in the experiment log. I like the idea about allowing the user to choose what they want to store, as some might want the last for some reason. Also, regarding the point you are making about the peak approach being confusing: do you think it would be enough to allow the user to set this in the Scan() parameters, or should we show both peak and last?
I don't understand 'c' though, could you clarify?
And I apologize for the confusion; it has to do with the rather cryptic way the related part of Talos is handled. Why it's cryptic is pretty important though: it allows us to avoid hardcoding hyperparameter names anywhere, and instead allows the user to add any they like.
I see, thank you, so the current behavior is method (b) and it is the intended behavior. I think the issue can be closed once the documentation explains the current behavior.
I think an option to show "peak across epochs" or "last epoch" in Scan() would be great. Whatever the case, it would be good if the documentation made it crystal clear which values are stored by default, as I think most users would assume each row of the table corresponds to a single "model", i.e. most users would currently erroneously assume that a row of output .csv shown by Talos could be generated by a specific epoch of a Keras model, when in reality each row currently potentially shows a mixture of different epoch results.
To explain (c) by example, imagine you are interested in obtaining the optimal F1 validation score, with the sub-criterion that the precision must be >0.1. Currently this task is not possible using (a) or (b), because neither will show the precision alongside the optimal F1 score for the same epoch. Method (c) involves Talos first identifying the epoch with the optimal F1 score, then storing the corresponding metrics for that particular epoch.
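A rough sketch of (c) against a dict shaped like Keras's history.history (the metric names, the helper name best_epoch_row, and the 0.1 threshold are only illustrative, not Talos code):

```python
def best_epoch_row(hist, target='val_fbeta_score', constraint='val_precision', minimum=0.1):
    """Pick the epoch with the best target metric among epochs meeting the constraint,
    then return every metric's value at that single epoch."""
    candidates = [i for i, p in enumerate(hist[constraint]) if p > minimum]
    best = max(candidates, key=lambda i: hist[target][i])
    return best, {key: values[best] for key, values in hist.items()}

# Toy example: epoch 0 fails the precision constraint, epoch 1 has the best F1 among the rest
hist = {'val_fbeta_score': [0.20, 0.50, 0.40],
        'val_precision':   [0.05, 0.20, 0.30]}
epoch, row = best_epoch_row(hist)  # epoch == 1; row is taken entirely from epoch 1
```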
I like the idea about allowing the user to choose what they want to store, as some might want the last for some reason.
@mikkokotila This is definitely a critical option since the last value is most representative of the final trained state of the model and therefore the best indicator of how the model will generalize to the reported evaluation on the testing set.
most users would currently erroneously assume that a row of output .csv shown by Talos could be generated by a specific epoch of a Keras model
Thank you, @bml1g12, for pointing this out. This is not something I would have considered before you mentioned it! I didn't realize this wasn't clear. We should fix this in the documentation.
The issue is in utils\results.py, line 31. The returned result is the smallest among epochs, not the largest, as it should be. I corrected the conditions for my application and it works as intended.
for key in out.history.keys():
    t_t = array(out.history[key])
    if (key == 'val_acc') or (key == 'acc') or (key == 'fmeasure') or (key == 'val_fmeasure'):
        peak_epoch = argpartition(t_t, len(t_t) - 1)[-1]
    else:
        peak_epoch = argpartition(t_t, len(t_t) - 1)[0]
    peak = array(out.history[key])[peak_epoch]
    _rr_out.append(peak)
    p_epochs.append(peak_epoch)
It does the job, but it's not very elegant.
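For what it's worth, a slightly more direct version of the same loop could use numpy's argmax/argmin (same surrounding variables as above; an untested sketch, not what was committed):

```python
from numpy import array, argmax, argmin

for key in out.history.keys():
    t_t = array(out.history[key])
    # maximise accuracy-like and fmeasure metrics, minimise everything else (e.g. losses)
    peak_epoch = argmax(t_t) if ('acc' in key or 'fmeasure' in key) else argmin(t_t)
    _rr_out.append(t_t[peak_epoch])
    p_epochs.append(peak_epoch)
```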
This is now fixed in 458f973 and is available in dev. Note that as it stands we get the max epoch, so the last round option will have to be implemented separately. Creating a new issue for that.
Big thanks for everyone here. Closing now.
This is handled in PR #80 and also takes care of #73. Thanks again! :)
When I import
from talos.metrics.keras_metrics import fbeta_score
and compile the model with this metric, then run Talos with the parameter reduction_metric="fbeta_score", the output csv seems to list the val_acc of the best epoch for val_acc, but only the first epoch's value for fbeta_score. Something seems to be going wrong; if anything it should be producing the corresponding fbeta_score for that epoch, I would have thought. I am not interested in accuracy due to the class imbalance in my system, and the accuracy saturates after a few epochs, so I need Talos to store, for each parameter combination, either:
a) the result of the last epoch, or
b) ideally, the result with the best fbeta_score.
Given that fbeta_score has been implemented, I assume this must be possible but I don't see how.
I am using the latest dev branch v0.2 (as I have augmented data, I needed the functionality to supply x_val and y_val as parameters). In order to run this code without bugs, I needed to change
talos/metrics/score_model.py
line 17 from
y_pred = self.keras_model.predict_classes(self.x_val)
to
y_pred = self.keras_model.predict(self.x_val)
Which might be related to my problem.