Open AdrienDeverin opened 4 months ago
Hi, the losses in the two logs seem to differ very little. In the second log, only the last step has an increasing loss. What does the loss look like after that?
Are you using debug mode or release mode? If you are in debug mode, does the result differ under release mode?
In the second log, the loss keeps increasing at every step (I didn't show it for more epochs, but it's as I said in the thread). Moreover, I mainly look at val_loss as confirmation of performance.
I was in debug mode, but the result is the same in release mode (I checked).
I have no idea about it yet. model.compile
just gathers all the information in preparation for graph building. What if you put the following code in the main thread
model = keras.models.load_model(folder);
IOptimizer optimizer = keras.optimizers.Adam(0.01f);
ILossFunc loss = keras.losses.BinaryCrossentropy();
but keep model.compile
in the separate thread along with model.fit
?
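Concretely, the suggested split would look something like this (a sketch only; `folder`, the training data, and the hyperparameters are placeholders taken from the snippets in this thread):

```csharp
// Main thread: load the model and create the optimizer/loss up front.
var model = keras.models.load_model(folder);
IOptimizer optimizer = keras.optimizers.Adam(0.01f);
ILossFunc loss = keras.losses.BinaryCrossentropy();

// Separate thread: compile and fit stay together.
Thread t = new Thread(() =>
{
    model.compile(optimizer: optimizer, loss: loss, metrics: new[] { "acc" });
    model.fit(x_train, y_train, batch_size: 64, epochs: 30, validation_split: 0.2f);
});
t.SetApartmentState(ApartmentState.STA);
t.Start();
```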
That's exactly what I did, in fact, and the problem is the same as in situation 2 (it does not converge).
To summarize: if model.compile and model.fit are in the same thread, it does not converge. However, if only model.fit is in a separate thread, there is no problem... Really strange...
I'll add another strange observation. I tried another example, the one from the TensorFlow.NET GitHub page, and got the opposite result!
Below is the code used:
// Load data
var ((x_train, y_train), (x_test, y_test)) = keras.datasets.cifar10.load_data();
x_train = x_train / 255.0f;
// Construct model (could be replaced by the load_model() function; it has the same impact)
var layers = keras.layers;
var inputs = keras.Input(shape: (32, 32, 3), name: "img");
var x = layers.Conv2D(32, 3, activation: "relu").Apply(inputs);
x = layers.Conv2D(64, 3, activation: "relu").Apply(x);
var block_1_output = layers.MaxPooling2D(3).Apply(x);
x = layers.Conv2D(64, 3, activation: "relu", padding: "same").Apply(block_1_output);
x = layers.Conv2D(64, 3, activation: "relu", padding: "same").Apply(x);
var block_2_output = layers.Add().Apply(new Tensors(x, block_1_output));
x = layers.Conv2D(64, 3, activation: "relu", padding: "same").Apply(block_2_output);
x = layers.Conv2D(64, 3, activation: "relu", padding: "same").Apply(x);
var block_3_output = layers.Add().Apply(new Tensors(x, block_2_output));
x = layers.Conv2D(64, 3, activation: "relu").Apply(block_3_output);
x = layers.GlobalAveragePooling2D().Apply(x);
x = layers.Dense(256, activation: "relu").Apply(x);
x = layers.Dropout(0.5f).Apply(x);
var outputs = layers.Dense(10).Apply(x);
var model = keras.Model(inputs, outputs, name: "toy_resnet");
// Compile
model.compile(optimizer: keras.optimizers.RMSprop(1e-3f), loss: keras.losses.SparseCategoricalCrossentropy(from_logits: true), metrics: new[] { "acc" });
Thread t = new Thread(() =>
{
model.fit(x_train[new Slice(0, 2000)], y_train[new Slice(0, 2000)],
batch_size: 64,
epochs: 30,
validation_split: 0.2f);
});
t.SetApartmentState(ApartmentState.STA);
t.Start();
This example, similar to example 1 in my thread, DOESN'T CONVERGE!
Conversely, this code, similar to my example 2, converges:
// Load data
var ((x_train, y_train), (x_test, y_test)) = keras.datasets.cifar10.load_data();
x_train = x_train / 255.0f;
Thread t = new Thread(() =>
{
// Construct model (could be replaced by the load_model() function; it has the same impact)
var layers = keras.layers;
var inputs = keras.Input(shape: (32, 32, 3), name: "img");
var x = layers.Conv2D(32, 3, activation: "relu").Apply(inputs);
x = layers.Conv2D(64, 3, activation: "relu").Apply(x);
var block_1_output = layers.MaxPooling2D(3).Apply(x);
x = layers.Conv2D(64, 3, activation: "relu", padding: "same").Apply(block_1_output);
x = layers.Conv2D(64, 3, activation: "relu", padding: "same").Apply(x);
var block_2_output = layers.Add().Apply(new Tensors(x, block_1_output));
x = layers.Conv2D(64, 3, activation: "relu", padding: "same").Apply(block_2_output);
x = layers.Conv2D(64, 3, activation: "relu", padding: "same").Apply(x);
var block_3_output = layers.Add().Apply(new Tensors(x, block_2_output));
x = layers.Conv2D(64, 3, activation: "relu").Apply(block_3_output);
x = layers.GlobalAveragePooling2D().Apply(x);
x = layers.Dense(256, activation: "relu").Apply(x);
x = layers.Dropout(0.5f).Apply(x);
var outputs = layers.Dense(10).Apply(x);
var model = keras.Model(inputs, outputs, name: "toy_resnet");
// Compile
model.compile(optimizer: keras.optimizers.RMSprop(1e-3f), loss: keras.losses.SparseCategoricalCrossentropy(from_logits: true), metrics: new[] { "acc" });
model.fit(x_train[new Slice(0, 2000)], y_train[new Slice(0, 2000)],
batch_size: 64,
epochs: 30,
validation_split: 0.2f);
});
t.SetApartmentState(ApartmentState.STA);
t.Start();
This reveals a convergence issue opposite to the one I initially described in this problem. Furthermore, in this example, the more of the graph that is built in the same thread as the model.fit() call, the better the convergence.
To support this, consider situation 3 (a mix of 1 and 2):
// Load data
var ((x_train, y_train), (x_test, y_test)) = keras.datasets.cifar10.load_data();
x_train = x_train / 255.0f;
// Construct model (could be replaced by the load_model() function; it has the same impact)
var layers = keras.layers;
var inputs = keras.Input(shape: (32, 32, 3), name: "img");
var x = layers.Conv2D(32, 3, activation: "relu").Apply(inputs);
x = layers.Conv2D(64, 3, activation: "relu").Apply(x);
var block_1_output = layers.MaxPooling2D(3).Apply(x);
x = layers.Conv2D(64, 3, activation: "relu", padding: "same").Apply(block_1_output);
x = layers.Conv2D(64, 3, activation: "relu", padding: "same").Apply(x);
var block_2_output = layers.Add().Apply(new Tensors(x, block_1_output));
x = layers.Conv2D(64, 3, activation: "relu", padding: "same").Apply(block_2_output);
x = layers.Conv2D(64, 3, activation: "relu", padding: "same").Apply(x);
Thread t = new Thread(() =>
{
var block_3_output = layers.Add().Apply(new Tensors(x, block_2_output));
x = layers.Conv2D(64, 3, activation: "relu").Apply(block_3_output);
x = layers.GlobalAveragePooling2D().Apply(x);
x = layers.Dense(256, activation: "relu").Apply(x);
x = layers.Dropout(0.5f).Apply(x);
var outputs = layers.Dense(10).Apply(x);
var model = keras.Model(inputs, outputs, name: "toy_resnet");
// Compile
model.compile(optimizer: keras.optimizers.RMSprop(1e-3f), loss: keras.losses.SparseCategoricalCrossentropy(from_logits: true), metrics: new[] { "acc" });
model.fit(x_train[new Slice(0, 2000)], y_train[new Slice(0, 2000)],
batch_size: 64,
epochs: 30,
validation_split: 0.2f);
});
t.SetApartmentState(ApartmentState.STA);
t.Start();
This produces a better result (acc = 0.3) than code 1 (acc = 0.1) but worse than code 2 (acc = 0.45). This behavior is totally deterministic (more than one test was run).
Edit: all of this may also depend on the optimizer used. I replaced RMSprop with Adam, and this example converges in cases 1, 2, and 3.
I can't explain this behavior, which is the exact opposite of my original problem, but it also points to an underlying problem with the threads...
That's so weird! What's your code like after t.Start()
?
@AsakusaRinne there is literally nothing
Brief Description
I've encountered a perplexing issue while utilizing Keras and its fit() function to train a standard CNN.
To illustrate, consider the following code snippet where the model learns and converges successfully:
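(That snippet isn't preserved in this excerpt. Based on the summary elsewhere in the thread, the converging arrangement had model.compile() on the main thread and model.fit() in a separate thread, roughly as follows. This is a reconstruction, not the author's original code; the model, data, optimizer, and hyperparameters are placeholders.)

```csharp
// Converging case: compile on the main thread...
model.compile(optimizer: keras.optimizers.Adam(0.01f),
              loss: keras.losses.BinaryCrossentropy(),
              metrics: new[] { "acc" });

// ...then fit alone in a separate thread.
Thread t = new Thread(() =>
{
    model.fit(x_train, y_train, batch_size: 64, epochs: 30, validation_split: 0.2f);
});
t.SetApartmentState(ApartmentState.STA);
t.Start();
```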
However, when the model.compile() and model.fit() functions are executed sequentially in the same thread, the model seemingly learns but fails to converge, as demonstrated below:
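(Again, the original snippet is not preserved here; per the thread's summary, the failing arrangement ran both calls sequentially inside the worker thread, roughly as sketched below, with the same placeholder names as above.)

```csharp
// Non-converging case: compile and fit sequentially in the same thread.
Thread t = new Thread(() =>
{
    model.compile(optimizer: keras.optimizers.Adam(0.01f),
                  loss: keras.losses.BinaryCrossentropy(),
                  metrics: new[] { "acc" });
    model.fit(x_train, y_train, batch_size: 64, epochs: 30, validation_split: 0.2f);
});
t.SetApartmentState(ApartmentState.STA);
t.Start();
```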
Remarkably, the convergence issue appears to be entirely deterministic (tested, obviously, over the same dataset) between these two examples. The logs are shown below. Despite thorough investigation, I've failed to identify any critical differences. All variables remain consistent.
Specifically, it seems that model.compile() must not be executed in the same thread as fit() for successful convergence. I don't understand why...
Any insights or suggestions on this peculiar behavior would be greatly appreciated. The data used can be provided.
Best regards, DEVERIN Adrien
Device and Context
Used on CPU
Benchmark
Logs example 1 (converging):
etc. ...
Logs example 2 (not converging):
Alternatives
No response