Closed g-pickardou closed 3 years ago
@g-pickardou Thank you for your detailed description. The bug is fixed in the new source code. We'll release it this weekend.
Many thanks, that was fast. Could you please point me to the commit that contains the fix, just out of curiosity?
It should be this fix, if I remember correctly: https://github.com/SciSharp/TensorFlow.NET/commit/f0030ca9bb407c66c2767b7ea445b3c531b0cef5
I am trying to translate a working Python XOR example to TensorFlow.NET. Both versions are very short, just a few lines each, so reproducing this should hopefully be a matter of copy and paste (assuming a TensorFlow 2.4.1 environment is already set up).
The translated C# code compiles and runs, but behaves very differently. When using the loss function
keras.losses.MeanSquaredError()
the C# model does not work at all (the loss increases, while the very same Python code, with the very same defaults, learns quite well). When using
keras.losses.MeanAbsoluteError()
the C# version learns, but the loss decreases ten times more slowly than in the Python version. Regardless of whether the first layer has 32, 64, or 1024 units, and regardless of whether I train for 100 or 1000 epochs, the very same difference occurs between the Python and C# implementations. Python learns even with a 32-unit first layer and 100 epochs, while in the C# version the loss increases.
Please note: I am not asking why the C# model is not working; I am asking what issue causes the different behavior.
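For reference, here are the two loss functions being compared, written out as plain Python. These are the standard textbook definitions, not a claim about how either framework implements them internally:

```python
# Standard definitions of the two losses under comparison.
# y_true and y_pred are equal-length lists of floats.

def mean_squared_error(y_true, y_pred):
    # MSE: average of the squared differences
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    # MAE: average of the absolute differences
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# XOR targets; a perfect prediction gives zero loss under both definitions.
targets = [0.0, 1.0, 1.0, 0.0]
print(mean_squared_error(targets, targets))   # 0.0
print(mean_absolute_error(targets, targets))  # 0.0
```

Note that for errors smaller than 1, squaring shrinks them, so during successful training the MSE value is normally smaller than the MAE value on the same predictions.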
What I've done and checked so far:
Using the very same TensorFlow version in both codebases: 2.4.1
Checked the 'adam' optimizer's default settings (learning rate, etc.); they are identical in the Python and C# versions
With multiple runs (a dozen times), both the Python and C# versions behave consistently with themselves, with minor differences in loss and minor differences in predictions. However, the difference between the Python and C# versions is practically "learns" vs. "does not learn". See the results attached.
Changing the batch size between 1 and 4 does not change the result (just the performance), which is expected.
Comparing the results in the mean absolute error case (see the end of the issue), it is interesting that even though the C# version's loss is ten times greater than the Python version's, its predictions are actually better.
Regarding the previous point, and the fact that in the mean squared error case the C# version does not work at all, my journeyman bet would be something in the loss function implementation or something closely related; but to be honest, I have been in the NN business for no more than 2 days.
Using the latest NuGet packages:
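For reference, these are the default Adam hyperparameters as documented for tf.keras in TensorFlow 2.4.1, which are the values I compared between the two versions (copied here from the documentation, not printed from either runtime):

```python
# tf.keras.optimizers.Adam defaults as documented for TensorFlow 2.4.1.
adam_defaults = {
    "learning_rate": 0.001,
    "beta_1": 0.9,
    "beta_2": 0.999,
    "epsilon": 1e-07,
    "amsgrad": False,
}
for name, value in adam_defaults.items():
    print(f"{name} = {value}")
```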
Python sample code
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, Dense

training_data = np.array([[0,0],[0,1],[1,0],[1,1]], "float32")
target_data = np.array([[0],[1],[1],[0]], "float32")

model = Sequential()
model.add(Input((2,)))
model.add(Dense(64, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
model.fit(training_data, target_data, epochs=1000, verbose=1)

model.summary()
print(model.predict(training_data))
Results with the exact sample programs
Python with 'mean_squared_error':
start loss: 0.4953, accuracy: 0.7500
end loss: 0.0046, accuracy: 1.0000
prediction (correct XOR):
[[0.07034078]
 [0.93302286]
 [0.9332601 ]
 [0.06674343]]

C# with keras.losses.MeanSquaredError():
start loss: 0,254314, accuracy: 1,000000
end loss: 0,461851, accuracy: 1,000000
prediction (incorrect XOR):
[[0,036265105],
 [0,005444467],
 [0,0042176247],
 [0,00059726834]]

Results with the exact sample programs, except using absolute error instead of squared error
Python with 'mean_absolute_error':
start loss: 0.4830, accuracy: 0.7500
end loss: 0.0141, accuracy: 1.0000
prediction (correct XOR):
[[0.01455569]
 [0.98603207]
 [0.98603785]
 [0.01395237]]

C# with keras.losses.MeanAbsoluteError():
Note: although in this case the loss decreases, it is still 10 times greater than in the Python version.
start loss: 0,501608, accuracy: 1,000000
end loss: 0,185201, accuracy: 1,000000
prediction (correct XOR, actually better than the Python version):
[[0,010161787],
 [0,9958803],
 [0,99654853],
 [0,0029382408]]
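As a sanity check, the final C# losses can be recomputed by hand from the predictions printed above, using only the numbers reported in this issue (plain Python, standard loss definitions):

```python
# XOR targets in the same order as the predictions above.
targets = [0.0, 1.0, 1.0, 0.0]

# Final C# predictions from the MeanSquaredError run (the one that did not learn):
pred_mse = [0.036265105, 0.005444467, 0.0042176247, 0.00059726834]
mse = sum((t - p) ** 2 for t, p in zip(targets, pred_mse)) / len(targets)
print(f"recomputed MSE: {mse:.4f}")  # ~0.4955, close to the reported 0,461851

# Final C# predictions from the MeanAbsoluteError run (the one that learned):
pred_mae = [0.010161787, 0.9958803, 0.99654853, 0.0029382408]
mae = sum(abs(t - p) for t, p in zip(targets, pred_mae)) / len(targets)
print(f"recomputed MAE: {mae:.5f}")  # ~0.00517, far below the reported 0,185201
```

If the printed predictions are taken at face value, the recomputed MSE roughly matches the reported final loss (that model really did not learn), while the recomputed MAE is about 35 times smaller than the reported 0,185201. That discrepancy is consistent with my suspicion above that the reported loss value, rather than the training itself, is what is off in the MAE case.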