apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Not able to train a neural network [XOR added] #8126

Open agarwl opened 7 years ago

agarwl commented 7 years ago

Environment info

Operating System: Ubuntu 16.04
Compiler: g++
Package used (Python/R/Scala/Julia): C++

MXNet version: 0.11

Or if installed from source: MXNet commit hash (git rev-parse HEAD): branch 0.11.0

Error Message:

There is no error, but the training accuracy always remains ZERO. Is there a problem with the code that constructs the neural network (the siamese symbol), and/or is the training procedure correct, i.e. are the gradient updates specified correctly?

Minimum reproducible example

  1. I defined a siamese-net-like architecture with LogisticRegressionOutput as the final layer, using the code provided below:
#include <iostream>
#include <map>
#include <string>
#include "mxnet-cpp/MxNetCpp.h"
#include "../include/mxnet-cpp/op.h"

using namespace std;
using namespace mxnet::cpp;

Symbol mlp(const vector<int> &layers, const vector<Symbol> & weights,
            const std::vector<Symbol> & biases, const string & inp_name )
{
  auto x = Symbol::Variable(inp_name);
  vector<Symbol> outputs(layers.size());

  for (size_t i = 0; i < layers.size(); ++i)
  {
    string istr = to_string(i);
    Symbol fc = FullyConnected(
      i == 0? x : outputs[i-1],  // data
      weights[i],
      biases[i],
      layers[i]);
    outputs[i] = i == layers.size() - 1 ? fc
        : LeakyReLU(string("act") + istr, fc, LeakyReLUActType::kLeaky);
  }
  return outputs.back();
}

int main(int argc, char** argv)
{
    const int feature_size = 2;
    const vector<int> layers{32, 16, 1};
    const int batch_size = 100;
    const int max_epoch = 10;
    const float learning_rate = 0.01;
    const float weight_decay = 1e-2;

    auto ctx = Context::gpu(); // Use GPU for training
    auto ctx_cpu = Context::cpu();

    vector<Symbol> weights(layers.size());
    vector<Symbol> biases(layers.size());

    for (size_t i = 0; i < layers.size(); ++i)
    {
        string istr = to_string(i);
        weights[i] = Symbol::Variable("w" + istr);
        biases[i] = Symbol::Variable("b" + istr);
    }

    // CONSTRUCT the network
    /**********************************************************************/
    auto Net = mlp(layers, weights, biases, "X");
    auto Net2 = mlp(layers, weights, biases, "X1");

    auto sym_label = Symbol::Variable("label");
    auto siamese = LogisticRegressionOutput(string("sigmoid"), Net - Net2, sym_label);

    map<string, NDArray> args_map;
    map<string, NDArray> aux_map;

    /*we should tell mxnet the shape of data and label*/
    args_map["X"] = NDArray(Shape(batch_size, feature_size) , ctx);
    args_map["X1"] = NDArray(Shape(batch_size, feature_size) , ctx);
    args_map["label"] = NDArray(Shape(batch_size, 1), ctx);

    /*with data and label, executor can be generated automatically*/
    auto *exec = siamese.SimpleBind(ctx, args_map);
    siamese.InferArgsMap(ctx, &args_map, args_map);
    auto arg_names = siamese.ListArguments();
    /**********************************************************************/

    Xavier xavier = Xavier(Xavier::gaussian, Xavier::in, 2.34);
    for (auto &arg : args_map)
    {
        if (arg.first == "X" || arg.first == "X1" || arg.first == "label") continue;
        xavier(arg.first, &arg.second);
    }

    Optimizer* opt = OptimizerRegistry::Find("adam");
    // opt->SetParam("rescale_grad", 1.0 / batch_size);
    opt->SetParam("lr", learning_rate)
        ->SetParam("wd", weight_decay);

     /** DATA SETUP **/
    /**********************************************************************/
    int data_count = batch_size * 100;
    mx_float* aptr_x = new mx_float[data_count * 2];
    mx_float* aptr_x1 = new mx_float[data_count * 2];
    mx_float* aptr_y = new mx_float[data_count];

    // hand-made data: each row of X1 is the negative of the matching row of X, and every label is 1
    for (int i = 0; i < data_count; i++)
    {
        for (int j = 0; j < 2; j++)
        {
          aptr_x[i * 2 + j] = (i%100 + j) *1.0f;
          aptr_x1[i * 2 + j] = -(i%100 + j) *1.0f;
        }
        aptr_y[i] = 1;
    }

    NDArray data_array = NDArray(Shape(data_count, 2), ctx_cpu, false);  
    NDArray data1_array = NDArray(Shape(data_count, 2), ctx_cpu, false);  
    NDArray label_array = NDArray(Shape(data_count), ctx_cpu, false);
    data_array.SyncCopyFromCPU(aptr_x, data_count * 2);
    data1_array.SyncCopyFromCPU(aptr_x1, data_count*2);
    label_array.SyncCopyFromCPU(aptr_y, data_count);
    data_array.WaitToRead();
    data1_array.WaitToRead();
    label_array.WaitToRead();

    int val_fold = 1;
    size_t train_num = data_count * (1 - val_fold / 10.0);
    NDArray train_data, train1_data, val_data, val1_data;
    NDArray train_label, val_label;
    train_data = data_array.Slice(0, train_num);
    train1_data = data1_array.Slice(0, train_num);
    train_label = label_array.Slice(0, train_num);
    /**********************************************************************/

    // TRAINING the network
    Accuracy acu_train;
    for (int ITER = 0; ITER < max_epoch*100; ++ITER)
    {
        size_t start_index = 0;
        /*reset the metric every epoch*/
        acu_train.Reset();
        while (start_index < train_num)
        {
            if (start_index + batch_size > train_num)
            {
              start_index = train_num - batch_size;
            }
            args_map["X"] =
                train_data.Slice(start_index, start_index + batch_size).Copy(ctx);
            args_map["X1"] =
                train1_data.Slice(start_index, start_index + batch_size).Copy(ctx);
            args_map["label"] =
                train_label.Slice(start_index, start_index + batch_size).Copy(ctx);
            start_index += batch_size;
            NDArray::WaitAll();

            exec->Forward(true);
            exec->Backward();
            // Update parameters
            for (size_t i = 0; i < arg_names.size(); ++i)
            {
                if (arg_names[i] == "X" || arg_names[i] == "X1" || arg_names[i] == "label") continue;
                opt->Update(i, exec->arg_arrays[i], exec->grad_arrays[i]);
            }
            NDArray::WaitAll();
            acu_train.Update(args_map["label"], exec->outputs[0]);
        }
        LG << "ITER: " << ITER << " Train Accuracy: " << acu_train.Get();
    }

    delete exec;
    delete [] aptr_x;
    delete [] aptr_x1;
    delete [] aptr_y;
    MXNotifyShutdown();
    return 0;
 }

The output of the code is always something like:

[18:43:06] neural_net_eval.cpp:172: ITER: 0 Train Accuracy: 0
[18:43:06] neural_net_eval.cpp:172: ITER: 1 Train Accuracy: 0
[18:43:06] neural_net_eval.cpp:172: ITER: 2 Train Accuracy: 0
[18:43:06] neural_net_eval.cpp:172: ITER: 3 Train Accuracy: 0
[18:43:06] neural_net_eval.cpp:172: ITER: 4 Train Accuracy: 0
[18:43:06] neural_net_eval.cpp:172: ITER: 5 Train Accuracy: 0
[18:43:06] neural_net_eval.cpp:172: ITER: 6 Train Accuracy: 0

What have you tried to solve it?

  1. Tried different variations of the training data, but the network always predicts ZERO as its output.
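One variant I have not tried yet (untested sketch below) is copying each mini-batch directly into the arrays the executor is bound to, via Executor::arg_dict(), instead of replacing the args_map entries after SimpleBind. Using the same variables as in the listing above, the inner while loop would become:

auto arg_dict = exec->arg_dict();  // NDArrays the bound executor actually reads
while (start_index < train_num)
{
    if (start_index + batch_size > train_num)
    {
      start_index = train_num - batch_size;
    }
    // Write the mini-batch into the executor's own arrays (CPU -> GPU copy)
    train_data.Slice(start_index, start_index + batch_size).CopyTo(&arg_dict["X"]);
    train1_data.Slice(start_index, start_index + batch_size).CopyTo(&arg_dict["X1"]);
    train_label.Slice(start_index, start_index + batch_size).CopyTo(&arg_dict["label"]);
    start_index += batch_size;
    NDArray::WaitAll();

    exec->Forward(true);
    exec->Backward();
    // Update all learnable parameters, skipping the data and label arguments
    for (size_t i = 0; i < arg_names.size(); ++i)
    {
        if (arg_names[i] == "X" || arg_names[i] == "X1" || arg_names[i] == "label") continue;
        opt->Update(i, exec->arg_arrays[i], exec->grad_arrays[i]);
    }
    NDArray::WaitAll();
    acu_train.Update(arg_dict["label"], exec->outputs[0]);
}
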
agarwl commented 7 years ago

I am adding a very simple example here in which I tried to fit a neural network to the XOR function, but I was unable to do so as well.

#include <iostream>
#include <map>
#include <string>
#include "mxnet-cpp/MxNetCpp.h"
// Allow IDE to parse the types
#include "../include/mxnet-cpp/op.h"

using namespace std;
using namespace mxnet::cpp;

Symbol mlp(const vector<int> &layers, const vector<Symbol> & weights,
            const std::vector<Symbol> & biases, const string & inp_name )
{
  auto x = Symbol::Variable(inp_name);

  vector<Symbol> outputs(layers.size());

  for (size_t i = 0; i < layers.size(); ++i)
  {
    string istr = to_string(i);
    Symbol fc = FullyConnected(
      i == 0? x : outputs[i-1],  // data
      weights[i],
      biases[i],
      layers[i]);
    outputs[i] = i == layers.size() - 1 ? fc
        : Activation(string("act") + istr, fc, ActivationActType::kTanh);
  }

  return outputs.back();
}

int main(int argc, char** argv)
{
    const int feature_size = 2;
    const vector<int> layers{8, 4, 1};
    const int batch_size = 4;
    const int max_epoch = 100000;
    const float learning_rate = 0.001;
    const float weight_decay = 1e-2;

    auto ctx = Context::cpu(); // Use CPU for training
    auto ctx_cpu = Context::cpu();

    vector<Symbol> weights(layers.size());
    vector<Symbol> biases(layers.size());

    for (size_t i = 0; i < layers.size(); ++i)
    {
        string istr = to_string(i);
        weights[i] = Symbol::Variable("w" + istr);
        biases[i] = Symbol::Variable("b" + istr);
    }

    auto Net = mlp(layers, weights, biases, "X");
    auto sym_label = Symbol::Variable("label");
    auto output = LogisticRegressionOutput(string("sigmoid"), Net, sym_label);

    map<string, NDArray> args_map;
    args_map["X"] = NDArray(Shape(batch_size, feature_size) , ctx);
    args_map["label"] = NDArray(Shape(batch_size, 1), ctx);

    auto *exec = output.SimpleBind(ctx, args_map);
    output.InferArgsMap(ctx, &args_map, args_map);
    auto arg_names = output.ListArguments();

    Xavier xavier = Xavier(Xavier::gaussian, Xavier::avg);
    for (auto &arg : args_map)
    {
        xavier(arg.first, &arg.second);
    }

    Optimizer* opt = OptimizerRegistry::Find("adam");
    opt->SetParam("rescale_grad", 1.0 / batch_size)
        ->SetParam("lr", learning_rate)
        ->SetParam("wd", weight_decay);

    // XOR Function
    mx_float* aptr_x = new mx_float[batch_size * feature_size];
    mx_float* aptr_y = new mx_float[batch_size];

    aptr_x[0] = 0.; aptr_x[1] = 0.; aptr_y[0] = 0;
    aptr_x[2] = 0; aptr_x[3] = 1.; aptr_y[1] = 1;
    aptr_x[4] = 1.; aptr_x[5] = 0.; aptr_y[2] = 1;
    aptr_x[6] = 1.; aptr_x[7] = 1.; aptr_y[3] = 0;

    NDArray train_data = NDArray(Shape(batch_size, 2), ctx_cpu, false);
    NDArray train_label = NDArray(Shape(batch_size), ctx_cpu, false);
    train_data.SyncCopyFromCPU(aptr_x, batch_size * 2);
    train_label.SyncCopyFromCPU(aptr_y, batch_size);
    train_data.WaitToRead();
    train_label.WaitToRead();

    Accuracy acu_train;
    for (int ITER = 0; ITER < max_epoch ; ++ITER)
    {
        acu_train.Reset();
        args_map["X"] = train_data.Copy(ctx);
        args_map["label"] = train_label.Copy(ctx);
        NDArray::WaitAll();

        exec->Forward(true);
        acu_train.Update(args_map["label"], exec->outputs[0]);

        if(ITER % 5000 == 0){
            auto out = (exec->outputs[0]).Copy(ctx_cpu);
            auto labels = args_map["label"].Copy(ctx_cpu);
            NDArray::WaitAll();
            const mx_float * outs = out.GetData();
            auto lbs = labels.GetData();
            for (int i = 0 ; i < batch_size ; i++)
                cout << lbs[i] << ":" << outs[i] << " ";
            cout << endl;
            LG << "ITER: " << ITER << " Train Accuracy: " << acu_train.Get();
        }
        exec->Backward();
        // Update parameters
        for (size_t i = 0; i < arg_names.size(); ++i)
        {
            if (arg_names[i] == "X" || arg_names[i] == "label") continue;
            opt->Update(i, exec->arg_arrays[i], exec->grad_arrays[i]);
        }
    }

    delete exec;
    delete [] aptr_x;
    delete [] aptr_y;
    MXNotifyShutdown();
    return 0;
 }

The output is (true_label : predicted_output):

0:0.95178 1:0.880215 1:0.944654 0:0.86154 
[19:14:56] xor.cpp:114: ITER: 0 Train Accuracy: 0.5
0:0.786497 1:1 1:0.799124 0:3.35246e-13 
[19:14:57] xor.cpp:114: ITER: 5000 Train Accuracy: 0.5
0:0.786137 1:1 1:0.800972 0:1.01632e-21 
[19:14:58] xor.cpp:114: ITER: 10000 Train Accuracy: 0.5
0:0.783514 1:1 1:0.802832 0:3.29902e-30 
[19:14:58] xor.cpp:114: ITER: 15000 Train Accuracy: 0.5
0:0.785589 1:1 1:0.805458 0:1.13999e-38 
[19:14:59] xor.cpp:114: ITER: 20000 Train Accuracy: 0.5
0:0.785148 1:1 1:0.808582 0:0 
[19:15:00] xor.cpp:114: ITER: 25000 Train Accuracy: 0.5
0:0.786545 1:1 1:0.812904 0:0 
[19:15:00] xor.cpp:114: ITER: 30000 Train Accuracy: 0.5
0:0.784969 1:1 1:0.8189 0:0 
[19:15:01] xor.cpp:114: ITER: 35000 Train Accuracy: 0.5
0:0.784798 1:1 1:0.82841 0:0 
[19:15:02] xor.cpp:114: ITER: 40000 Train Accuracy: 0.5
0:0.787042 1:1 1:0.845301 0:0 
[19:15:02] xor.cpp:114: ITER: 45000 Train Accuracy: 0.5
0:0.784718 1:1 1:0.879533 0:0 
[19:15:03] xor.cpp:114: ITER: 50000 Train Accuracy: 0.5
0:0.783628 1:1 1:0.934087 0:0 
[19:15:04] xor.cpp:114: ITER: 55000 Train Accuracy: 0.5
0:0.786908 1:1 1:0.948499 0:0 
[19:15:04] xor.cpp:114: ITER: 60000 Train Accuracy: 0.5
0:0.784415 1:1 1:0.948458 0:0 
[19:15:05] xor.cpp:114: ITER: 65000 Train Accuracy: 0.5
0:0.784358 1:1 1:0.948456 0:0 
[19:15:06] xor.cpp:114: ITER: 70000 Train Accuracy: 0.5
0:0.784338 1:1 1:0.948455 0:0 
[19:15:07] xor.cpp:114: ITER: 75000 Train Accuracy: 0.5
0:0.797159 1:1 1:0.948738 0:0 
[19:15:07] xor.cpp:114: ITER: 80000 Train Accuracy: 0.5
0:0.784321 1:1 1:0.948455 0:0 
[19:15:08] xor.cpp:114: ITER: 85000 Train Accuracy: 0.5
0:0.7837 1:1 1:0.948445 0:0 
[19:15:09] xor.cpp:114: ITER: 90000 Train Accuracy: 0.5
0:0.78427 1:1 1:0.948449 0:0 
[19:15:09] xor.cpp:114: ITER: 95000 Train Accuracy: 0.5

Even after 100,000 iterations, the neural net is not able to predict the XOR function correctly. I have tried different layer sizes (both increasing and decreasing the number of neurons), but it didn't help at all. Is this a bug in the C++ API, or is there a problem with my code? @piiswrong @szha
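
To rule out the Accuracy metric itself, a plain thresholded binary accuracy can also be computed straight from the executor output. A small untested sketch (the helper name BinaryAccuracy is made up; it only reuses the NDArray calls already used above):

// Untested helper: accuracy for a single sigmoid output column, thresholded at 0.5.
// Copies labels and predictions to the CPU and compares them element-wise.
float BinaryAccuracy(const NDArray &labels, const NDArray &preds,
                     const Context &ctx_cpu, int batch_size)
{
    NDArray labels_cpu = labels.Copy(ctx_cpu);
    NDArray preds_cpu = preds.Copy(ctx_cpu);
    NDArray::WaitAll();
    const mx_float *l = labels_cpu.GetData();
    const mx_float *p = preds_cpu.GetData();
    int correct = 0;
    for (int i = 0; i < batch_size; ++i)
        if ((p[i] >= 0.5f) == (l[i] >= 0.5f))
            ++correct;
    return static_cast<float>(correct) / batch_size;
}

It could be called as BinaryAccuracy(args_map["label"], exec->outputs[0], ctx_cpu, batch_size) in place of acu_train.Update(...) inside the training loop.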

f-matt commented 6 years ago

I'm just starting to study MXNet, but this example is working here (CPU only).

#include "mxnet-cpp/MxNetCpp.h"

#include <chrono>
#include <iostream>

using namespace std;
using namespace mxnet::cpp;

Symbol mlp(const vector<int> &layers) {
    auto x = Symbol::Variable("X");
    auto label = Symbol::Variable("label");

    vector<Symbol> weights(layers.size());
    vector<Symbol> biases(layers.size());
    vector<Symbol> outputs(layers.size());

    for (size_t i = 0; i < layers.size(); ++i) {
        weights[i] = Symbol::Variable("w" + to_string(i));
        biases[i] = Symbol::Variable("b" + to_string(i));
        Symbol fc = FullyConnected(
                i == 0 ? x : outputs[i-1],
                weights[i],
                biases[i],
                layers[i]);

        outputs[i] = i == layers.size() - 1 ? fc :
                Activation(fc, ActivationActType::kRelu);
    }

    return SoftmaxOutput(outputs.back(), label);

}

int main(int argc, char *argv[]) {

    const int input_width = 2;
    const int input_height = 1;
    const int output_length = 2;
    const size_t train_num = 8000;
    const size_t val_num = 2000;
    const vector<int> layers{8, 8, 2};
    const int batch_size = 64;
    const int max_epoch = 10;
    const float learning_rate = 0.01;
    const float weight_decay = 1e-2;

    vector<float> data;
    vector<float> labels;

    for (int i = 0; i < 10000; ++i) {

        int x1 = rand() % 2;
        int x2 = rand() % 2;

        int y = x1 ^ x2;

        data.push_back(x1);
        data.push_back(x2);

        labels.push_back(y == 0 ? 1 : 0);
        labels.push_back(y == 0 ? 0 : 1);

    }

    const float *dptr = data.data();
    const float *lptr = labels.data();

    Context ctx = Context::cpu();

    NDArray data_array = NDArray(Shape(10000, 1, input_width, input_height), ctx);
    NDArray label_array = NDArray(Shape(10000, output_length), ctx);

    data_array.SyncCopyFromCPU(dptr, 10000 * input_width * input_height);
    label_array.SyncCopyFromCPU(lptr, 10000 * output_length);
    data_array.WaitToRead();
    label_array.WaitToRead();

    NDArray train_data = data_array.Slice(0, 8000);
    NDArray train_label = label_array.Slice(0, 8000);

    NDArray val_data = data_array.Slice(8000, 10000);
    NDArray val_label = label_array.Slice(8000, 10000);

    auto net = mlp(layers);

    std::map<string, NDArray> args;

    args["X"] = NDArray(Shape(batch_size, 1, input_width, input_height), ctx);
    args["label"] = NDArray(Shape(batch_size, output_length), ctx);

    net.InferArgsMap(ctx, &args, args);

    auto initializer = Uniform(0.001);
    for (auto &arg : args) {
        initializer(arg.first, &arg.second);
    }

    Optimizer* opt = OptimizerRegistry::Find("adam");
    opt->SetParam("lr", learning_rate)
            ->SetParam("wd", weight_decay);

    auto *exec = net.SimpleBind(ctx, args);
    auto arg_names = net.ListArguments();

    for (int iter = 0; iter < max_epoch; ++iter) {

        int samples = 0;

        auto tic = chrono::system_clock::now();

        size_t start_index = 0;
        size_t correct_count = 0;
        size_t all_count = 0;

        while (start_index < train_num) {

            if (start_index + batch_size > train_num) {
                start_index = train_num - batch_size;
            }

            train_data.Slice(start_index, start_index + batch_size).CopyTo(&args["X"]);
            train_label.Slice(start_index, start_index + batch_size).CopyTo(&args["label"]);

            NDArray::WaitAll();

            exec->Forward(true);
            exec->Backward();

            for (size_t i = 0; i < arg_names.size(); ++i) {
                if (arg_names[i] == "X"  || arg_names[i] == "label")
                    continue;

                opt->Update(i, exec->arg_arrays[i], exec->grad_arrays[i]);
            }

            NDArray y(Shape(batch_size, output_length), ctx);
            NDArray y_pred(Shape(batch_size, output_length), ctx);

            train_label.Slice(start_index, start_index + batch_size).CopyTo(&y);
            exec->outputs[0].CopyTo(&y_pred);

            NDArray::WaitAll();

            const mx_float *ptr_y = y.GetData();
            const mx_float *ptr_y_pred = y_pred.GetData();

            for(int i = 0; i < batch_size; ++i) {

                all_count++;

                int correct = ptr_y[2*i] < ptr_y[2*i+1] ? 1 : 0;
                int predicted = ptr_y_pred[2*i] < ptr_y_pred[2*i+1] ? 1 : 0;

                if (correct == predicted)
                    correct_count++;

            }

            start_index += batch_size;
            samples += batch_size;

        }

        auto toc = chrono::system_clock::now();
        float duration = chrono::duration_cast<chrono::milliseconds> (toc - tic).count() / 1000.0;
        float train_acc = (float) correct_count / all_count;

        correct_count = 0;
        all_count = 0;

        start_index = 0;

        while (start_index < val_num) {

            if (start_index + batch_size > val_num) {
                start_index = val_num - batch_size;
            }

            val_data.Slice(start_index, start_index + batch_size).CopyTo(&args["X"]);
            val_label.Slice(start_index, start_index + batch_size).CopyTo(&args["label"]);

            start_index += batch_size;

            NDArray::WaitAll();

            exec->Forward(false);

            const auto &out = exec->outputs;

            NDArray label_cpu(Shape(batch_size, output_length), ctx);
            NDArray out_cpu(Shape(batch_size, output_length), ctx);

            out[0].CopyTo(&out_cpu);
            val_label.Slice(start_index - batch_size, start_index).CopyTo(&label_cpu);

            NDArray::WaitAll();

            const mx_float *dptr_out = out_cpu.GetData();
            const mx_float *dptr_label = label_cpu.GetData();

            for(int i = 0; i < batch_size; ++i) {
                all_count++;

                int correct = dptr_label[2*i] < dptr_label[2*i+1] ? 1 : 0;
                int predicted = dptr_out[2*i] < dptr_out[2*i+1] ? 1 : 0;

                if (correct == predicted)
                    correct_count++;

            }

        }

        LG << "Epoch: " << iter << " " << samples/duration << " samples/sec Train-Accuracy: " << train_acc << " Val-Accuracy: " << (float) correct_count / all_count;

    }

    delete exec;
    MXNotifyShutdown();

    return 0;

}

chsin commented 6 years ago

I'm having the same problem, but with the lenet.cpp example using just the CPU (example at https://github.com/apache/incubator-mxnet/blob/master/cpp-package/example/lenet.cpp). I built MXNet v1.0.0 with MKL, but that shouldn't be the issue since you have the same problem. Does lenet.cpp fail to learn for you too?

paoloviviani commented 6 years ago

I'm also experiencing the same problem here, but with LinearRegressionOutput (even with a single neuron and no activation). I see that the gradients are extremely small and the weights are barely moving. Any ideas?
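
For reference, this is roughly how the gradient magnitudes can be inspected after Backward() (untested as written here; exec and arg_names as in the examples above, and it needs <cmath>):

// Untested sketch: mean absolute gradient of every learnable argument,
// printed after exec->Backward() to see whether anything is flowing at all.
for (size_t i = 0; i < arg_names.size(); ++i) {
    if (arg_names[i] == "X" || arg_names[i] == "label") continue;
    NDArray g = exec->grad_arrays[i].Copy(Context::cpu());
    NDArray::WaitAll();
    const mx_float *gp = g.GetData();
    double sum_abs = 0.0;
    for (size_t j = 0; j < g.Size(); ++j) sum_abs += std::fabs(gp[j]);
    LG << arg_names[i] << " mean |grad| = " << sum_abs / g.Size();
}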

paoloviviani commented 6 years ago

I'm implementing a multi-layer perceptron for naïve regression. I succeeded in training by creating DataBatch objects from NDArrays and using them to train the model:

    vector<DataBatch> data_loader;
    size_t samples = 0;
    while (samples < data_count) {
        DataBatch mini_batch;
        mini_batch.data = data_array.Slice(samples, samples + batch_size).Copy(ctx);
        mini_batch.label = label_array.Slice(samples, samples + batch_size).Copy(ctx);
        if (data_count - samples < batch_size)
            mini_batch.pad_num = padding;  // padding: number of filler rows in the final, short batch (computed beforehand)
        else
            mini_batch.pad_num = 0;
        data_loader.push_back(mini_batch);
        samples += batch_size;
    }

which can then be used for training as follows:

for (auto mb : data_loader) {
    mb.data.CopyTo(&args["X"]);
    mb.label.CopyTo(&args["label"]);

    exec->Forward(true);
    exec->Backward();

    for (size_t i = 0; i < arg_names.size(); ++i) {
        if (arg_names[i] == "X" || arg_names[i] == "label") continue;
        opt->Update(i, exec->arg_arrays[i], exec->grad_arrays[i]);
    }
}

ddavydenko commented 5 years ago

@mxnet-label-bot [Training]

ddavydenko commented 5 years ago

@theSparta, it seems you are using the C++ API for training the model. Would you mind sharing your use case? Namely, what made you choose the C++ API for training purposes over, let's say, the Python API?

agarwl commented 5 years ago

It was a long time ago, but I was trying to use the C++ API because the trained network was supposed to be interfaced with a C++ library after training to collect more training data (similar to DAgger). The final solution I ended up with was: (1) train the model in Python, (2) export the trained model to C++ to run inference, (3) collect more data using the trained network, (4) go back to step (1).
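
For anyone trying the same workflow, step (2) boils down to loading the exported symbol and parameters with the cpp-package and binding for inference only. A rough, untested sketch (the file names, the "data" input name, and the input shape are placeholders; it assumes a checkpoint written from Python with the usual "arg:"/"aux:" key prefixes):

// Untested sketch of inference-only loading with the cpp-package.
// Assumes the Python side exported "model-symbol.json" / "model-0000.params";
// all names and shapes below are placeholders.
#include <map>
#include <string>
#include "mxnet-cpp/MxNetCpp.h"

using namespace mxnet::cpp;

int main() {
    Context ctx = Context::cpu();
    Symbol net = Symbol::Load("model-symbol.json");

    // Parameters saved from Python carry "arg:" / "aux:" key prefixes.
    std::map<std::string, NDArray> params = NDArray::LoadToMap("model-0000.params");
    std::map<std::string, NDArray> args, aux;
    for (const auto &kv : params) {
        if (kv.first.rfind("arg:", 0) == 0)
            args[kv.first.substr(4)] = kv.second.Copy(ctx);
        else if (kv.first.rfind("aux:", 0) == 0)
            aux[kv.first.substr(4)] = kv.second.Copy(ctx);
    }
    args["data"] = NDArray(Shape(1, 2), ctx);  // placeholder input shape

    Executor *exec = net.SimpleBind(ctx, args,
                                    std::map<std::string, NDArray>(),
                                    std::map<std::string, OpReqType>(), aux);
    // Write an input sample in place, e.g. args["data"].SyncCopyFromCPU(ptr, 2); then:
    exec->Forward(false);
    NDArray out = exec->outputs[0].Copy(Context::cpu());
    NDArray::WaitAll();

    delete exec;
    MXNotifyShutdown();
    return 0;
}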

leleamol commented 5 years ago

@theSparta ,

There were recent fixes to cpp-package. Can you please try again to see if the program works as you expected?

leleamol commented 5 years ago

@mxnet-label-bot add [Pending Requester Info]

agarwl commented 5 years ago

@leleamol I don't have access to my machine anymore and I stopped using MXNet a long time ago.

vdantu commented 5 years ago

@theSparta: I know you have already solved this and are no longer facing this issue, but could you point to the Python training code that you used to train the same model?

Also, can we close this issue now? We can reopen it if you face this issue again.

agarwl commented 5 years ago

@vdantu I switched to TensorFlow to solve my problem instead of using MXNet.

szha commented 5 years ago

@theSparta sorry to hear that. We still owe this issue a proper closure, so I'm reopening it. @vdantu did you try to verify the code above and see if it's working? If not, why would you want to close an unsolved issue?

leleamol commented 5 years ago

I looked at the example given here: https://github.com/apache/incubator-mxnet/issues/8126#issuecomment-333847990. It seems the example is training the XOR gate with only 4 data points, which might be insufficient data to train the neural network. The example given by @f-matt has a similar architecture but runs the training with 8000 data samples.
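
A quick way to test that hypothesis (untested sketch; needs <vector>) would be to tile the 4 XOR rows into a few thousand samples before building the NDArrays, replacing the data setup in the XOR program above:

// Untested sketch: replicate the 4 XOR input/label rows `copies` times so that
// each epoch sees many mini-batches, as in @f-matt's example (4 * 2000 = 8000 samples).
const int copies = 2000;
std::vector<mx_float> xor_x = {0, 0,  0, 1,  1, 0,  1, 1};
std::vector<mx_float> xor_y = {0, 1, 1, 0};
std::vector<mx_float> big_x, big_y;
for (int c = 0; c < copies; ++c) {
    big_x.insert(big_x.end(), xor_x.begin(), xor_x.end());
    big_y.insert(big_y.end(), xor_y.begin(), xor_y.end());
}

NDArray train_data(Shape(4 * copies, 2), Context::cpu(), false);
NDArray train_label(Shape(4 * copies), Context::cpu(), false);
train_data.SyncCopyFromCPU(big_x.data(), big_x.size());
train_label.SyncCopyFromCPU(big_y.data(), big_y.size());
NDArray::WaitAll();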

agarwl commented 5 years ago

There are only 4 unique data points for training an XOR gate. Are you saying that the possible reason could have been that the network was not trained for enough epochs? Other than that, changing the batch size doesn't have any other effect in this case.