Hello, I wanted to ask a couple of questions. I retrieve my data from a database, and I could not find in the examples the correct way to convert plain Java arrays into DataSetIterators. Below is an example with toy data (my real data is also integers, so the example is representative).
1) How do I normalize my data? The commented-out code below normalizes only the input features; the labels remain unscaled.
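To make question 1 concrete: if I read the DL4J javadoc correctly, NormalizerMinMaxScaler has a fitLabel(true) switch that makes fit()/transform() scale the labels as well (I am not 100% sure of that). The arithmetic I expect it to apply to both inputs and labels is just min-max scaling; minMaxScale below is my own helper, not the DL4J API:

```java
public class MinMaxDemo {
    // Min-max scaling to [0, 1] -- my own sketch of what I expect the
    // normalizer to do to BOTH the inputs and the labels.
    static double[] minMaxScale(double[] values, double min, double max) {
        double[] scaled = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            scaled[i] = (values[i] - min) / (max - min);
        }
        return scaled;
    }

    public static void main(String[] args) {
        // Fit on the training range (10..90), then transform labels too.
        double[] labels = {40, 50, 60, 70, 80, 90};
        double[] scaledLabels = minMaxScale(labels, 10, 90);
        System.out.println(java.util.Arrays.toString(scaledLabels));
    }
}
```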
2) Why can't I evaluate test data without an iterator?
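For question 2, my understanding (hedged) is that RegressionEvaluation can also be fed raw INDArrays via eval(labels, predictions), so an iterator may not be strictly required. The statistic I mainly want is MSE, which written out by hand is just:

```java
public class MseDemo {
    // Mean squared error over predictions vs. labels -- the core of what
    // I want from evaluateRegression, in plain Java for illustration.
    static double mse(double[] labels, double[] predictions) {
        double sum = 0;
        for (int i = 0; i < labels.length; i++) {
            double diff = labels[i] - predictions[i];
            sum += diff * diff;
        }
        return sum / labels.length;
    }

    public static void main(String[] args) {
        double[] labels = {100};
        double[] predictions = {98};
        System.out.println(mse(labels, predictions)); // 4.0
    }
}
```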
3) How can I set the batch size if I do not use a DataSetIterator?
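What I mean by batch size without an iterator: if DataSet.batchBy(batchSize) works the way I think it does, it splits one DataSet into a list of mini-batches. In plain-Java terms (my own illustration, not the DL4J call) the split is just:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchDemo {
    // Split samples into mini-batches of at most batchSize elements each,
    // mirroring my understanding of DataSet.batchBy.
    static List<int[]> batchBy(int[] samples, int batchSize) {
        List<int[]> batches = new ArrayList<>();
        for (int start = 0; start < samples.length; start += batchSize) {
            int end = Math.min(start + batchSize, samples.length);
            int[] batch = new int[end - start];
            System.arraycopy(samples, start, batch, 0, batch.length);
            batches.add(batch);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<int[]> batches = batchBy(new int[]{1, 2, 3, 4, 5}, 2);
        System.out.println(batches.size()); // 3 batches: [1,2], [3,4], [5]
    }
}
```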
4) Did I set the array shape for a recurrent neural network correctly (the split_sequence() method)? Initially I assumed the data had to be shaped [samples, time steps, features], as in Keras.
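To make question 4 concrete: from what I have read, DL4J's recurrent layers expect [miniBatch, features, timeSteps], i.e. the last two Keras dimensions swapped, but I am not certain. A plain-Java permutation of one sample (my own code, not the ND4J permute call) shows the index mapping I have in mind:

```java
public class ShapeDemo {
    // Convert [samples][timeSteps][features] (Keras order) into
    // [samples][features][timeSteps] (what I believe DL4J's RNN layers expect).
    static double[][][] toDl4jOrder(double[][][] keras) {
        int samples = keras.length;
        int timeSteps = keras[0].length;
        int features = keras[0][0].length;
        double[][][] dl4j = new double[samples][features][timeSteps];
        for (int s = 0; s < samples; s++)
            for (int t = 0; t < timeSteps; t++)
                for (int f = 0; f < features; f++)
                    dl4j[s][f][t] = keras[s][t][f];
        return dl4j;
    }

    public static void main(String[] args) {
        // One sample, 3 time steps, 1 feature: the window [10, 20, 30].
        double[][][] keras = {{{10}, {20}, {30}}};
        double[][][] dl4j = toDl4jOrder(keras);
        System.out.println(dl4j[0][0][1]); // 20.0
    }
}
```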
5) Why am I getting an error while trying to predict a value?
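My current guess on question 5 (happy to be corrected): predict() is a classification convenience that takes the arg-max over each row of a 2D [batch, classes] output, which would explain the getRow() complaint on my 3D recurrent output, and for regression the raw value from output() is what I actually want. The arg-max step I believe it performs is just:

```java
public class ArgMaxDemo {
    // The per-row arg-max that (as far as I can tell) predict() applies to a
    // 2D class-probability matrix -- meaningless for a regression output.
    static int argMax(double[] row) {
        int best = 0;
        for (int i = 1; i < row.length; i++) {
            if (row[i] > row[best]) best = i;
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(argMax(new double[]{0.1, 0.7, 0.2})); // 1
    }
}
```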
6) I added a platform classifier for the nd4j dependency, but the warning messages did not disappear.
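For question 6, what I tried (I may well have the classifier name wrong, so treat this as an assumption) was adding an AVX2 build of the nd4j-native backend next to the generic one in pom.xml, along these lines, where ${dl4j.version} stands for whatever version is in use:

```xml
<!-- Assumed classifier name; the avx2 artifacts may be named differently
     for my platform. The log itself also suggests ND4J_IGNORE_AVX=true
     as a way to merely silence the warning. -->
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native</artifactId>
    <version>${dl4j.version}</version>
    <classifier>macosx-x86_64-avx2</classifier>
</dependency>
```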
public class Forecast {
    public final static int BATCH_SIZE = 2;   // how many examples to train on simultaneously
    public final static int RNG_SEED = 123;   // fixed random seed for reproducibility
    public final static int TIME_STEPS = 3;
    public final static int N_EPOCHS = 10;

    public static void main(String[] args) {
        int[] train = {10, 20, 30, 40, 50, 60, 70, 80, 90};
        int[] test = {70, 80, 90};
        INDArray pred = Nd4j.create(copyFromIntArray(test)).reshape(new int[]{1, test.length, 1});
        DataSet dataSet = split_sequence(train, TIME_STEPS);
        DataSetIterator train_iterator = new ListDataSetIterator(Collections.singletonList(dataSet), BATCH_SIZE);
        // NormalizerMinMaxScaler normalizer = new NormalizerMinMaxScaler(0, 1);
        // normalizer.fit(dataSet);        // scales the features only; the labels stay unscaled
        // normalizer.transform(dataSet);

        // network configuration (not yet initialized)
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(RNG_SEED)
                .weightInit(WeightInit.XAVIER)
                .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
                .updater(new Adam())
                .l2(1e-4)
                .list()
                .layer(0, new LSTM.Builder()
                        .nIn(3)
                        .nOut(20)
                        .activation(Activation.SOFTSIGN)
                        .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
                        .gradientNormalizationThreshold(10)
                        .build())
                .layer(1, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MSE)
                        .activation(Activation.IDENTITY)
                        .nIn(20)
                        .nOut(1)
                        .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
                        .gradientNormalizationThreshold(10)
                        .build())
                .build();
        MultiLayerNetwork model = new MultiLayerNetwork(conf);
        model.init();

        System.out.println("Train model...");
        long startTime = System.currentTimeMillis();
        model.setListeners(new ScoreIterationListener(1));
        for (int i = 0; i < N_EPOCHS; i++)
            model.fit(dataSet);
        long endTime = System.currentTimeMillis();
        System.out.println("=============run time===================== " + (endTime - startTime));

        // var eval = model.evaluateRegression(test_iterator); // how can I evaluate without a DataSetIterator?
        var predict = model.predict(pred);   // <-- this call throws (see the stack trace below)
        System.out.println(Arrays.toString(predict));
    }

    // Split a univariate sequence into overlapping samples of n_steps time steps each,
    // where the label is the single value that follows the window.
    public static DataSet split_sequence(int[] sequence, int n_steps) {
        int arr_size = sequence.length - n_steps;
        var input = new double[arr_size][n_steps];
        var label = new double[arr_size];
        for (int i = 0; i < sequence.length; i++) {
            int seq_end = i + n_steps;
            if (seq_end > sequence.length - 1) break;
            input[i] = copyFromIntArray(Arrays.copyOfRange(sequence, i, seq_end));
            label[i] = sequence[seq_end];
        }
        INDArray x = Nd4j.create(input).reshape(new int[]{arr_size, n_steps, 1});
        INDArray y = Nd4j.create(label).reshape(new int[]{arr_size, 1, 1});
        return new DataSet(x, y);
    }

    public static double[] copyFromIntArray(int[] source) {
        double[] dest = new double[source.length];
        for (int i = 0; i < source.length; i++) {
            dest[i] = source[i];
        }
        return dest;
    }
}
Results
[main] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [CpuBackend] backend
[main] INFO org.nd4j.nativeblas.NativeOpsHolder - Number of threads used for linear algebra: 1
[main] WARN org.nd4j.linalg.cpu.nativecpu.CpuNDArrayFactory - *********************************** CPU Feature Check Warning ***********************************
[main] WARN org.nd4j.linalg.cpu.nativecpu.CpuNDArrayFactory - Warning: Initializing ND4J with Generic x86 binary on a CPU with AVX/AVX2 support
[main] WARN org.nd4j.linalg.cpu.nativecpu.CpuNDArrayFactory - Using ND4J with AVX/AVX2 will improve performance. See deeplearning4j.org/cpu for more details
[main] WARN org.nd4j.linalg.cpu.nativecpu.CpuNDArrayFactory - Or set environment variable ND4J_IGNORE_AVX=true to suppress this warning
[main] WARN org.nd4j.linalg.cpu.nativecpu.CpuNDArrayFactory - *************************************************************************************************
[main] INFO org.nd4j.nativeblas.Nd4jBlas - Number of threads used for OpenMP BLAS: 4
[main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Backend used: [CPU]; OS: [Mac OS X]
[main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Cores: [8]; Memory: [4,0GB];
[main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Blas vendor: [OPENBLAS]
[[[70.0000],
[80.0000],
[90.0000]]]
[main] INFO org.deeplearning4j.nn.multilayer.MultiLayerNetwork - Starting MultiLayerNetwork with WorkspaceModes set to [training: ENABLED; inference: ENABLED], cacheMode set to [NONE]
Train model...
[main] INFO org.deeplearning4j.optimize.listeners.ScoreIterationListener - Score at iteration 0 is 28134.527770565328
[main] INFO org.deeplearning4j.optimize.listeners.ScoreIterationListener - Score at iteration 1 is 28117.756260491005
[main] INFO org.deeplearning4j.optimize.listeners.ScoreIterationListener - Score at iteration 2 is 28100.43006310864
[main] INFO org.deeplearning4j.optimize.listeners.ScoreIterationListener - Score at iteration 3 is 28082.492537826598
[main] INFO org.deeplearning4j.optimize.listeners.ScoreIterationListener - Score at iteration 4 is 28063.92220023444
[main] INFO org.deeplearning4j.optimize.listeners.ScoreIterationListener - Score at iteration 5 is 28044.66826909151
[main] INFO org.deeplearning4j.optimize.listeners.ScoreIterationListener - Score at iteration 6 is 28024.71316627492
[main] INFO org.deeplearning4j.optimize.listeners.ScoreIterationListener - Score at iteration 7 is 28004.043219907227
[main] INFO org.deeplearning4j.optimize.listeners.ScoreIterationListener - Score at iteration 8 is 27982.656476846336
[main] INFO org.deeplearning4j.optimize.listeners.ScoreIterationListener - Score at iteration 9 is 27960.558796485842
=============run time===================== 566
Exception in thread "main" java.lang.IllegalArgumentException: getRow() can be called on 2D arrays only
at org.nd4j.base.Preconditions.throwEx(Preconditions.java:636)
at org.nd4j.base.Preconditions.checkArgument(Preconditions.java:64)
at org.nd4j.linalg.api.ndarray.BaseNDArray.getRow(BaseNDArray.java:4293)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.predict(MultiLayerNetwork.java:2226)
at ml.Forecast.main(Forecast.java:90)