bquast / rnn

Recurrent Neural Networks in R
https://qua.st/rnn

QUESTION (or Issue in documentation): X and Y arrays for trainr function #29

Open chaitanyabjoshi opened 6 years ago

chaitanyabjoshi commented 6 years ago

The documentation for the function trainr of the package says

Y array of output values, dim 1: samples (must be equal to dim 1 of X), dim 2: time (must be equal to dim 2 of X), dim 3: variables (could be 1 or more, if a matrix, will be coerce to array)
X array of input values, dim 1: samples, dim 2: time, dim 3: variables (could be 1 or more, if a matrix, will be coerce to array)

What exactly do samples, time, and variables mean as dimensions? Is this meant for time series? How can I use my existing time series data for prediction?

franciszmy commented 6 years ago

Same question here. The package is really confusing to use. Could you give more examples of how to use it?

faltinl commented 6 years ago

Here is an example which unfortunately does NOT work: calling

model <- trainr(Y=Y(a:dim1, 1), X=(train:dim1, 1, 2:dim3) ... seq_to_seq_unsync=T, ...)

(i.e. dim2=1 for both X and Y as required, and 0<train<<dim1=1000, dim3=20) produces the error message

Error in store[, 1:dim(X)[2], ] = X : incorrect number of subscripts

which I cannot explain at all.

Addendum: As can be seen, I have a data set of 1000 samples with 20 elements each in X and I want to train a 20-to-1 network on (1000-train+1) elements in order to classify the sets into 3 classes, defined by target values in Y.

starmessage commented 6 years ago

Same question here. I have a dataset in a dataframe with historical stock data. The dataframe columns are: openPercent, highPercent, lowPercent, closePercent, volumeNormalized, buySignal, sellSignal. The last two columns are to be used as outputs. All my attempts to feed the trainr function failed. Please provide more information on the expected format for X and Y.

faltinl commented 6 years ago

Hi, in the meantime I have switched from the rnn package, which is not supported, to KerasR. It's an awfully tedious procedure, and specifying tensor dimensions within a network etc. is not at all straightforward, but in the end it opens up much more flexibility and many more programming options, so I really recommend it. Perhaps the single most important detail is NOT to install KerasR from CRAN but from GitHub using the development tools, see https://keras.rstudio.com/ for more details. This will get you the most recent version. The official version from CRAN that I happened to install in the first place did not contain the most recent version of one of the sub-packages, leading to mysterious error messages...

ulsmanikanta commented 6 years ago

Hi Fatini

Were you able to train the 20-to-1 network on your dataset of 1000 samples using the rnn package?

faltinl commented 6 years ago

No, sorry. I don't use this package any more, as I explained in my post above.

DimitriF commented 6 years ago

To compare with the documentation of keras (which I am now also using):

"Input shapes: 3D tensor with shape (batch_size, timesteps, input_dim), (Optional) 2D tensors with shape (batch_size, output_dim)."

Similarly for the rnn package, you cannot train the model with a formula approach, i.e. X and Y must be supplied and their dimensions must make sense: (samples, time steps, variables).

What is important to understand is that the network sees a 3D shape, not a 2D one as in classical modeling in R, so you must think in 3D. Being comfortable with the dimensions of your dataset, and with how they make sense for what you want your neural network to do, is mandatory for training it. As @faltinl mentioned, it is the same in keras, where you need to specify the tensor dimensions. In the rnn package, we tried to infer them from the inputs and to emit warnings when mismatches are found. It is still not perfect, though, and more documentation could help.
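As a minimal sketch of what "thinking in 3D" looks like in base R, the snippet below stacks two (samples x time) matrices into one (samples, time, variables) array. The variable names and the random toy data are hypothetical and only serve to show the layout; nothing here calls the rnn package itself.

```r
# Hypothetical toy data: 5 samples, 4 time steps, 2 input variables.
n_samples <- 5
n_time <- 4
set.seed(1)
temperature <- matrix(runif(n_samples * n_time), nrow = n_samples, ncol = n_time)
humidity    <- matrix(runif(n_samples * n_time), nrow = n_samples, ncol = n_time)

# Stack the two (samples x time) matrices along a third "variables" dimension.
# R fills arrays in column-major order, so each matrix becomes one slice.
X <- array(c(temperature, humidity), dim = c(n_samples, n_time, 2))

# One output variable per sample and time step (an arbitrary toy target).
Y <- array(as.numeric(temperature > humidity), dim = c(n_samples, n_time, 1))

dim(X)  # samples, time, variables
dim(Y)  # dim 1 and dim 2 match those of X
```

Here X[i, t, v] is the value of variable v for sample i at time step t, which is the layout the documentation quoted at the top of this thread describes.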

In @faltinl's example with dim2=1, there is only one time step, which is not what you want if you are using an RNN. This case is not caught by the input checks, hence the unhelpful error message.
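To make the difference concrete, here is a hedged base R sketch of two ways a 1000 x 20 matrix could be laid out in 3D. The matrix `features` is a hypothetical stand-in for the data described above, and whether the second layout suits that classification task is an assumption, not something confirmed in this thread.

```r
# Hypothetical stand-in: 1000 samples with 20 elements each.
set.seed(42)
features <- matrix(rnorm(1000 * 20), nrow = 1000, ncol = 20)

# Layout A: 20 variables observed at a single time step (dim2 = 1).
# This is the shape discussed above; it gives the network no sequence to unroll.
X_one_step <- array(features, dim = c(1000, 1, 20))

# Layout B: one variable observed over 20 time steps.
X_sequence <- array(features, dim = c(1000, 20, 1))

dim(X_one_step)  # samples, time, variables
dim(X_sequence)  # samples, time, variables
```

Both arrays hold exactly the same numbers; only the interpretation of the 20 elements changes, from variables to time steps.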

In the case of @starmessage's dataset, I believe you have only one observation, with 7 variables in input and 2 in output. If I assume you have 1000 rows in such a dataframe, the X dimension will be c(1,1000,7) and the Y dimension c(1,1000,2). The functions array, aperm and dim are very useful for re-dimensioning. I am not entirely sure we tried this, though, and R drops dimensions of length 1 when subsetting unless drop=F is set...
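A minimal base R sketch of that re-dimensioning, assuming a data frame with the column names listed earlier in this thread. The data frame `stock` and its random values are hypothetical, and the resulting arrays have not been tested against trainr here.

```r
# Hypothetical data frame with the columns named in the thread; values are random.
n <- 1000
set.seed(7)
stock <- data.frame(
  openPercent = rnorm(n), highPercent = rnorm(n), lowPercent = rnorm(n),
  closePercent = rnorm(n), volumeNormalized = runif(n),
  buySignal = rbinom(n, 1, 0.1), sellSignal = rbinom(n, 1, 0.1)
)

input_cols  <- names(stock)                  # all 7 columns, as assumed above
output_cols <- c("buySignal", "sellSignal")

X_mat <- as.matrix(stock[, input_cols])      # 1000 x 7 (time x variables)
Y_mat <- as.matrix(stock[, output_cols])     # 1000 x 2

# Add a leading "samples" dimension of length 1: (1, time, variables).
X <- array(X_mat, dim = c(1, nrow(X_mat), ncol(X_mat)))
Y <- array(Y_mat, dim = c(1, nrow(Y_mat), ncol(Y_mat)))

# The same X via aperm: start from (time, variables, 1), then move the last
# dimension to the front.
X_alt <- aperm(array(X_mat, dim = c(nrow(X_mat), ncol(X_mat), 1)), c(3, 1, 2))

# Subsetting drops the length-1 samples dimension unless drop = FALSE is given.
dim(X[, 1:100, ])                # 100 7
dim(X[, 1:100, , drop = FALSE])  # 1 100 7
```

If the two signal columns should serve only as outputs, restricting `input_cols` to the first five columns gives an X of dimension c(1,1000,5) instead.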