autonomio / talos

Hyperparameter Experiments with TensorFlow and Keras
https://autonom.io
MIT License
1.62k stars, 268 forks

Need Support for Data Shapes #386

Closed denizkenankilic closed 5 years ago

denizkenankilic commented 5 years ago

Hi again, I hope everything is well.

I am new to Talos and, as I mentioned before, I am trying to model financial data with an LSTM model. My Talos version is 0.5.0.

I have a two-column dataset with 9830 observations (c1 and c2: daily closing prices for DAX and NASDAQ, respectively). I want to use both the DAX and NASDAQ series to forecast DAX with an LSTM model, and I want to use Talos to find the best hyperparameters.

I split the dataset into training and test sets of shape (6860, 2) and (2970, 2), respectively. Then, using a time step of 10, I prepared the datasets in the form an LSTM expects. trainX includes both DAX and NASDAQ, and trainY includes NASDAQ (the same holds for the test data). I normalized every series to the range 0 to 1. The resulting shapes are trainX --> (6850, 10, 2), trainY --> (6850,), testX --> (2960, 10, 2), and testY --> (2960,).

After defining the parameters and creating a sequential model, I ran the Scan function. However, in every round accuracy and val_acc are almost zero and do not increase. I am sharing my code and results below. Is this caused by the data shapes or by how the data is split? If so, could you please help me set up the data shapes so that I get correct accuracy results?

import numpy as np
from tqdm import tqdm_notebook

TIME_STEPS = 10  # time step stated above

def build_timeseries(mat, y_col_index):
    """
    Converts ndarray into timeseries format and supervised data format. Takes first TIME_STEPS
    number of rows as input and sets the TIME_STEPS+1th data as corresponding output and so on.
    :param mat: ndarray which holds the dataset
    :param y_col_index: index of column which acts as output (here close prices)
    :return: returns two ndarrays-- input and output in format suitable to feed to LSTM.
    """
    # total number of time-series samples would be len(mat) - TIME_STEPS
    dim_0 = mat.shape[0] - TIME_STEPS
    dim_1 = mat.shape[1]
    x = np.zeros((dim_0, TIME_STEPS, dim_1))
    y = np.zeros((dim_0,))
    print("dim_0", dim_0)
    for i in tqdm_notebook(range(dim_0)):
        x[i] = mat[i:TIME_STEPS+i]
        y[i] = mat[TIME_STEPS+i, y_col_index]
    print("length of time-series i/o", x.shape, y.shape)
    return x, y
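As a quick sanity check on the shapes, here is a minimal sketch of the same windowing on synthetic data. `make_windows` is a hypothetical vectorized stand-in for `build_timeseries` (not code from the post), the random arrays replace the real prices, and `y_col_index=1` assumes NASDAQ is the second column, as the description of trainY suggests. If DAX is the forecast target, the index would presumably be 0 instead.

```python
import numpy as np

TIME_STEPS = 10  # window length stated in the post


def make_windows(mat, y_col_index, time_steps=TIME_STEPS):
    """Hypothetical vectorized equivalent of build_timeseries (no progress bar)."""
    n_samples = mat.shape[0] - time_steps
    # Row i of idx holds the row indices i, i+1, ..., i+time_steps-1.
    idx = np.arange(n_samples)[:, None] + np.arange(time_steps)[None, :]
    x = mat[idx]                        # (n_samples, time_steps, n_features)
    y = mat[time_steps:, y_col_index]   # the value right after each window
    return x, y


rng = np.random.default_rng(0)
train = rng.random((6860, 2))  # synthetic stand-in for the (6860, 2) training split
test = rng.random((2970, 2))   # synthetic stand-in for the (2970, 2) test split

train_x, train_y = make_windows(train, y_col_index=1)
test_x, test_y = make_windows(test, y_col_index=1)

print(train_x.shape, train_y.shape)
print(test_x.shape, test_y.shape)
```

Each split loses exactly TIME_STEPS rows, which accounts for 6860 becoming 6850 and 2970 becoming 2960, so the shapes themselves look consistent with what an LSTM layer expects.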


round_epochs | val_loss | val_acc | loss | acc | first_neuron...
-- | -- | -- | -- | -- | --
30 | 0.003848529 | 0.000337838 | 0.001812422 | 0.000145985 | 4...
20 | 8.42E-05 | 0.000337838 | 2.08E-05 | 0.000145985 | 4...
20 | 0.072720715 | 0.000337838 | 0.032609288 | 0.000145985 | 4...
20 | 0.569160606 | 0 | 0.189147563 | 0.000145985 | 4...
20 | 0.00072527 | 0.000337838 | 1.73E-05 | 0.000145985 | 4...
30 | 0.431146202 | 0 | 0.134227209 | 0.000145985 | 4...
20 | 0.569160608 | 0 | 0.189147563 | 0.000145985 | 4...
10 | 0.099647979 | 0.000337838 | 0.025362976 | 0.000145985 | 4...
20 | 0.098018228 | 0.000337838 | 0.019166352 | 0.000145985 | 4...

Thank you so much.

beeb commented 5 years ago

Accuracy is only used for classification tasks. See here
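To illustrate the point, here is a small NumPy sketch on synthetic data (not the poster's data). It assumes that, for a single-output model, "accuracy" boils down to an exact match between the target and the rounded prediction, which is my understanding of how Keras 2.x resolves it. Under that assumption, a continuous target scaled to 0..1 only "matches" where it is exactly 0 or 1. The reported values fit this reading: 0.000145985 is about 1/6850 and 0.000337838 is about 1/2960, which would be one matching sample in each set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a min-max scaled price target.
raw = rng.normal(size=6850).cumsum()
y_true = (raw - raw.min()) / (raw.max() - raw.min())

# Pretend predictions that sit close to the target, as a well-fit regressor would give.
y_pred = y_true + rng.normal(scale=0.01, size=y_true.shape)

# Assumed behaviour of "accuracy" here: exact match after rounding the prediction.
accuracy = np.mean(y_true == np.round(y_pred))

# Regression metrics measure distance instead of exact matches.
mae = np.mean(np.abs(y_true - y_pred))
mse = np.mean((y_true - y_pred) ** 2)

print(f"accuracy={accuracy:.6f}  mae={mae:.4f}  mse={mse:.6f}")
```

The predictions are good by construction, yet the accuracy stays near zero while MAE and MSE are small. In Keras, error metrics like these are typically requested with something like `metrics=['mae']` in `model.compile`.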

mikkokotila commented 5 years ago

@denizkenankilic things are very well thanks. Whenever in doubt, it's a good practice to run your model as a stand-alone Keras model first. Oftentimes it ends up that the issue is with the input model.

Closing here. Feel free to open new issue if anything.

denizkenankilic commented 5 years ago

> Accuracy is only used for classification tasks. See here

Thanks for your response. Which metrics should I use during training, then? In many LSTM modeling examples (not classification examples), people use accuracy. Should I look only at the loss values?

denizkenankilic commented 5 years ago

> @denizkenankilic things are very well thanks. Whenever in doubt, it's a good practice to run your model as a stand-alone Keras model first. Oftentimes it ends up that the issue is with the input model.
>
> Closing here. Feel free to open new issue if anything.

I tried a stand-alone Keras model, and it gives good accuracy results on the training set. What I am not sure about is which usage of the Scan command is correct: ta.Scan(x=trainX, y=trainY, x_val=testX, y_val=testY,... or ta.Scan(x=xx, y=yy, x_val=testX, y_val=testY, where xx and yy were created in the same way as the others but cover the whole dataset. The shapes are trainX --> (6850, 10, 2), trainY --> (6850,), testX --> (2960, 10, 2), testY --> (2960,), xx --> (9820, 10, 2), and yy --> (9820,).

@mikkokotila Thanks.

mikkokotila commented 5 years ago

The preferred way is ta.Scan(x=trainX, y=trainY, x_val=testX, y_val=testY) because it supports all use cases, whereas the other does not.
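A separate consideration, which the thread itself does not raise, is that passing the whole dataset as x while validating on testX would mean the validation windows are also part of the training data. The sketch below checks this on synthetic data. It assumes a contiguous chronological split at row 6860 and no per-split rescaling, and `make_windows` is a hypothetical stand-in for the poster's windowing function.

```python
import numpy as np

TIME_STEPS = 10  # window length stated in the thread


def make_windows(mat, y_col_index, time_steps=TIME_STEPS):
    """Hypothetical stand-in for the poster's build_timeseries."""
    n_samples = mat.shape[0] - time_steps
    idx = np.arange(n_samples)[:, None] + np.arange(time_steps)[None, :]
    return mat[idx], mat[time_steps:, y_col_index]


rng = np.random.default_rng(0)
data = rng.random((9830, 2))            # synthetic stand-in for the full dataset
train, test = data[:6860], data[6860:]  # assumed contiguous split

train_x, train_y = make_windows(train, 1)
test_x, test_y = make_windows(test, 1)
xx, yy = make_windows(data, 1)

# The last 2960 windows of the full set are exactly the validation windows.
overlap = (np.array_equal(xx[-len(test_x):], test_x)
           and np.array_equal(yy[-len(test_y):], test_y))

print(xx.shape, yy.shape, overlap)
```

Under these assumptions every validation sample also appears in xx, so validation scores from the second call would likely look better than they should. That is one more reason to keep the training and validation arrays separate, as in the preferred call.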