ageron / handson-ml

⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.
Apache License 2.0
25.14k stars 12.91k forks

Tensorflow--nothing works, runtime error on everything. #404

Closed zoakes closed 5 years ago

zoakes commented 5 years ago

I'm following the notebooks on GitHub, copying and pasting, and NOTHING works. I get a runtime error on everything besides the imports and reset_graph().

ageron commented 5 years ago

Hi @zoakes ,

I'm sorry you are experiencing problems, I'll try to help you. First, could you please provide more details about your installation?

Also, could you please run the following code and copy/paste the output (especially the full stacktrace in case of error)?

import sys
import numpy as np
import tensorflow as tf

print("TF:", tf.__version__)
print("Python:", sys.version_info)
print("NumPy:", np.__version__)

with tf.Session() as sess:
    print("The Answer:", sess.run(tf.constant(42)))

print("GPU:", tf.test.is_gpu_available())
ageron commented 5 years ago

For example, here is the full output on my MacBook:

>>> import sys
>>> import numpy as np
>>> import tensorflow as tf
>>>
>>> print("TF:", tf.__version__)
TF: 1.13.1
>>> print("Python:", sys.version_info)
Python: sys.version_info(major=3, minor=6, micro=8, releaselevel='final', serial=0)
>>> print("NumPy:", np.__version__)
NumPy: 1.16.2
>>>
>>> with tf.Session() as sess:
...     print("The Answer:", sess.run(tf.constant(42)))
...
2019-04-13 18:24:04.448151: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
The Answer: 42
>>> print("GPU:", tf.test.is_gpu_available())
GPU: False

zoakes commented 5 years ago

Sure. I tried a lot of downloads, it just seems they were incomplete. Initially I had 1.13, when the errors first started happening. Then I installed the GPU version, no changes. I decided to uninstall entirely and upgrade to 2.0, thinking maybe it would help. I used the instructions from TensorFlow's site... it was something a bit wordy, pip install --upgrade tensorflow==2.0.0-alpha0, something like that? Also tried the inline foo.py line, but got an error.

If there’s a command to check the line input, I’ll look.

It seemed to simply change which functions wouldn't work; for example, now with 2.0, Session() doesn't work, and neither does reset_default_graph(). So... they're pretty important functions.

But the basic MNIST neural network on TF's site worked fine, so obviously some of it installed; all of Keras seems fine.

Zach

zoakes commented 5 years ago

One second I’ll run the commands

zoakes commented 5 years ago

[Image]

ageron commented 5 years ago

Hi @zoakes , Unfortunately, github filters out images sent by email. Please go to https://github.com/ageron/handson-ml/issues/404 and drag & drop the image on the comment you want to add.

zoakes commented 5 years ago

Screen Shot 2019-04-13 at 8 11 00 AM

zoakes commented 5 years ago

Realized I made a slight error in the print statement--fixed, but same attribute error. It just seems like the download was incomplete--however it doesn't appear to be incomplete when checking versions etc. Quite frustrating.

ageron commented 5 years ago

Thanks. Indeed, you have TensorFlow version 2.0-alpha0. Pretty much all Keras code will work the same way in TF 1.13.1 and TF 2.0-alpha0, but most of the rest will not work without significant changes. So you have two options: either you want to learn TF 2.0, in which case you need the 2nd edition of my book (it is available online in early release, and I'm adding chapters regularly), or you can stick to TF 1.13.1 for now and use the 1st edition. It's up to you. If you prefer to go back to TF 1.13.1:

pip3 uninstall tensorflow  # this will uninstall tensorflow 2.0 alpha0
pip3 install --user -U tensorflow  # this will install tensorflow 1.13.1

You may need administrator rights to run the first command if you installed TensorFlow with administrator rights. After running this, try the same code again: the print("TF:", tf.__version__) line should print "TF: 1.13.1". Hope this helps!

zoakes commented 5 years ago

Okay--Initially I did have 1.13, but still was having some issues. I'm happy to move on to 2.0, it does seem easier and I'm definitely not experienced with TF/ML so seems a better fit.
I tried buying your newer version, it wasn't available until like late 2019, could only pre-order--so went with this. I'll check out early releases. Is there any way to partially exchange version1 for the new version when it is available?

Thanks for your timely help, I've really enjoyed your book!

ageron commented 5 years ago

Cool, thanks for your kind words! I'm not sure it's possible to exchange edition 1 for edition 2, but perhaps O'Reilly offers that kind of service. I hope you'll enjoy the 2nd edition, I'm putting a lot of work into it right now. :)

zoakes commented 5 years ago

Yeah, they really should do something like that—even if it’s for a small credit towards the newer version.

I tend to take advantage of used books—just bc I buy so damn many—and O’Reilly never has any used books so figured maybe they did. Their books are great reference, so likely another factor in low inventory.

Loved the workflow of Ch 10’s code so far. Also love the new book design, nice change from the red border with animal theme.

Sent from Outlookhttps://aka.ms/qtex0l for iOS


From: Aurélien Geron notifications@github.com Sent: Saturday, April 13, 2019 9:07 AM To: ageron/handson-ml Cc: Zach Mazz; Mention Subject: Re: [ageron/handson-ml] Tensorflow--nothing works, runtime error on everything. (#404)

Cool, thanks for your kind words! I'm not sure it's possible to exchange edition 1 for edition 2, but perhaps O'Reilly offers that kind of service. I hope you'll enjoy the 2nd edition, I'm putting a lot of work into it right now. :)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://eur04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fageron%2Fhandson-ml%2Fissues%2F404%23issuecomment-482812214&data=02%7C01%7C%7C63c1fe4bad0d4b3c775208d6c0196537%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C636907612629855183&sdata=kWEl01MrGzQ79VP8kYHyxTh9MTImpPxONrgVBdZlcWQ%3D&reserved=0, or mute the threadhttps://eur04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAt9BIL0JL_Q2GVTiQjGnDNKGqLD2Ytnyks5vgeStgaJpZM4ctyXw&data=02%7C01%7C%7C63c1fe4bad0d4b3c775208d6c0196537%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C636907612629865191&sdata=JivOlci7RZgKbf41gHMNcCFKbXWecakrWO8A70El9hM%3D&reserved=0.

zoakes commented 5 years ago

Sorry, one unrelated question—does TF 2.0 have the Tensorboard features ? I was thinking that visualizing the process would likely be helpful for grasping neural networks.

Zach

Sent from Outlookhttps://aka.ms/qtex0l for iOS


From: Zach Oakes zach_oakes@outlook.com Sent: Saturday, April 13, 2019 9:21 AM To: ageron/handson-ml; ageron/handson-ml Cc: Mention Subject: Re: [ageron/handson-ml] Tensorflow--nothing works, runtime error on everything. (#404)

Yeah, they really should do something like that—even if it’s for a small credit towards the newer version.

I tend to take advantage of used books—just bc I buy so damn many—and O’Reilly never has any used books so figured maybe they did. Their books are great reference, so likely another factor in low inventory.

Loved the workflow of Ch 10’s code so far. Also love the new book design, nice change from the red border with animal theme.

Sent from Outlookhttps://aka.ms/qtex0l for iOS


From: Aurélien Geron notifications@github.com Sent: Saturday, April 13, 2019 9:07 AM To: ageron/handson-ml Cc: Zach Mazz; Mention Subject: Re: [ageron/handson-ml] Tensorflow--nothing works, runtime error on everything. (#404)

Cool, thanks for your kind words! I'm not sure it's possible to exchange edition 1 for edition 2, but perhaps O'Reilly offers that kind of service. I hope you'll enjoy the 2nd edition, I'm putting a lot of work into it right now. :)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://eur04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fageron%2Fhandson-ml%2Fissues%2F404%23issuecomment-482812214&data=02%7C01%7C%7C63c1fe4bad0d4b3c775208d6c0196537%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C636907612629855183&sdata=kWEl01MrGzQ79VP8kYHyxTh9MTImpPxONrgVBdZlcWQ%3D&reserved=0, or mute the threadhttps://eur04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAt9BIL0JL_Q2GVTiQjGnDNKGqLD2Ytnyks5vgeStgaJpZM4ctyXw&data=02%7C01%7C%7C63c1fe4bad0d4b3c775208d6c0196537%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C636907612629865191&sdata=JivOlci7RZgKbf41gHMNcCFKbXWecakrWO8A70El9hM%3D&reserved=0.

zoakes commented 5 years ago

Answered my own Question.

Sent from Outlookhttps://aka.ms/qtex0l for iOS


From: Zach Oakes zach_oakes@outlook.com Sent: Saturday, April 13, 2019 9:24 AM To: ageron/handson-ml; ageron/handson-ml Cc: Mention Subject: Re: [ageron/handson-ml] Tensorflow--nothing works, runtime error on everything. (#404)

Sorry, one unrelated question—does TF 2.0 have the Tensorboard features ? I was thinking that visualizing the process would likely be helpful for grasping neural networks.

Zach

Sent from Outlookhttps://aka.ms/qtex0l for iOS


From: Zach Oakes zach_oakes@outlook.com Sent: Saturday, April 13, 2019 9:21 AM To: ageron/handson-ml; ageron/handson-ml Cc: Mention Subject: Re: [ageron/handson-ml] Tensorflow--nothing works, runtime error on everything. (#404)

Yeah, they really should do something like that—even if it’s for a small credit towards the newer version.

I tend to take advantage of used books—just bc I buy so damn many—and O’Reilly never has any used books so figured maybe they did. Their books are great reference, so likely another factor in low inventory.

Loved the workflow of Ch 10’s code so far. Also love the new book design, nice change from the red border with animal theme.

Sent from Outlookhttps://aka.ms/qtex0l for iOS


From: Aurélien Geron notifications@github.com Sent: Saturday, April 13, 2019 9:07 AM To: ageron/handson-ml Cc: Zach Mazz; Mention Subject: Re: [ageron/handson-ml] Tensorflow--nothing works, runtime error on everything. (#404)

Cool, thanks for your kind words! I'm not sure it's possible to exchange edition 1 for edition 2, but perhaps O'Reilly offers that kind of service. I hope you'll enjoy the 2nd edition, I'm putting a lot of work into it right now. :)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://eur04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fageron%2Fhandson-ml%2Fissues%2F404%23issuecomment-482812214&data=02%7C01%7C%7C63c1fe4bad0d4b3c775208d6c0196537%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C636907612629855183&sdata=kWEl01MrGzQ79VP8kYHyxTh9MTImpPxONrgVBdZlcWQ%3D&reserved=0, or mute the threadhttps://eur04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAt9BIL0JL_Q2GVTiQjGnDNKGqLD2Ytnyks5vgeStgaJpZM4ctyXw&data=02%7C01%7C%7C63c1fe4bad0d4b3c775208d6c0196537%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C636907612629865191&sdata=JivOlci7RZgKbf41gHMNcCFKbXWecakrWO8A70El9hM%3D&reserved=0.

ageron commented 5 years ago

The easiest way to use TensorBoard is through Keras's TensorBoard callback:

from tensorflow import keras

# assuming `model` is a compiled tf.keras model and X_train/y_train are your training data
callbacks = [keras.callbacks.TensorBoard("my_tf_logs_dir")]
model.fit(X_train, y_train, epochs=5, callbacks=callbacks)

In Jupyter, you can then type this:

%load_ext tensorboard.notebook
%tensorboard --logdir=my_tf_logs_dir

Hope this helps! Aurélien

zoakes commented 5 years ago

Aurelien,

I'm working on building an RNN, and I'm having an endless amount of trouble going back and forth between the book (1.13), the GitHub, Stack Exchange, and TF.com. I think I finally figured it out, but there seems to be some bug that causes this to occur and stops the epochs.

ValueError: Arguments and signature arguments do not match: 31 33

I'm officially losing it over this pretty basic 4-layer RNN model.

I tried both 'fixes' (wrapping the input in a constant, and using the nightly build). Do you know of any other solutions to this thing? Is there any way I could buy a digital copy of the 2nd edition? Or just see the chapter on RNNs so I can get through this damn model haha.

Zach

ageron commented 5 years ago

Hi @zoakes , I'm sorry you're struggling with RNNs, it takes a bit of getting used to, but you'll get the hang of it! I recommend you start implementing it using tf.keras, it's the simplest option. Your training set needs to be a 3D array. For example, X_train should have a shape of [1000, 50, 10], meaning it contains 1,000 sequences, each 50 time steps long, with 10 input features at each time step (e.g., for a weather forecasting system, these could be the temperature, wind strength, humidity and 7 other relevant features, i.e., a total of 10 features, and one sequence would be these 10 features measured every hour over the past 50 hours).

Here's a 4-layer sequence-to-vector RNN which will work in both TF 1.13.1 and TF 2.0. It outputs a single vector of size 3 for each input sequence.

from tensorflow import keras
import numpy as np

X_train = np.random.rand(1000, 50, 10).astype(np.float32) # replace with real inputs
y_train = np.random.rand(1000, 3).astype(np.float32)  # replace with real targets

model = keras.models.Sequential([
    keras.layers.LSTM(32, return_sequences=True),
    keras.layers.LSTM(32, return_sequences=True),
    keras.layers.LSTM(32, return_sequences=True),
    keras.layers.LSTM(32),
    keras.layers.Dense(3)
])
model.compile(loss="mse", optimizer="rmsprop")
model.fit(X_train, y_train, epochs=2)

And here's a sequence-to-sequence RNN. For each input sequence, it outputs a sequence of the same length as its inputs (i.e., 50 time steps), and with 3 values per time step.

from tensorflow import keras
import numpy as np

X_train = np.random.rand(1000, 50, 10).astype(np.float32) # replace with real inputs
Y_train = np.random.rand(1000, 50, 3).astype(np.float32)  # replace with real targets

model = keras.models.Sequential([
    keras.layers.LSTM(32, return_sequences=True),
    keras.layers.LSTM(32, return_sequences=True),
    keras.layers.LSTM(32, return_sequences=True),
    keras.layers.LSTM(32, return_sequences=True),
    keras.layers.Dense(3)
])
model.compile(loss="mse", optimizer="rmsprop")
model.fit(X_train, Y_train, epochs=2)

A general rule: don't forget to set return_sequences=True for all RNN layers, except perhaps for the very last one if you want a sequence-to-vector model.

If you want more TF2 and tf.keras code examples, please check out the jupyter notebooks at https://github.com/ageron/handson-ml2 . Alternatively, you can check out the RNN notebook in my TF2 course notebooks at: https://github.com/ageron/tf2_course

Hope this helps, Aurélien

zoakes commented 5 years ago

This is helpful. My input shape is 3D, it's like [3805, 19, 4], and when I use that shape it says it received 4D but expected 3D (for some reason it's interpreting it as [None, 3805, 19, 4]).

So I switched it to input_shape=[19, 4] (I read on Stack Exchange that you need to drop the instance dimension) and it almost works. return_sequences is set to True on all but the last layer. I'll share the code to try to figure out what's wrong with it, and I'm going to just try plugging in your model examples.

The errors couldn’t be less clear in TF haha.

Zach

zoakes commented 5 years ago

model = tf.keras.models.Sequential([
    tf.keras.layers.GRU(100, return_sequences=True,
                        #input_shape=[3807,19,4],
                        dropout=0.2, recurrent_dropout=0.2),
    tf.keras.layers.GRU(100, return_sequences=True,
                        dropout=0.2, recurrent_dropout=0.2),
    tf.keras.layers.Dense(4, activation="elu")
])

model.compile(loss="mse", optimizer="adam")
history = model.fit(x_train, y_train,
                    #steps_per_epoch=train_size // batch_size,
                    epochs=100)

I'm now getting errors about mse needing matching input/output arrays, and errors about NumPy arrays vs tensors. I'm about ready to wait for the new book. I hate this model; no matter what I fix, it breaks something else.

zoakes commented 5 years ago

ValueError: Do not pass inputs that mix Numpy arrays and TensorFlow tensors. You passed: x=tf.Tensor( [[[0.35666696 0.34942395 0.34168484 0.34297246] [0.33570223 0.33071358 0.32281607 0.31773628] [0.3197566 0.31941591 0.31218014 0.31884508]

This is the error.

zoakes commented 5 years ago

I'm getting the same error (the NumPy arrays vs tensors one above) on both models you sent. x_train shape is (3807, 19, 4); x_valid/x_test is (476, 19, 4).

ageron commented 5 years ago

Hi @zoakes , You're getting there, don't worry! :) The input_shape argument is meant to specify the shape of a single instance (e.g., [19, 4] in your case), not the shape of the whole training set. Keras assumes that the first dimension is None (i.e., it could be any value). There is also an argument called batch_input_shape that lets you specify the size of the batch as well (e.g., [32, 19, 4] if you are using a batch size of 32), but you should not use it, or else the model will only accept batches of size 32 (you won't be able to make predictions for single instances). This is meant for stateful RNNs (but don't worry about these for now).
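
For instance, on the model you posted above, that would look something like this (just a sketch assuming 19 time steps and 4 features per step; only the first layer needs input_shape, and the second GRU does not return sequences since the target is one vector per sequence):

import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.GRU(100, return_sequences=True,
                        input_shape=[19, 4],  # shape of ONE instance: [time steps, features]
                        dropout=0.2, recurrent_dropout=0.2),
    tf.keras.layers.GRU(100, dropout=0.2, recurrent_dropout=0.2),
    tf.keras.layers.Dense(4)
])
model.compile(loss="mse", optimizer="adam")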

Also, you don't need to specify steps_per_epoch, as Keras can compute it on its own using the length of the training set divided by the batch size. This argument is useful when using a Dataset from tf.data (but don't worry about this now).

The error message seems to say that your inputs contain a mix of TensorFlow tensors and NumPy arrays. I'm not sure how you built x_train and y_train. Could you please tell me what you get when you run this code:

print("type(x_train) =", type(x_train))
print("type(x_train[0]) =", type(x_train[0]))
print("x_train.shape =", x_train.shape)
print("type(y_train) =", type(y_train))
print("type(y_train[0]) =", type(y_train[0]))
print("y_train.shape =", y_train.shape)

You want x_train to be either 100% NumPy array (not an array containing tensors), or 100% tensor. Same for y_train. Hope this helps.
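
For instance, here is one way to force everything to a single consistent type before calling fit() (a sketch only, assuming x_train and y_train are the arrays discussed above; either option works, as long as inputs and targets match):

import numpy as np
import tensorflow as tf

# Option 1: make both plain float32 NumPy arrays
x_train = np.asarray(x_train, dtype=np.float32)
y_train = np.asarray(y_train, dtype=np.float32)

# Option 2: make both TensorFlow tensors instead
# x_train = tf.convert_to_tensor(x_train, dtype=tf.float32)
# y_train = tf.convert_to_tensor(y_train, dtype=tf.float32)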

zoakes commented 5 years ago

Okay, cool.

Yeah, a concerning error to be thrown hah.

I'll send you the code for how I broke down the data: it's equity price data, taken from online and manually split. I borrowed the code from an online model; I would usually use train_test_split, but this just seemed like a clean, repeatable function (if it worked).

Here’s the data manipulation:

# Scaling the data

def normalize_data(df):
    min_max_scaler = sklearn.preprocessing.MinMaxScaler()
    df['Open'] = min_max_scaler.fit_transform(df.Open.values.reshape(-1,1))
    df['High'] = min_max_scaler.fit_transform(df.High.values.reshape(-1,1))
    df['Low'] = min_max_scaler.fit_transform(df.Low.values.reshape(-1,1))
    df['Close'] = min_max_scaler.fit_transform(df.Close.values.reshape(-1,1))
    return df

df_stock_norm = df_stock.copy()
df_stock_norm = normalize_data(df_stock_norm)

# Splitting the dataset into train/valid/test data

valid_size_pct = 10
test_size_pct = 10
seq_len = 20

def load_data(stock, seq_len):
    data_raw = stock.as_matrix()
    data = []
    for index in range(len(data_raw) - seq_len):
        data.append(data_raw[index: index + seq_len])
    data = np.array(data)
    valid_set_size = int(np.round(valid_size_pct/100 * data.shape[0]))
    test_set_size = int(np.round(test_size_pct/100 * data.shape[0]))
    train_set_size = data.shape[0] - (valid_set_size + test_set_size)
    x_train = data[:train_set_size, :-1, :]
    y_train = data[:train_set_size, -1, :]
    x_valid = data[train_set_size:train_set_size+valid_set_size, :-1, :]
    y_valid = data[train_set_size:train_set_size+valid_set_size, -1, :]
    x_test = data[train_set_size+valid_set_size:, :-1, :]
    y_test = data[train_set_size+valid_set_size:, -1, :]
    return [x_train, y_train, x_valid, y_valid, x_test, y_test]

x_train,y_train,x_valid,y_valid,x_test,y_test = load_data(df_stock_norm,seq_len)

Here’s the answer to your code:

type(x_train) = <class 'tensorflow.python.framework.ops.EagerTensor'>
type(x_train[0]) = <class 'tensorflow.python.framework.ops.EagerTensor'>
x_train.shape = (3807, 19, 4)
type(y_train) = <class 'numpy.ndarray'>
type(y_train[0]) = <class 'numpy.ndarray'>
y_train.shape = (3807, 4)

The error makes sense! How do I turn the y_train into a tensor?

Zach

zoakes commented 5 years ago

In reviewing my code, it's a complete mystery to me why y_train is any different; it was built in exactly the same way... Is there some command, equivalent to str() or float(), to convert to a TensorFlow object?

I found tf.convert_to_tensor, that may work?

zoakes commented 5 years ago

Used tf.convert_to_tensor; it looked like it would work, but now it throws this:

InvalidArgumentError: Incompatible shapes: [32,3] vs. [32,4] [[{{node training_6/Adam/gradients/loss_8/output_1_loss/mean_squared_error/SquaredDifference_grad/BroadcastGradientArgs}}]] [Op:__inference_keras_scratch_graph_52515]

I’m telling you—this is cursed.

ageron commented 5 years ago

Hi @zoakes , C'mon, you're almost there! :) This error is telling you that it could not compute the gradient of the loss (.../gradients/loss_8/output_1_loss/...) because some shapes did not match. Apparently the model's predictions had a last dimension of 3 while the targets had a last dimension of 4, or vice versa. If you're using Jupyter, this could be because it's still using some old model that you built earlier. Try restarting the kernel to get a clean environment, and make sure x_train and y_train have dimensions (3807, 19, 4) and (3807, 4), assuming you have 4 features per time step in the input (open, high, low, close), and you are forecasting 4 values (open, high, low, close).

ageron commented 5 years ago

Also ensure your model's last layer has 4 neurons (i.e., one per output dimension).

zoakes commented 5 years ago

AH! I think that was it, the last layer was Dense(3)... fingers crossed... shapes confirmed... error... For some reason a fresh kernel gave me x_train as an np.array... weird. I converted both to tensors (do they need to be tensors, or can both be np.arrays in general, as long as it's consistent?).

BAM ! THANK YOU!

Error is reducing…. Good signs !

I will let you know if any other problems come up, and I owe you immensely for helping me with this. I will purchase the new edition just to thank you!

Zach

zoakes commented 5 years ago

It worked !

Sorry… a bit embarrassing, but I don’t really know what to do with it now? Is that unusual?

The build I’m following plots y_test against y_pred—I really wish I had the new book. I can dig through the GitHub of 2.0, figure out what to do with my model.

Thank you so much for your help—that was quite frustrating.

Zach

ageron commented 5 years ago

Cool, congrats for sticking through it! Perseverance is key in ML. :)

Once your model is trained, you can evaluate it on the validation set. It will give you an idea of how good your model is. If it's not good enough for your needs, you can tweak the model architecture (e.g., change the number of layers, the number of neurons per layer, and so on), and train each variant on the training set, and evaluate them on the validation set. Then pick the best one, and train it one last time on the full training set + validation set (more data will give you better performance). Lastly, evaluate that final model on the test set to get an idea of the generalization error, that is how well you can expect it to work in production. And if you're happy, then deploy it to production to make predictions.

Use model.evaluate(X_valid, y_valid) to evaluate on the validation set. Use model.evaluate(X_test, y_test) to evaluate on the test set. Use model.predict(X_new) to make predictions on new sequences. Cheers!
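
In code, that workflow is just three calls (a sketch; X_valid, y_valid, X_test, y_test and X_new are assumed to be prepared exactly like the training data):

val_mse = model.evaluate(X_valid, y_valid)   # how good is the model on held-out data?
test_mse = model.evaluate(X_test, y_test)    # estimate of the generalization error
y_pred = model.predict(X_new)                # predictions for new sequences, shape [len(X_new), 4]
print("Validation MSE:", val_mse, "Test MSE:", test_mse)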

zoakes commented 5 years ago

Nice!

It's doing about 73% accuracy. I think that's pretty damn good for financial time series data, and ONLY daily data. I think once I include tick/minute data (taking my 10-year dataset from ~3,800 instances to just below 10M) I'll get some great results, maybe in the 80s.

Zach

zoakes commented 5 years ago

And again—THANK YOU for pushing me through it hah.

Once you get one figured out it becomes so much easier. I literally think I built and trained 50 RNNs today after that first one.

For predict, what's the X_new input? Just a new sequence instance? Or does it name the prediction x_new... no, that would be y_new.

Zach

ageron commented 5 years ago

My pleasure, Zach! If you become rich with all these financial predictions, send me a check, haha! ;-)

Just one thing: you talk about "73% accuracy". As you probably know, accuracy is a standard metric for classification models (it's the ratio of correct predictions to the total number of predictions). But it seemed to me that you were predicting values (e.g., open=458.1, close=451.3), not classes (e.g., buy or sell). In other words, it seems to me that it's a regression model. So I'm not sure what you mean by 73% accuracy in this case. Did you check whether or not the predictions fell within some distance of the correct value (effectively turning the predictions into binary classes: "close to target" or "far from target")?

The metric to use really depends on your use case. For example, if you are building a system that will forecast the market to make decisions on when to buy or sell, you might want to simulate the actual P&L you will get, and evaluate your final model's performance based on that P&L. For example, if a model makes very accurate predictions in general but sometimes makes a few very bad predictions, it may have a worse P&L than a model that generally makes less precise predictions, but never makes huge mistakes. This would encourage you to choose the second model, rather than the first one that seemed better at first.

For regression, you generally want no activation function on the last layer (I noticed you used ELU, which will restrict the output values to values above -1). If you want to constrain the outputs to be positive, then you can use the softplus activation function. If you want to constrain them to a range of values, you can use tanh, which outputs values between -1 and 1, and rescale the targets (and predictions) to the appropriate range. You also want to use the MSE cost function, in general, or the Huber loss if you don't want to be too impacted by outliers. For the metric, you can use the MSE, or other metrics depending on what you care about (e.g., the P&L).

For binary classification (2 classes), use a single neuron in the output layer, using the sigmoid activation function, and use the binary_crossentropy loss. Use the accuracy metric if the classes are balanced (roughly 50/50), or the precision & recall metrics if they are not.

For multiclass classification (3 classes or more), use one neuron per class in the output layer and use the softmax activation function, and use the sparse_categorical_crossentropy loss (assuming the labels are just class indices, one per instance), or the categorical_crossentropy loss (assuming the labels are one target probability per class and per instance).
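
As a rough sketch of what those three recipes look like in tf.keras (the two-layer LSTM stack is just a placeholder; only the output layer and loss change with the task):

from tensorflow import keras

def build_model(output_layer, loss):
    # placeholder architecture; swap in whatever hidden layers you like
    model = keras.models.Sequential([
        keras.layers.LSTM(32, return_sequences=True, input_shape=[19, 4]),
        keras.layers.LSTM(32),
        output_layer,
    ])
    model.compile(loss=loss, optimizer="rmsprop")
    return model

regression = build_model(keras.layers.Dense(4, activation="tanh"), "mse")                           # targets scaled to -1..1
binary_clf = build_model(keras.layers.Dense(1, activation="sigmoid"), "binary_crossentropy")        # 0/1 labels
multi_clf = build_model(keras.layers.Dense(4, activation="softmax"), "sparse_categorical_crossentropy")  # class indices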

Regarding X_new, that's the new data you want to make predictions on. For example, say you trained an image classification model on 10,000 images, then you take a new picture and you want to classify it, all you need to do is put this new image in the correct format (i.e., a numpy array of the right shape and type, preprocessed just like the training set), and you can use the model to make predictions on it.

Cheers!

zoakes commented 5 years ago

Most definitely will!

I was actually thinking about the same thing: what IS accuracy. I simply added it as a metric as I'd seen in many other NNs, but I honestly have no idea. I didn't design anything complex like your suggestions, but I would like to. I agree the P&L would be the ideal accuracy measure (I even considered building a reinforcement learning model based on it), and in the past I've used a binary classification model (up/down). This model is supposed to predict the next bar, so with daily bars it should be predicting the next day's open/high/low/close. Since markets have shown scalar similarities regardless of interval (day vs minute), my thought was to train a model using minute data, then predict using hourly or daily bars so the potential gain could be worthwhile.

To add the softplus activation to the final dense layer, do I just add activation=‘softplus’ ? I did constrain ranges from -1 to 1, so maybe I should be using tanh—although you said I don’t want an activation on final layer for regression. For accuracy, does that mean I add: metrics=[‘MSE’] ?

I am outputting 4 classes(?) 4 outputs, so should I be using softmax? (Just add activation=‘softmax’ ? To final layer w/ 4 nodes) And instead of loss of MSE I should use sparse_categorical_crossentropy’ ?

The labels are features of a single instance (a price bar)—from your description I think that means labels are class indices.

This email alone was INVALUABLE—I wish they had more of those ‘choose a model’ charts like the sklearn one where it helps you decide all the parameters; loss, activation, etc.

I’ve also noticed the deeper I make the network, the less accurate it is ? (Although this email explains why my accuracy is very wrong hah) so maybe after I make these changes I’ll retrain and evaluate.

Final changes: Sounds like since I’m using 4 outputs I should be using softmax, with a sparse loss, and MSE as a metric ? For two metrics does it become [‘accuracy’, ‘mse’] or [‘mse’][‘acc’] ?

Zach

zoakes commented 5 years ago

In first attempts; sparse_categorical throws an error with softmax:

InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [32,4] and labels shape [128] [[{{node loss_18/output_1_loss/sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]] [Op:__inference_keras_scratch_graph_1186890]

With softmax and categorical_crossentropy it works, using MSE & accuracy. It doesn't seem to improve the MSE at all using Adam; maybe I need to use something less stochastic, like Adagrad, SGD with momentum, or RMSProp? Here are the results with categorical crossentropy and MSE / accuracy:

loss: 3.4853 - accuracy: 0.1786 - MSE: 0.1438

Not sure what happened—it’s almost the same model that was getting > 70’s, just with softmax and different loss function.

In the article I was initially following, they plotted the y_pred vs y_actual—so I could do that to at least visualize the accuracy.

That would require x_new, but in order to plot it for multiple values how would I do that? Like x_new would give me one value—Is there a way to plot the continuous predictions over like 1yr?

I suppose I could manually do this with a loop, something like:

for i in range(252):
    vals = []
    plot = predict(x_new)
    i + 1
    x_new = x_new[i-20:i]
    vals.append(plot)

Then simply plot the vals list as the y axis, time as x? This may require more panda’ing—something like df.loc / iloc

I built out the x_new using current data (19+) and I'm going to try it out on the current data. Something a bit concerning: when I built the new dataset using 4 months of data, it somehow had more instances?

The shape is (4759, 19, 4) for 2019-01-01 to 2019-04-01.

Something about the initial 10yr split data function must be wrong, because I’ve checked the entire (new) dataset and it seems accurate.

Zach

zoakes commented 5 years ago

Well, I’ve had some difficulty looping over a scalar that’s for sure.

So far, the best graph I can get is this: [image attachment, not displayed by GitHub]

However, my MSE is much lower than in the article I'm following, and their graph with GRU looks pretty good. I have NO idea how they graphed it; it throws like 6 different errors when I try to build the df as they did with y_new and y_test, but the results are promising visually.

Here’s my latest attempt at looping through predictions—looks concise but doesn’t work.

for i in x_new:
    vals = []
    y_n = model_gru_nd.predict(i)
    vals.append[y_n]

Not having much luck with the alternate models… any suggestions on how I could improve this (beyond using minute data—which I’m shopping for) ?

Best so far is GRU with dropout, averages around 75% accuracy (unsure of what that’s saying… but assume it measures accuracy of the OHLC values)

Zach

ageron commented 5 years ago

Wow, that's a lot questions, pretty soon I'll have to send you an invoice! ;-) See my answers below.

To add the softplus activation to the final dense layer, do I just add activation=‘softplus’ ?

Yes

I did constrain ranges from -1 to 1, so maybe I should be using tanh—although you said I don’t want an activation on final layer for regression.

If you constrained the targets to -1 to 1, then yes you should probably be using tanh. That's just activation='tanh'. The rule about regression without activation is only if the targets are unconstrained values.

For accuracy, does that mean I add: metrics=[‘MSE’] ?

Yes. But I wouldn't call it accuracy, since accuracy is the name of a specific, standard classification metric in ML; it's better to just talk about a "metric".

Loss = the value that will be minimized during training. It must play nicely with Gradient Descent, and its goal is just to get the model to learn. It includes any extra loss you add to the model, such as L2 regularization. This is not the actual metric you care about; it's just the tool used to train a model that will fit the training data and hopefully generalize well to new examples.

Metric = the value you really care about in the end, such as the P&L. There's no universally good answer for what metric to use. It does not need to play nicely with gradient descent. It does not include regularization losses. Usually, you want something that a human can interpret.

In some cases the loss and the metric can be the same, such as the MSE. But often, they are different. For example, you may want to use the square root of the MSE instead of the MSE as the metric, as it makes for a more interpretable value (roughly the same scale as the targets).
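
For instance, a sketch of tracking a more interpretable RMSE metric alongside the MSE loss (this assumes keras.metrics.RootMeanSquaredError, which is available in TF 2.x; on older versions you could just take the square root of the reported MSE yourself):

from tensorflow import keras

# `model` is the tf.keras model built earlier
model.compile(loss="mse", optimizer="rmsprop",
              metrics=[keras.metrics.RootMeanSquaredError()])  # reported on the same scale as the targets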

I am outputting 4 classes(?) 4 outputs, so should I be using softmax? (Just add activation=‘softmax’ ? To final layer w/ 4 nodes) And instead of loss of MSE I should use sparse_categorical_crossentropy’ ?

No, you are doing regression, so the activation function will be None, sigmoid or tanh (in general), and the loss will be MSE or Huber loss (in general). If you were doing classification, and there were 4 classes (like "large drop in prices", "small drop in prices", "slight increase", "large increase"), then the answer would be yes.

The labels are features of a single instance (a price bar)—from your description I think that means labels are class indices.

No, the labels (aka targets) are just values in your case, not classes. It's regression, not classification. So you can use MSE or Huber loss, and therefore you don't need to care about sparse_categorical_crossentropy vs categorical_crossentropy. This would only be if you were doing classification. For example, sparse labels would look like this: [[1], [3], [0], [0]], meaning that the first instance belongs to class #1, the second belongs to class #3, and the other 2 belong to class #0. Then you would use loss="sparse_categorical_crossentropy". But another way to represent the same labels would be as one-hot vectors: [[0, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [1, 0, 0, 0]]. In this case, you would use loss="categorical_crossentropy".

I’ve also noticed the deeper I make the network, the less accurate it is ? (Although this email explains why my accuracy is very wrong hah) so maybe after I make these changes I’ll retrain and evaluate.

This can happen, it's not a problem. It just means that deeper models are too powerful relative to the task at hand and the amount of available data. If the data is very noisy and you don't have a lot of it, it's quite likely that the best model will be a very simple one. It will be less likely to overfit the training data.

Sounds like since I’m using 4 outputs I should be using softmax, with a sparse loss, and MSE as a metric ?

No, softmax is when you're doing multiclass classification. In your case it's regression. See my answer above.

For two metrics does it become [‘accuracy’, ‘mse’] or [‘mse’][‘acc’] ?

Technically, it would be metrics=['accuracy', 'mse'], but it would not make much sense, as accuracy is a classification metric, while mse is a regression metric.

In first attempts; sparse_categorical throws an error with softmax: InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [32,4] and labels shape [128]

You should not use sparse_categorical_crossentropy or categorical_crossentropy for regression.

Softmax with categorical_crossentropy it works, uses MSE & Accuracy. It doesn’t seem to improve the MSE at all using Adam—maybe I need to use something less stochastic; ADAgrad, SGD Mom or RMSProp ?

It doesn't show an error, but that does not mean it's correct: it's actually interpreting your labels as class probabilities, which they are not, so it's not really learning anything useful.

they plotted the y_pred vs y_actual [...] That would require x_new, but in order to plot it for multiple values how would I do that? Like x_new would give me one value—Is there a way to plot the continuous predictions over like 1yr?

Once your model is trained (e.g., using loss="mse" and activation='tanh' in the last layer, assuming the targets are normalized to the -1 to 1 range), you can try this (untested):

import matplotlib.pyplot as plt

y_pred = model.predict(X_valid)                 # predictions for the whole validation set
dim = 0                                         # which of the 4 output dimensions to plot (e.g., Open)
plt.plot(y_pred[:, dim], y_valid[:, dim], ".")  # predicted vs actual, one dot per instance
plt.show()

This will make predictions for all instances in the validation set (if you don't have one, you should make one; or just use the training set or the test set for now).

Then simply plot the vals list as the y axis, time as x? This may require more panda’ing—something like df.loc / iloc

See above.

I built out the x_new, using current data (19+) and I’m going to try it out on the current data. Something a bit concerning, is that when I built the new dataset — using 4 months of data, it had more instances somehow?

I'm not sure I understand your question. A model is usually trained on a large dataset (X_train), evaluated on a fairly large validation set (X_valid, for model selection) and test set (X_test, to evaluate the generalization error you will get in production), and then it is used to make predictions on any number of instances (X_new), often just 1 or a few at a time. That's fine.

Side note: by convention we use capital letters to represent matrices (or multi-dimensional arrays with 2 or more dimensions). So it's generally X_train, X_valid, X_test, X_new, and not x_train, x_valid, x_test, x_new. One exception is when we use a 2D array to hold a column vector or a row vector (i.e., a matrix with a single column or row), just for convenience. In this case, they are generally named as if they were vectors, with a lowercase letter, such as y_train, y_valid, y_test, y_new, even if their shape is [1000, 1].

The shape is 4759,19,4 for 2019-1-1 to 2019-4-1

That looks good to me.

So far, the best graph I can get is this: [image attachment, not displayed by GitHub]

Github will not show images sent by email. Please connect to https://github.com/ageron/handson-ml/issues/404 to comment on this issue, it makes the code easier to read, and you can just drag & drop images you want to show.

However, my MSE is much lower than in the article I'm following, and their graph with GRU looks pretty good. I have NO idea how they graphed it; it throws like 6 different errors when I try to build the df as they did with y_new and y_test, but the results are promising visually.

The scale of the MSE depends on the task at hand and the way the labels were scaled. E.g., if they scaled their labels from 0 to 1000, then they will have a much larger MSE than you, even if their model is much better. Unless you know that they scaled everything the same way and they used the same data, you can't really compare MSEs.

Not having much luck with the alternate models… any suggestions on how I could improve this (beyond using minute data—which I’m shopping for) ?

Financial data is super noisy. One way to reduce the noise is to average over many stocks. You could try non-recurrent models, e.g., using convolutional neural networks, such as the wavenet model (see chapter 15 in the 2nd edition of my book).
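
For reference, here is a minimal sketch of a 1D-convolutional sequence-to-vector model on the same [19, 4] inputs (this is only an illustration of the idea, not the WaveNet architecture from the book; the layer sizes are arbitrary):

from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.Conv1D(32, kernel_size=4, activation="relu",
                        padding="causal", input_shape=[19, 4]),
    keras.layers.Conv1D(32, kernel_size=4, activation="relu",
                        padding="causal", dilation_rate=2),
    keras.layers.GlobalAveragePooling1D(),    # collapse the time dimension
    keras.layers.Dense(4, activation="tanh")  # 4 regression outputs, targets scaled to -1..1
])
model.compile(loss="mse", optimizer="rmsprop")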

Best so far is GRU with dropout, averages around 75% accuracy (unsure of what that’s saying… but assume it measures accuracy of the OHLC values)

Yes, the accuracy might not mean much. You want to replace this with the MSE, or the MAE (mean absolute error), or the Huber loss, or the P&L, or whatever other metric you care about.

Hope this helps!

zoakes commented 5 years ago

Yea you could literally bill me for that email, no doubt. I didn’t realize how many questions I asked—my apologies.

I definitely understand now! I didn’t even realize this was a regression—clearly. It was just pretty challenging learning while going back and forth between both TF versions, they’re very different with the whole model build and training.

I think I'm going to try a CNN, that's a solid idea. Sorry, but when you say more stock data, do you mean more data for this stock? Otherwise I wouldn't know where to start combining/normalizing them into one dataset; the amount of pandas merging/joining is making me nauseous. I think my best bet is to acquire tick data (currently using daily data), which would give me ~2k times more instances per day.

Thank you for helping me plot this ! I tried alot of combinations of different things. Much appreciated.

Yep, I actually typo’d the x then just left it—I think you’d be pleased I took a linear algebra refresher course before reading just so I wasn’t lost with all the matrix calculations involved with the activations and various optimization functions.

I’m going to give the CNN a try, Thanks again for your help!

Zach

zoakes commented 5 years ago

Thought I would include this! Not too shabby, no?

Thanks again !

Zach

[image attachment, not displayed by GitHub]

zoakes commented 5 years ago

Hate to do this, but is there anywhere I can find sequence networks? With CNNs, every example I can find is a classification problem.

I'm trying to work off the TF alpha examples. I've tried every possible kernel/no kernels, input shapes, no input shapes, relu/elu and no activation, and various-dimension conv layers.

Is there a general rule of thumb for figuring out what these networks need as inputs/outputs? I feel like they all take the same shape of training data, but somehow I get caught up with shapes/nodes.

Current errors (from me trying combinations of input numbers) are these:

InvalidArgumentError: Incompatible shapes: [32,1,32] vs. [32,4] [[{{node metrics_2/mse/SquaredDifference}}]] [Op:__inference_keras_scratch_graph_3004]

Or

ValueError: Negative dimension size caused by subtracting 4 from 2 for 'conv1d_22/conv1d' (op: 'Conv2D') with input shapes: [?,1,2,32], [1,4,32,32].

The base model was a 2D one; I realized from looking at a few 1.13 examples that I needed 1D, so I modified it to this. It seems to be an error with the number of nodes, or the weights/bias/stride, whatever the input is.

model_cnn1.add(keras.layers.Conv1D(32, (4), activation='relu'))
model_cnn1.add(keras.layers.MaxPooling1D((2)))
model_cnn1.add(keras.layers.Conv1D(32, (4), activation='relu'))
model_cnn1.add(keras.layers.MaxPooling1D((2)))
model_cnn1.add(keras.layers.Conv1D(32, (2), activation='relu'))

It’s 100% fine if you send me a bill haha I get it.

Zach

On May 8, 2019, at 12:00 AM, Aurélien Geron notifications@github.com<mailto:notifications@github.com> wrote:

Wow, that's a lot questions, pretty soon I'll have to send you an invoice! ;-) See my answers below.

To add the softplus activation to the final dense layer, do I just add activation=‘softplus’ ?

Yes

I did constrain ranges from -1 to 1, so maybe I should be using tanh—although you said I don’t want an activation on final layer for regression.

If you constrained the targets to -1 to 1, then yes you should probably be using tanh. That's just activation='tanh'. The rule about regression without activation is only if the targets are unconstrained values.

For accuracy, does that mean I add: metrics=[‘MSE’] ?

Yes. But I wouldn't call it accuracy, since accuracy is a specific, standard metric in ML; it's better to just talk about a "metric".

Loss = the value that will be minimized during training. It must play nicely with Gradient Descent, and its goal is just to get the model to learn. It includes any extra loss you add to the model, such as L2 regularization. This is not the actual metric you care about; it's just the tool used to train a model that will fit the training data and hopefully generalize well to new examples.

Metric = the value you really care about in the end, such as the P&L. There's no universally good answer for which metric to use. It does not need to play nicely with gradient descent, and it does not include regularization losses. Usually, you want something a human can interpret. In some cases the loss and the metric can be the same, such as the MSE, but often they are different. For example, you may want to use the square root of the MSE as the metric instead of the MSE itself: it makes for a more interpretable metric (roughly the same scale as the targets).
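
Concretely, the split could look something like this (just a sketch; the tiny model here is a hypothetical placeholder, and the rmse helper is not from this thread):

import tensorflow as tf
from tensorflow import keras

def rmse(y_true, y_pred):
    # Root of the MSE: roughly the same scale as the targets, so easier to interpret.
    return tf.sqrt(tf.reduce_mean(tf.square(y_true - y_pred)))

# Hypothetical model, only here to show the compile call; the real architecture is up to you.
model = keras.models.Sequential([keras.layers.Dense(4, input_shape=(76,))])
model.compile(loss="mse",        # the loss: what gradient descent actually minimizes
              optimizer="adam",
              metrics=[rmse])    # the metric: what you actually look at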

I am outputting 4 classes(?), i.e. 4 outputs, so should I be using softmax? (Just add activation='softmax' to the final layer with 4 nodes?) And instead of an MSE loss should I use 'sparse_categorical_crossentropy'?

No, you are doing regression, so the activation function will be None, sigmoid or tanh (in general), and the loss will be MSE or Huber loss (in general). If you were doing classification, and there were 4 classes (like "large drop in prices", "small drop in prices", "slight increase", "large increase"), then the answer would be yes.

The labels are features of a single instance (a price bar)—from your description I think that means labels are class indices.

No, the labels (aka targets) are just values in your case, not classes. It's regression, not classification. So you can use MSE or Huber loss, and therefore you don't need to care about sparse_categorical_crossentropy vs categorical_crossentropy. This would only be if you were doing classification. For example, sparse labels would look like this: [[1], [3], [0], [0]], meaning that the first instance belongs to class #1, the second belongs to class #3, and the other 2 belong to class #0. Then you would use loss="sparse_categorical_crossentropy". But another way to represent the same labels would be as one-hot vectors: [[0, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [1, 0, 0, 0]]. In this case, you would use loss="categorical_crossentropy".
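
A tiny illustration of the two label formats, for classification only (it does not apply to the regression problem in this thread):

import numpy as np
from tensorflow import keras

sparse_labels = np.array([1, 3, 0, 0])                         # class indices -> loss="sparse_categorical_crossentropy"
onehot_labels = keras.utils.to_categorical(sparse_labels, 4)   # one-hot rows  -> loss="categorical_crossentropy"
print(onehot_labels)   # rows: [0,1,0,0], [0,0,0,1], [1,0,0,0], [1,0,0,0]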

I’ve also noticed the deeper I make the network, the less accurate it is ? (Although this email explains why my accuracy is very wrong hah) so maybe after I make these changes I’ll retrain and evaluate.

This can happen, it's not a problem. It just means that deeper models are too powerful relative to the task at hand and the amount of available data. If the data is very noisy and you don't have a lot of it, it's quite likely that the best model will be a very simple model. It will be less likely to overfit the training data.

Sounds like since I’m using 4 outputs I should be using softmax, with a sparse loss, and MSE as a metric ?

No, softmax is when you're doing multiclass classification. In your case it's regression. See my answer above.

For two metrics does it become [‘accuracy’, ‘mse’] or [‘mse’][‘acc’] ?

Technically, it would be metrics=['accuracy', 'mse'], but it would not make much sense, as accuracy is a classification metric, while mse is a regression metric.

In first attempts; sparse_categorical throws an error with softmax: InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [32,4] and labels shape [128]

You should not use sparse_categorical_crossentropy or categorical_crossentropy for regression.

With softmax and categorical_crossentropy it works, using MSE & accuracy as metrics. It doesn't seem to improve the MSE at all using Adam; maybe I need to use something less stochastic, like Adagrad, SGD with momentum, or RMSProp?

It doesn't show an error, but that does not mean it's correct: it's actually interpreting your labels as class probabilities, which they are not, so it's not really learning anything useful.

they plotted the y_pred vs y_actual [...] That would require x_new, but in order to plot it for multiple values how would I do that? Like x_new would give me one value—Is there a way to plot the continuous predictions over like 1yr?

Once your model is trained (e.g., using loss="mse" and activation='tanh' in the last layer, assuming the targets are normalized to the -1 to 1 range), you can try this (untested):

y_pred = model.predict(X_valid)
dim = 0
plt.plot(y_pred[:, dim], y_valid[:, dim], ".")
plt.show()

This will make predictions for all instances in the validation set (if you don't have one, you should make one; or just use the training set or the test set for now).

Then simply plot the vals list as the y axis, time as x? This may require more panda’ing—something like df.loc / iloc

See above.
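
If you want a time-series view rather than a predicted-vs-actual scatter, something like this should work (untested sketch, assuming the model, X_valid and y_valid from above, with the validation set in chronological order):

import matplotlib.pyplot as plt

y_pred = model.predict(X_valid)
dim = 0                                      # pick one of the 4 output dimensions
plt.plot(y_valid[:, dim], label="actual")    # the x axis is just the time-step index
plt.plot(y_pred[:, dim], label="predicted")
plt.xlabel("validation time step")
plt.legend()
plt.show()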

I built out the x_new using current data (2019+), and I'm going to try it out on that. Something a bit concerning is that when I built the new dataset using 4 months of data, it somehow had more instances?

I'm not sure I understand your question. A model is usually trained on a large dataset (X_train), evaluated on a fairly large validation set (X_valid, for model selection) and test set (X_test, to evaluate the generalization error you will get in production), and then it is used to make predictions on any number of instances (X_new), often just 1 or a few at a time. That's fine.

Side note: by convention we use capital letters to represent matrices (or multi-dimensional arrays with 2 or more dimensions). So it's generally X_train, X_valid, X_test, X_new, and not x_train, x_valid, x_test, x_new. One exception is when we use a 2D array to hold a column vector or a row vector (i.e., a matrix with a single column or row), just for convenience. In this case, they are generally named as if they were vectors, with a lowercase letter, such as y_train, y_valid, y_test, y_new, even if their shape is [1000, 1].

The shape is 4759,19,4 for 2019-1-1 to 2019-4-1

That looks good to me.

So far, the best graph I can get is this: [image attached by email; not displayed on GitHub]

GitHub will not show images sent by email. Please connect to #404 to comment on this issue; it makes the code easier to read, and you can just drag & drop images you want to show.

However, my MSE is much lower than in the article I'm following, and their graph with a GRU looks pretty good. I have NO idea how they graphed it; it throws like 6 different errors when I try to build the dataframe as they did with y_new and y_test, but the results are promising visually.

The scale of the MSE depends on the task at hand and the way the labels were scaled. E.g., if they scaled their labels from 0 to 1000, then they will have a much larger MSE than you, even if their model is much better. Unless you know that they scaled everything the same way and they used the same data, you can't really compare MSEs.

Not having much luck with the alternate models… any suggestions on how I could improve this (beyond using minute data—which I’m shopping for) ?

Financial data is super noisy. One way to reduce the noise is to average over many stocks. You could try non-recurrent models, e.g., using convolutional neural networks, such as the wavenet model (see chapter 15 in the 2nd edition of my book).

Best so far is GRU with dropout, averages around 75% accuracy (unsure of what that’s saying… but assume it measures accuracy of the OHLC values)

Yes, the accuracy might not mean much. You want to replace this with the MSE, or the MAE (mean absolute error), or the Huber loss, or the P&L, or whatever other metric you care about.

Hope this helps!


ageron commented 5 years ago

Hi Zach,

Check out this paper: https://arxiv.org/abs/1711.04837. It explains how they forecasted company fundamentals using an RNN, and it covers all the preprocessing. They trained on over ten thousand different stocks, and they explain all the normalization they did. A stock behaves very differently depending on the scale: the fine-grained, short-term scale is super noisy, while the long-term, coarse scale has much more signal, especially if you aggregate over many stocks.

Regarding the image you sent in your last comment: GitHub doesn't display it since it was sent via email. You need to go to https://github.com/ageron/handson-ml/issues/404 and drag & drop it into a new comment.

Hope this helps!

zoakes commented 5 years ago

This is interesting; it's very different from my general quant approach (I'm mostly technical/arb focused, probably considered HFT based on the average intervals). Yep, most of my returns are made on the short-term noise, so I can't exactly get mad at it, but it sucks for training ML models.

I'm thinking a CNN was a great idea; it should help with the noise and overfitting, since that seemed to be the issue with even moderately deep recurrent NNs. I'm just trying to figure out what all these inputs are and why the layers won't accept my input data; I'm guessing there's some ratio/relationship between the layers that I'm missing.

Zach

ageron commented 5 years ago

Perhaps check out Wavenets in notebook 15 (in the second edition). It's a stack of 1D ConvNets with exponentially growing dilation. There's a code example.
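
Roughly, the idea looks like this (a sketch, not the notebook's exact code, assuming input windows of shape (19, 4) and 4 regression targets per window):

from tensorflow import keras

model = keras.models.Sequential()
model.add(keras.layers.InputLayer(input_shape=(19, 4)))        # assumed window shape
for rate in (1, 2, 4, 8):
    # the dilation rate doubles at each layer, so the receptive field grows exponentially
    model.add(keras.layers.Conv1D(filters=32, kernel_size=2, padding="causal",
                                  activation="relu", dilation_rate=rate))
model.add(keras.layers.GlobalAveragePooling1D())               # collapse the time dimension
model.add(keras.layers.Dense(4))                               # one value per regression target
model.compile(loss="mse", optimizer="adam")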

zoakes commented 5 years ago

I reviewed it; it's pretty confusing for me with the lambdas and metric functions, but it taught me a few inputs, so I at least know what numbers I'm trying. I've tried using an input layer first; it didn't seem to change anything. It seems the first argument is filters and not nodes, then stride for max pooling, and I don't know what I need for either. For some reason it says my shapes are incompatible with every combination I've tried (the closest I've gotten is 32,4,4 vs 32,4), and I don't know how to lose a dimension for whatever it's referring to in the CNN. I've actually been using try/except loops to just plug in ranges of numbers, and I think I'm close haha, guess that's one way to do it.

I'm posting it on Stack Overflow; I give up, and you've helped more than enough. I really appreciate it!

Zach

zoakes commented 5 years ago

I figured out CNNs! Strangely enough, I got there by starting from a 2D CNN example and just switching everything to 1D. I have no idea what I was missing in my original attempt (maybe the Flatten() layer, or the increasing filter size / decreasing stride, or SAME padding), but it now works! (And I can deduce what fixed it by taking things away.)
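
For reference, a minimal sketch of that kind of working stack (assuming input windows of shape (19, 4) and 4 targets scaled to -1..1; the filter counts are placeholders, not my exact model):

from tensorflow import keras

model_cnn1 = keras.models.Sequential([
    keras.layers.Conv1D(32, 4, activation='relu', padding='same',
                        input_shape=(19, 4)),   # 'same' padding keeps the time dimension from shrinking away
    keras.layers.MaxPooling1D(2),
    keras.layers.Conv1D(64, 4, activation='relu', padding='same'),
    keras.layers.MaxPooling1D(2),
    keras.layers.Conv1D(64, 2, activation='relu', padding='same'),
    keras.layers.Flatten(),                     # without this the output stays 3D and the loss shapes clash
    keras.layers.Dense(4, activation='tanh')    # 4 regression outputs, tanh if targets are scaled to [-1, 1]
])
model_cnn1.compile(loss='mse', optimizer='adam')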

I'm going to try building 1D versions of the well-known CNNs you mentioned in the book, like Google's CNN, etc.

Thanks for ALL of your help, and luckily I believe I'm done bothering you with errors!

Zach

ageron commented 5 years ago

Hi Zach, It's very rewarding for me to see you succeed and fly on your own now! Congratulations on your hard work and perseverance! Cheers, Aurélien