The range of tanh is (-1, 1), and it's quite unstable around 0. I would certainly first replace Tanh with a sigmoid activation (which I foolishly called Logit).
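In grenade, that swap would look roughly like the sketch below; the exact layer list here is my guess at a 2-2-1 XOR network, not the code from the bug branch:

{-# LANGUAGE DataKinds #-}

import Grenade

-- A 2-2-1 network for XOR with Logit (sigmoid) activations instead of Tanh,
-- so the predictions stay in (0, 1) and can match the 0/1 targets directly.
type XorNet
  = Network
      '[ FullyConnected 2 2, Logit, FullyConnected 2 1, Logit ]
      '[ 'D1 2, 'D1 2, 'D1 2, 'D1 1, 'D1 1 ]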
With Logit layers instead of Tanh, and a learning rate of 5e-4, this is the output:
Before training:
S1D (0.5209612495165322 :: R 1)
S1D (0.5354862396464221 :: R 1)
S1D (0.5147895724365512 :: R 1)
S1D (0.5286280277690646 :: R 1)
After training:
S1D (0.49562255798747007 :: R 1)
S1D (0.5116790380511227 :: R 1)
S1D (0.48814828705706076 :: R 1)
S1D (0.5032045153404426 :: R 1)
Again all the outcomes simply went down by approximately the same amount (0.02–0.03), instead of the network training properly.
Any other ideas?
I changed a few things: using Tanh followed by Logit, adding regularisation, and changing the number of passes.
type Net
  = Network
      '[ FullyConnected 2 2, Tanh, FullyConnected 2 1, Logit ]
      '[ 'D1 2, 'D1 2, 'D1 2, 'D1 1, 'D1 1 ]

main :: IO ()
main = do
  let samples = take 500000 $ cycle $ zip inputs outputs
      params  = LearningParameters 0.005 1e-8 1e-8
  net <- randomNetworkM
  putStrLn "Before training:"
  print $ snd $ runNetwork net $ S1D $ vec2 0 0
  print $ snd $ runNetwork net $ S1D $ vec2 0 1
  print $ snd $ runNetwork net $ S1D $ vec2 1 0
  print $ snd $ runNetwork net $ S1D $ vec2 1 1
  let trained =
        foldl'
          (\net' (inpt, outpt) -> train params net' inpt outpt)
          net
          samples
  putStrLn "After training:"
  print $ snd $ runNetwork trained $ S1D $ vec2 0 0
  print $ snd $ runNetwork trained $ S1D $ vec2 0 1
  print $ snd $ runNetwork trained $ S1D $ vec2 1 0
  print $ snd $ runNetwork trained $ S1D $ vec2 1 1
Gives
>> :main
Before training:
S1D (0.3277539706087074 :: R 1)
S1D (0.40347581438397084 :: R 1)
S1D (0.21913306200165242 :: R 1)
S1D (0.26255544780363543 :: R 1)
After training:
S1D (2.508754710817976e-2 :: R 1)
S1D (0.9678709914342344 :: R 1)
S1D (0.9677786822228179 :: R 1)
S1D (2.218035165797253e-2 :: R 1)
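The snippet above uses vec2, inputs and outputs without showing them; here is a minimal sketch of what those XOR definitions could look like (my guess, the real definitions live in the linked repository):

{-# LANGUAGE DataKinds #-}

import Grenade
import Numeric.LinearAlgebra.Static (R, vector)

-- Hypothetical helper: build a length-2 static vector from two doubles.
vec2 :: Double -> Double -> R 2
vec2 x y = vector [x, y]

-- The four XOR input points and their targets (0, 1, 1, 0).
inputs :: [S ('D1 2)]
inputs = [S1D (vec2 0 0), S1D (vec2 0 1), S1D (vec2 1 0), S1D (vec2 1 1)]

outputs :: [S ('D1 1)]
outputs = map (S1D . vector) [[0], [1], [1], [0]]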
I'm experiencing a bug and can't trace where it's coming from: my networks aren't actually training; they just seem to change the network parameters randomly. I have simplified my code to the point where all I'm using are the 'randomNetwork' and 'train' functions, yet the bug persists.
Details: You can find the code for the bug on https://github.com/Nickske666/grenade-examples/tree/bug in app/main.hs. This executable only depends on your grenade (master branch, latest version). I'm trying to train a two-layer fully connected NN to approximate XOR. Here is the output before and after 100000 training loops:
This shows the network's predictions on the vectors [0, 0], [0, 1], [1, 0] and [1, 1], which should be 0, 1, 1 and 0. However, as you can see, it's not even close. I turned off the momentum and regulariser for this, and optimised the learning rate.
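Concretely, turning those off just means zeroing the corresponding fields of LearningParameters; a sketch of what I mean (the rate here is only a placeholder, not the tuned value, and the argument order matches the snippet above: rate, momentum, regulariser):

import Grenade

-- Momentum and regulariser set to 0, i.e. turned off; only the rate is used.
params :: LearningParameters
params = LearningParameters 1.0e-2 0 0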
Is this a bug, or did I make a mistake here?