HuwCampbell / grenade

Deep Learning in Haskell
BSD 2-Clause "Simplified" License

Weight Initialization, GNum class and NFData instances #69

Open schnecki opened 6 years ago

schnecki commented 6 years ago

see https://github.com/HuwCampbell/grenade/issues/51

HuwCampbell commented 5 years ago

Sorry, I'll get round to this soon.

schnecki commented 5 years ago

No worries, it took me quite some time as well. Btw, I checked the implemented learning algorithm and it seems to me that this is SGD with momentum and regularization. Is this correct?
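For reference, this is the update rule I mean: a minimal sketch of SGD with momentum and L2 regularization over flat parameter lists. The function and parameter names here are mine for illustration, not Grenade's actual API.

```haskell
-- Sketch of one SGD-with-momentum step with L2 regularisation, over a
-- flat list of parameters. Illustrative only, not Grenade's real types.
sgdMomentumStep
  :: Double               -- ^ learning rate (eta)
  -> Double               -- ^ momentum coefficient (gamma)
  -> Double               -- ^ L2 regularisation strength (lambda)
  -> [Double]             -- ^ current weights
  -> [Double]             -- ^ current velocities
  -> [Double]             -- ^ gradients of the loss
  -> ([Double], [Double]) -- ^ (new weights, new velocities)
sgdMomentumStep eta gamma lambda ws vs gs = (ws', vs')
  where
    -- v' = gamma * v - eta * (g + lambda * w); the L2 term is folded
    -- into the gradient before the momentum update.
    vs' = zipWith3 (\v g w -> gamma * v - eta * (g + lambda * w)) vs gs ws
    -- w' = w + v'
    ws' = zipWith (+) ws vs'
```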

schnecki commented 5 years ago

No, definitely not. It would be nice if you could improve things here. I'll try to free up some time to help you out if you need anything. Btw, prior to merging we should probably also add tests.

HuwCampbell commented 5 years ago

Go for it @erikd

erikd commented 4 years ago

Git merge commits should be avoided for non-master branches. Please use rebase instead. There is a ton of documentation on git rebase on the net.

schnecki commented 4 years ago

> Git merge commits should be avoided for non-master branches. Please use rebase instead. There is a ton of documentation on git rebase on the net.

Ah I didn't know that. Thanks for the pointer

erikd commented 4 years ago

I would really like to see some of this stuff merged to master, but the code on this branch and in this PR is moving further and further away from master, making it increasingly difficult to even begin making sense of what has changed.

For instance, the grenade.cabal file on this branch is generated from hpack, making it difficult to even compare version bounds.

Obviously @HuwCampbell has the final say on this, but I cannot see any possibility of this branch being merged to master as it currently is. For me to merge stuff to master I would like to see a collection of small PRs that only change a few things at a time, so I can properly review the changes. As it is, even extracting small changes from the commits in this PR is a huge and difficult task.

alexanderkjeldaas commented 4 years ago

The main problem with this PR is that it isn't reviewed, or that it takes months to review.

I looked through it, and this isn't a large change IMO. There are 4-5 conceptual changes, and not a single reorganization of how the code works.

If someone can just review it, I'm willing to do the work to break it up into smaller PRs.

erikd commented 4 years ago

IMO it is not reviewable in its current state. There are typo fixes mixed in with semantically null whitespace changes, mixed in with real changes, sometimes in the same commit.

If someone breaks it into smaller chunks, I will review the smaller chunks :D.

alexanderkjeldaas commented 4 years ago

I don't understand why whitespace changes or typo fixes make it difficult to review code. I understand that the code is not in a mergeable state, but I don't understand why you cannot just ignore obvious whitespace, formatting, or typo fixes when reviewing.

erikd commented 4 years ago

> I don't understand why whitespace changes or typo fixes make it difficult to review code.

Semantically null whitespace changes make the diff bigger than it would be otherwise. Without whitespace changes, I have far fewer changes to look at, and everything I look at is relevant. The other problem is that whitespace preferences are purely a matter of opinion and entirely subjective. That can leave one commit changing whitespace from state A to state B, and a second commit changing it from B back to A, reducing the value of the revision control history.

OTOH, typo fixes are relevant.

schnecki commented 4 years ago

The main problem I see is backward compatibility, which I decided to ignore on my branch for now, because the dependently typed libraries are changing too fast. So I guess I'll just close this pull request.

erikd commented 4 years ago

Is it ok if we keep this open, just so I can keep track of it more easily?

erikd commented 4 years ago

@schnecki I would like to bring anything of value from this branch to master. What do you think are the top three improvements on this branch? I will see if they can be brought across.

schnecki commented 4 years ago

@erikd Well, the most important one is definitely the optimizer support with Adam implemented. But that's a huge change. Then the runtime creation of networks, but I plan to implement a better interface for that first, because right now it is easy to mess it up, and a wrong network leads to runtime errors as the matrices don't match (so I think you should wait with that one). The easiest is probably the Dropout layer, which is still not implemented in this repo (as I noticed on Friday, actually). The Float/Double switching can be neglected, I think, especially as Floats are way off for big matrices, causing Travis to fail.
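For context, the Adam optimizer follows the standard update rule. Here is a simplified, self-contained sketch over flat parameter lists; the names and types are illustrative, not the branch's actual implementation:

```haskell
-- Simplified sketch of one Adam step. `AdamState` and `adamStep` are
-- hypothetical names, not the types used on the branch.
data AdamState = AdamState
  { adamT :: Int       -- ^ step counter
  , adamM :: [Double]  -- ^ first-moment (mean) estimates
  , adamV :: [Double]  -- ^ second-moment (uncentred variance) estimates
  }

adamStep
  :: Double -> Double -> Double -> Double  -- ^ alpha, beta1, beta2, epsilon
  -> [Double]                              -- ^ current weights
  -> [Double]                              -- ^ gradients
  -> AdamState
  -> ([Double], AdamState)
adamStep alpha beta1 beta2 eps ws gs (AdamState t ms vs) =
    (ws', AdamState t' ms' vs')
  where
    t'  = t + 1
    -- exponential moving averages of the gradient and its square
    ms' = zipWith (\m g -> beta1 * m + (1 - beta1) * g)     ms gs
    vs' = zipWith (\v g -> beta2 * v + (1 - beta2) * g * g) vs gs
    -- bias correction for the zero-initialised moments
    mHat m = m / (1 - beta1 ^ t')
    vHat v = v / (1 - beta2 ^ t')
    ws' = zipWith3 (\w m v -> w - alpha * mHat m / (sqrt (vHat v) + eps))
                   ws ms' vs'
```

The defaults in `OptAdam 0.001 0.9 0.999 1e-8` below correspond to alpha, beta1, beta2 and epsilon in this sketch.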

Add specification: Currently one has to specify the input dimensions (and number of nodes):

```haskell
netSpec :: SpecNet
netSpec = specFullyConnected 2 40 |=> specTanh1D 40 |=> specNil1D 40
```

If you get it wrong it will still build, but it is sure to fail when running the network. So you kinda lose everything the types give you in Grenade, but you can easily build runtime networks if you get it right. I especially need this for serializing the network together with its architecture to a DB, to be able to load it later. I plan to build a (stateful) wrapper to be able to specify networks similar to a library I built for the tensorflow bindings (though probably with the activation functions not combined into the previous layer):

```haskell
buildModel $
  inputLayer1D lenIn >>
  fullyConnected [10 * lenIn] relu' >>
  fullyConnected [lenActs, cols] relu' >>
  fullyConnectedLinear [lenActs, cols] >>
  trainingByAdamWith (OptAdam 0.001 0.9 0.999 1e-8)
```

Btw, the reasons I investigated tensorflow at all after using Grenade were i) Adam and ii) saving/restoring a network. I think these are quite important features to have.

schnecki commented 4 years ago

Well actually, what I completely forgot are the GNum instances. I think these are important for processing minibatches, as you have to average over the gradients [1]. That's not a huge change, as it's just a matter of adding some instances in each layer.

[1] http://www.holehouse.org/mlclass/17_Large_Scale_Machine_Learning.html
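To make the minibatch point concrete: a GNum-style class only needs scalar multiplication and addition over gradients, and then averaging a batch is a single fold. The class name matches the PR, but the methods and instance below are my own sketch, not necessarily the branch's definitions:

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- Sketch of a GNum-style class: just enough structure on gradients to
-- scale and add them, which is all minibatch averaging needs.
class GNum a where
  (|*) :: Double -> a -> a  -- ^ scale a gradient
  (|+) :: a -> a -> a       -- ^ add two gradients elementwise

-- Toy instance: a gradient represented as a flat list of Doubles.
instance GNum [Double] where
  s |* xs = map (s *) xs
  (|+)    = zipWith (+)

-- Average a non-empty batch of gradients: (1/n) |* (g1 |+ g2 |+ ...).
averageGradients :: GNum g => [g] -> g
averageGradients gs =
  (1 / fromIntegral (length gs)) |* foldr1 (|+) gs
```

In Grenade each layer's gradient type would get such an instance, so `averageGradients` works over whole-network gradients for free.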

schnecki commented 4 years ago

I implemented the interface and refactored a little. I think I will now mainly concentrate on the projects depending on Grenade again :)