lilinghai closed this issue 9 years ago.
The version of Lasagne you are using is probably too new :) This code was written when Lasagne was still under heavy development, and it has not been updated since. `MultipleInputsLayer` has been renamed to `MergeLayer`, for example.
The easiest solution is just to install commit f445b71 (probably best to do it in a virtualenv or so), the code is known to work with that version. Instructions on how to do this are in the documentation: https://github.com/benanne/kaggle-ndsb/blob/master/doc.pdf (Section 5.2, software dependencies).
Hi,
I am working for the OSU plankton lab and ran into the same error with an updated version of Lasagne, but the commit you've linked to (f445b71) seems to either be broken or non-existent.
When trying the command from the documentation, I received the following output:
```
$ pip install git+git://github.com/benanne/Lasagne.git@f445b71
Downloading/unpacking git+git://github.com/benanne/Lasagne.git@f445b71
  Cloning git://github.com/benanne/Lasagne.git (to f445b71) to /tmp/pip-Z5sdvQ-build
  Could not find a tag or branch 'f445b71', assuming commit.
```
Do you know if the link is broken or possibly a typo? Any help getting the correct version would be greatly appreciated.
Thanks, Miles
I don't see any problems with the output you pasted. It says "assuming commit" which is what it's supposed to do. What is the exact issue you're facing?
The command is able to download Lasagne successfully, but doesn't seem to find the specific version mentioned. While trying to generate the solution by running the setup Python files, "train_convnet.py" throws errors regarding references to Lasagne in the file. For example, 'MultipleInputsLayer' being renamed to 'MergeLayer' in later versions.
Here is the current output of "train_convnet.py" for reference:
Traceback (most recent call last):
File "train_convnet.py", line 44, in
The errors appear to be related to the version. Have you seen these before?
Yeah, from those errors it looks like it's installing a more recent version anyway. It's weird because the output you posted before says "assuming commit", so it actually looks like it's doing what it's supposed to do. Are you sure you don't have multiple instances of Lasagne installed? Maybe it's importing a different copy?
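One quick way to check which copy Python is actually importing is to print the file the module was loaded from. A minimal sketch (demonstrated on a standard-library module so it runs anywhere; substitute "lasagne" in a real check):

```python
import importlib

def locate(name):
    """Return the file a module was loaded from."""
    mod = importlib.import_module(name)
    return getattr(mod, "__file__", "<built-in>")

# Replace "json" with "lasagne" to see which copy your script imports;
# if the path points outside your virtualenv, another install is
# shadowing the pinned commit.
print(locate("json"))
```

If the printed path is not inside the virtualenv's `site-packages`, that's the stray copy being picked up first.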
Thanks for the help! After a somewhat fresh installation of the dependencies, Theano and Lasagne seem to be working and we've moved past this error.
One more quick question:
After creating the data files and validation splits, I got stopped while executing the train_convnet.py script with an error stating Lasagne couldn't find cudnn (output below).
miles@CGRB-Desktop:~/Downloads/kaggle-ndsb-master$ python train_convnet.py convroll4
using default validation split: validation_split_v1.pkl
Experiment ID: convroll4-CGRB-Desktop-20150813-141142
Build model
number of parameters: 5475049
layer output shapes:
DenseLayer (32, 121)
DropoutLayer (32, 256)
CyclicPoolLayer (32, 256)
DenseLayer (128, 256)
DropoutLayer (128, 1024)
CyclicRollLayer (128, 1024)
DenseLayer (128, 256)
DropoutLayer (128, 12800)
FlattenLayer (128, 12800)
CyclicConvRollLayer (128, 512, 5, 5)
MaxPool2DDNNLayer (128, 128, 5, 5)
Conv2DDNNLayer (128, 128, 11, 11)
Conv2DDNNLayer (128, 256, 11, 11)
Conv2DDNNLayer (128, 256, 11, 11)
CyclicConvRollLayer (128, 256, 11, 11)
MaxPool2DDNNLayer (128, 64, 11, 11)
Conv2DDNNLayer (128, 64, 23, 23)
Conv2DDNNLayer (128, 128, 23, 23)
Conv2DDNNLayer (128, 128, 23, 23)
CyclicConvRollLayer (128, 128, 23, 23)
MaxPool2DDNNLayer (128, 32, 23, 23)
Conv2DDNNLayer (128, 32, 47, 47)
Conv2DDNNLayer (128, 64, 47, 47)
CyclicConvRollLayer (128, 64, 47, 47)
MaxPool2DDNNLayer (128, 16, 47, 47)
Conv2DDNNLayer (128, 16, 95, 95)
Conv2DDNNLayer (128, 32, 95, 95)
CyclicSliceLayer (128, 1, 95, 95)
InputLayer (32, 1, 95, 95)
Traceback (most recent call last):
File "train_convnet.py", line 70, in
I followed this tutorial to install cudnn as it seemed simpler and more streamlined than other suggestions. While I tried all three options in the guide, none seemed to change the errors.
http://deeplearning.net/software/theano/library/sandbox/cuda/dnn.html
I also noticed several guides in which cudnn is built around or inside of Caffe. https://github.com/tiangolo/caffe/blob/ubuntu-tutorial-b/docs/install_apt2.md https://github.com/BVLC/caffe/wiki/Install-Caffe-on-EC2-from-scratch-%28Ubuntu,-CUDA-7,-cuDNN%29
The Deep Sea documentation doesn't mention anything about this, but is this an error you have come across regarding cudnn? If simply adding the cudnn files into the CUDA directories should work, I will continue to mess with it; otherwise I will take the Caffe route.
Thanks again.
These two notes from the Theano docs may be relevant:
So make sure you're not doing either of those :) There's also a thread on the mailing list that discusses the installation for Theano specifically: https://groups.google.com/forum/#!topic/theano-users/TKzDReD5v5I
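As a sanity check before digging into Theano itself, you can verify that the cuDNN header and library files are actually visible in the CUDA directories. This is just a presence check, not a proper link test; the search paths are the typical Ubuntu locations and may differ on your machine:

```python
import os

def find_cudnn(search_dirs):
    """Walk the given directories and return any cuDNN header or
    library files found (a simple presence check only)."""
    found = []
    for d in search_dirs:
        for root, _dirs, files in os.walk(d):
            for f in files:
                if f == "cudnn.h" or f.startswith("libcudnn"):
                    found.append(os.path.join(root, f))
    return found

# Typical Ubuntu locations; adjust to your CUDA install root.
print(find_cudnn(["/usr/local/cuda/include", "/usr/local/cuda/lib64"]))
```

If I remember correctly, Theano of that era also exposed its own runtime check, `theano.sandbox.cuda.dnn.dnn_available()`, which is more authoritative since it actually tries to use cuDNN.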
Hello again!
Thank you very much for the advice for the last few issues. Since then we have successfully trained all of the basic models except one: convroll_all_broaden_7x7_weightdecay_resume.
After train_convnet.py outputs the build model, we receive the following error:
Load model parameters for resuming
Traceback (most recent call last):
File "train_convnet.py", line 114, in
The error does note that the file doesn't exist, so I tried again after manually creating the file in metadata (multiple reasons this could go wrong, but worth a shot) and received this instead:
Load model parameters for resuming
Traceback (most recent call last):
File "train_convnet.py", line 114, in
This seems like an odd place to get stuck as the rest of the models appeared to run similarly and completed just fine. I had little luck searching for the same errors with Python... Is this something you've come across before? Thanks again in advance.
This config was resumed from a crashed training run (hence the `_resume` suffix). So it needs to load the most recent data file from that training run, which has an exact timestamp in its name. You need to supply a valid pickle file with the model data/metadata for this to work. It looks like you are providing an invalid file.
But if you're lucky, your training run will not have crashed to begin with, so you don't need to bother with resuming anyway.
Would this imply that the files associated with this specific module in the data and/or configurations folders became corrupt at some point during the training? As I understand the scripts so far, the .pkl files in metadata are generated by train_convnet.py. If I need a data file with a valid time stamp would it make sense to pull down the source again and try again?
No, I think there's a misunderstanding. The `_resume` configs are configs we created when a specific training run crashed and we wanted to continue from the last checkpoint. There is no reason for you to reproduce this exactly if the original training run doesn't crash (which I hope it doesn't). In short, you should not be using the `_resume` configs at all. The metadata files they reference are not included in the repository anyway.
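If you ever do need to resume a crashed run of your own, the idea is simply to point the config at the newest checkpoint pickle that train_convnet.py has been dumping. A minimal sketch of finding it; the directory name and the filename pattern are assumptions, so adjust them to what the script actually writes:

```python
import glob
import os

def newest_checkpoint(metadata_dir, prefix):
    """Return the most recently written checkpoint pickle for a given
    config prefix, or None if no checkpoint exists yet."""
    candidates = glob.glob(os.path.join(metadata_dir, prefix + "*.pkl"))
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)

# Hypothetical usage: newest_checkpoint("metadata", "convroll4-")
```

Creating an empty file by hand (as tried above) won't work, because the script unpickles the file and expects real model parameters and metadata inside.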
Ah, that makes much more sense, and would explain quite a bit. I'm running out of time today, but that should fix our issues currently. I'll post an update when it's all running.
Thanks again for all of your help.
Hi,
I am trying to run your code in a virtualenv.
I get this: Experiment ID: convroll4-sunrise-20150916-083108
Build model
Traceback (most recent call last):
File "train_convnet.py", line 44, in
Any ideas what causes the error?
Could it be that I have Lasagne==0.1.dev0 (it got installed with pip install git+git://github.com/benanne/Lasagne.git@f445b71)? Is that version too recent?
Did you modify the config in any way? The version of Lasagne that you should install for this code to work unchanged is specified in the documentation, so you can check there. I don't know the commit hash by heart. Newer versions are unlikely to work without changes to the code.
Nope, the config is not changed at all. Everything is run in a virtualenv. The Lasagne version should also be OK.
From the error it sounds like an initializer is receiving a shape with a 0 in it as input, which means that one of the layers thinks that its weight matrix is empty. This is obviously wrong. Maybe you could check the output shapes / parameter shapes for all the layers?
How to solve it?

```
E:\kaggle-ndsb>python train_convnet.py convroll4
Traceback (most recent call last):
  File "train_convnet.py", line 23, in <module>
    import nn_plankton
  File "E:\kaggle-ndsb\nn_plankton.py", line 161, in <module>
    class BatchInterleaveLayer(nn.layers.MultipleInputsLayer):
AttributeError: 'module' object has no attribute 'MultipleInputsLayer'
```