Closed alishakiba closed 7 years ago
Hi alishakiba,
I'll try to help you more tomorrow but I can already quickly say you should try loading and setting the trained parameters as follows:
loaded_params = pickle.load(open('params.pkl', 'rb'))
all_params = nn.layers.get_all_params(l_out)
for i, v in enumerate(loaded_params):
    all_params[i].set_value(v)
Let me know if that already helps.
Jeffrey
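A minimal sketch of the load-and-assign pattern suggested above, written without Lasagne so it can be run anywhere. `FakeParam`, `load_values` and `assign_in_order` are hypothetical names used only for illustration; in the real code the parameters would come from `nn.layers.get_all_params(l_out)`. The length check is an addition, on the assumption that a dump from a different model version is a likely failure mode.

```python
import os
import pickle
import tempfile


class FakeParam(object):
    """Hypothetical stand-in for a shared variable with set_value/get_value."""

    def __init__(self, value):
        self.value = value

    def set_value(self, value):
        self.value = value

    def get_value(self):
        return self.value


def load_values(path):
    # Pickle data is binary, so the file is opened with 'rb'.
    with open(path, 'rb') as f:
        return pickle.load(f)


def assign_in_order(params, values):
    # Refuse to assign if the dump and the model disagree on parameter count.
    if len(params) != len(values):
        raise ValueError('model has %d parameters, dump has %d'
                         % (len(params), len(values)))
    for param, value in zip(params, values):
        param.set_value(value)


# Write a small dump to a temporary file, then load it back into the params.
dump_path = os.path.join(tempfile.mkdtemp(), 'params.pkl')
saved_values = [[0.1, 0.2], [0.3]]
with open(dump_path, 'wb') as f:
    pickle.dump(saved_values, f)

params = [FakeParam(None), FakeParam(None)]
assign_in_order(params, load_values(dump_path))
restored = [p.get_value() for p in params]
```

The order of the values in the dump has to match the order in which the model lists its parameters; nothing in this sketch can verify that beyond the count.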
Hi Jeffrey,
Thanks for the help. That snippet does not work for me; however, I was able to load the parameters as follows:
model_data = pickle.load(open(dump_path, 'rb'))
from models import basic_model as model
LEARNING_RATE_SCHEDULE = model.LEARNING_RATE_SCHEDULE
prefix_train = model.prefix_train if hasattr(model, 'prefix_train') else \
    '/run/shm/train_ds2_crop/'
prefix_test = model.prefix_test if hasattr(model, 'prefix_test') else \
    '/run/shm/test_ds2_crop/'
SEED = model.SEED if hasattr(model, 'SEED') else 11111
id_train, y_train = model.id_train, model.y_train
id_valid, y_valid = model.id_valid, model.y_valid
id_train_oversample = model.id_train_oversample
labels_train_oversample = model.labels_train_oversample
sample_coefs = model.sample_coefs if hasattr(model, 'sample_coefs') \
    else [0, 7, 3, 22, 25]
l_out, l_ins = model.build_model()
nn.layers.set_all_param_values(l_out, model_data)
I was also able to load the data into shared memory using the following approach, but could you please explain exactly what input the network expects?
chunk_size = 64
batch_size = 128
output = nn.layers.get_output(l_out, deterministic=True)
input_ndims = [len(nn.layers.get_output_shape(l_in))
               for l_in in l_ins]
xs_shared = [nn.utils.shared_empty(dim=ndim)
             for ndim in input_ndims]
import pandas as pd
OriginalLabels = pd.read_csv(r'../data/trainLabels.csv', sep=',')
import glob
temp1 = np.zeros((chunk_size, 3, 512, 512), dtype='float64')
fileList = sorted(glob.glob(
    r'/home/ali/Desktop/kaggle_diabetic_retinopathy-master/data/64/*.tiff'))
labels64 = []
for i, f in enumerate(fileList):
    temp1[i, :, :, :] = np.array(Image.open(f)).T / 255.0
    fname = f.split('/')[-1].split('.')[0]
    lbl = OriginalLabels.loc[OriginalLabels['image'] ==
                             fname]['level'].values.item(0)
    # print lbl
    labels64.append(lbl)
xs_shared[0].set_value(temp1)
print temp1.shape
temp2 = np.ones((chunk_size, 2), dtype='float64') * 512
xs_shared[1].set_value(temp2)
temp2.shape
idx = T.lscalar('idx')
givens = {}
for l_in, x_shared in zip(l_ins, xs_shared):
    givens[l_in.input_var] = x_shared[idx * batch_size:(idx + 1) *
                                      batch_size]
compute_output = theano.function(
    [idx],
    output,
    givens=givens,
    on_unused_input='ignore'
)
print 'Done'
%time predictions = compute_output(0)
However, I get an error about summing two images; I think it occurs at the point where the two images are merged.
Thanks again.
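One detail in the snippet above may be worth checking: the chunk holds 64 images while the batch size is 128, so the slice for `idx = 0` extends past the end of the data. The sketch below uses plain Python lists and a hypothetical `batch_slice` helper to show what that slice arithmetic selects; it assumes the shared-variable slice truncates at the end of the data in the same way.

```python
def batch_slice(idx, batch_size):
    # Same arithmetic as in the givens dictionary: [idx*B, (idx+1)*B).
    return slice(idx * batch_size, (idx + 1) * batch_size)


chunk_size = 64
batch_size = 128
chunk = list(range(chunk_size))  # stand-in for the 64 loaded images

first = chunk[batch_slice(0, batch_size)]
second = chunk[batch_slice(1, batch_size)]
num_first, num_second = len(first), len(second)
```

With these values the first batch contains all 64 items and the second is empty, so a single call with `idx = 0` covers the whole chunk.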
Until I have more time, could you maybe try merging your changes with my original notebook? I.e., such that it uses the CPU but everything else should still work. You will need to change some things about how it "finds" the images to iterate over etc. But that way it will be easier for me to help you quickly.
I'll try to take a closer look tomorrow.
Thanks again. I'll take care of that within a couple of hours.
Dear Jeffrey
I have written my own code, which I think should work on a single CPU. The code is at (https://github.com/alishakiba/kaggle_diabetic_retinopathy/blob/master/notebooks/PredictDRD.ipynb). However, there is a problem in block 16: it has used hours and hours of CPU time and I have never seen it finish.
To be able to create the model, I was forced to modify some lines in the basic_model.py file, which are marked in (https://github.com/alishakiba/kaggle_diabetic_retinopathy/commit/e7750f23a1d8d052ce1c897e620e46aaf5b1a3f6).
I have also tested your code on a GeForce GT 730 GPU; however, I was unsuccessful because it has no cudnn support. The same notebook as in (https://github.com/alishakiba/kaggle_diabetic_retinopathy/commit/e7750f23a1d8d052ce1c897e620e46aaf5b1a3f6), with the same code on a GPU-enabled machine, stops on block 8 with a dimension mismatch (where I am setting the weights of the model!):
ValueError: mismatch: parameter has shape (32, 128, 128) but the value to set has shape (32, 127 ,127)
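A small sketch for locating this kind of mismatch before any values are assigned. `find_shape_mismatches` is a hypothetical helper, and the first shape in each list is made up for illustration; only the `(32, 128, 128)` versus `(32, 127, 127)` pair comes from the error above. In the real notebook the expected shapes could be read with `p.get_value().shape` for each model parameter and the loaded shapes with `v.shape` for each array in the dump.

```python
def find_shape_mismatches(expected_shapes, loaded_shapes):
    """Return (index, expected, loaded) for every pair whose shapes differ.

    zip stops at the shorter list, so compare the two lengths separately.
    """
    mismatches = []
    for i, (exp, got) in enumerate(zip(expected_shapes, loaded_shapes)):
        if tuple(exp) != tuple(got):
            mismatches.append((i, tuple(exp), tuple(got)))
    return mismatches


# Illustrative shapes: the first entry is invented, the second is from the error.
expected = [(32, 3, 7, 7), (32, 128, 128)]
loaded = [(32, 3, 7, 7), (32, 127, 127)]
mismatches = find_shape_mismatches(expected, loaded)
```

Knowing the index of the first mismatching parameter narrows the problem down to a specific layer of the model.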
I'll just try to make a quick notebook to do it on the CPU, it'll be the easiest, I think. I'm working on it now and will try to finish it today. Otherwise, please remind me if I don't update it before the end of the week.
Thanks Jeffrey. Besides, I am learning about neural networks (I've read Nielsen's book and some other material around the web). Which GPU model do you suggest? (I currently have a GT 430 and a GT 730 graphics card, but neither of them supports cudnn.)
By the way, the error you are getting is most likely because of this change you made.
Same convolution support for the normal conv layers was probably added later (and you are probably using an older version of Lasagne).
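To illustrate why a change in padding can surface as a parameter shape error: with `untie_biases=True` a Lasagne convolution layer keeps one bias per output position, so the bias shape follows the output size. The sketch below uses the usual convolution size arithmetic; the helper names and the numbers (a 128-pixel input and a 3x3 filter) are illustrative assumptions, not values taken from basic_model.py, and this is not Lasagne's own implementation.

```python
def conv_output_size(input_size, filter_size, stride=1, pad=0):
    # Standard formula for one spatial dimension of a convolution.
    return (input_size + 2 * pad - filter_size) // stride + 1


def same_pad(filter_size):
    # 'same' padding for an odd filter size: half the filter, rounded down.
    return filter_size // 2


def untied_bias_shape(num_filters, out_size):
    # One bias per filter and per output position (square output assumed).
    return (num_filters, out_size, out_size)


valid_size = conv_output_size(128, 3)                   # no padding
same_size = conv_output_size(128, 3, pad=same_pad(3))   # 'same' padding

valid_bias = untied_bias_shape(32, valid_size)
same_bias = untied_bias_shape(32, same_size)
```

Weights saved from a model built with one padding mode will therefore not fit a model built with another, even when the filter shapes themselves are identical.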
It depends on your budget. The GTX 980 (normal or Ti) is very good for the money (about 350 pounds for the non-Ti version). In any case, try to get a Maxwell GPU since they are the most recent ones and are generally much faster than the older generation. But I can't confidently say much about the less expensive ones (I have heard about the 970 having some issues when you try to use a lot of video memory).
A very good alternative is to use the GPU instances on Amazon AWS. It is roughly 2-3 times slower than a 980 but still pretty good (and supports cudnn).
Thanks. I have updated my lasagne to the latest version and uncommented the line. The NotImplemented error is gone, but now there is another one :(((
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-6-598fada8a14d> in <module>()
11 sample_coefs = model.sample_coefs if hasattr(model, 'sample_coefs') else [0, 7, 3, 22, 25]
12
---> 13 l_out, l_ins = model.build_model()
/home/ali/Desktop/DRD/fifth/models/basic_model.py in build_model()
143 nonlinearity=LeakyRectify(leakiness),
144 W=nn.init.Orthogonal(1.0), b=nn.init.Constant(0.1),
--> 145 untie_biases=True)
146 layers.append(l_conv)
147
/opt/anaconda/lib/python2.7/site-packages/Lasagne-0.2.dev1-py2.7.egg/lasagne/layers/conv.pyc in __init__(self, incoming, num_filters, filter_size, stride, pad, untie_biases, W, b, nonlinearity, convolution, **kwargs)
389 nonlinearity=nonlinearities.rectify,
390 convolution=T.nnet.conv2d, **kwargs):
--> 391 super(Conv2DLayer, self).__init__(incoming, **kwargs)
392 if nonlinearity is None:
393 self.nonlinearity = nonlinearities.identity
TypeError: __init__() got an unexpected keyword argument 'border_mode'
I am using the latest version of Theano and Lasagne.
Thank you. I will try to ask for a GTX 980 :))).
Yes, the border_mode argument is gone. You need to replace it with pad='same'.
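If the model file builds its layers from keyword dictionaries, the rename can be applied in one place. This is only a sketch: `migrate_conv_kwargs` is a hypothetical helper, the example arguments are illustrative apart from `untie_biases=True`, which appears in the traceback above, and it assumes the old border_mode string values carry over unchanged to pad.

```python
def migrate_conv_kwargs(kwargs):
    """Return a copy of kwargs with 'border_mode' renamed to 'pad'."""
    new_kwargs = dict(kwargs)
    if 'border_mode' in new_kwargs:
        if 'pad' in new_kwargs:
            # Both given: refuse to guess which one the caller meant.
            raise ValueError("both 'border_mode' and 'pad' were given")
        new_kwargs['pad'] = new_kwargs.pop('border_mode')
    return new_kwargs


# Illustrative layer arguments in the old style.
old_kwargs = {'num_filters': 32, 'filter_size': (3, 3),
              'border_mode': 'same', 'untie_biases': True}
new_kwargs = migrate_conv_kwargs(old_kwargs)
```

The input dictionary is left untouched, so the old-style arguments stay available for comparison.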
I'm trying this now but the compilation time is taking ages or something went wrong. I'll check later.
It is stuck on compiling the compute_output function for me (when using the normal non-cudnn Conv2DLayers). I'll try to have another look this weekend. Please let me know if you got it to work in the meantime.
It's the same for me. I ran it overnight, from 11:30pm to 06:00am, and it did not finish :((. Would you mind explaining a little about the input to the network? Maybe I can then implement the network with a GPU package that does not use cudnn.
It might be some bug with Theano. Compiling the graph for the first time can take a while (like 10 minutes, maybe), but it shouldn't take much longer. I've tested it with the cudnn layers and they work fine.
Which input do you mean? The input images to the network as provided by the generators?
The generators can do a lot of transformations during training, see here. You can remove all that if you just want to test or get rid of all the extra code this brings with it. Just do the resizing and normalising to get somewhat decent results.
If you plan on using some other package to try it, keep in mind that parameters from my dump might be saved in another "format" than the one another package uses.
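For the minimal "resize and normalise" route, the axis order of the image array is easy to get wrong. The sketch below only covers the normalising step and assumes NumPy is available; `to_channels_first` is a hypothetical helper. It contrasts an explicit `transpose(2, 0, 1)` with the `.T` used earlier in this thread. Which layout the network actually expects should be checked against the generators code; this only shows that the two are different.

```python
import numpy as np


def to_channels_first(image_hwc):
    """Convert a (height, width, channels) uint8 image to (channels, height, width) floats in [0, 1]."""
    return image_hwc.transpose(2, 0, 1).astype('float64') / 255.0


# A tiny non-square test image with a single bright pixel in channel 0.
image = np.zeros((4, 6, 3), dtype='uint8')
image[1, 5, 0] = 255

chw = to_channels_first(image)   # shape (3, 4, 6)
reversed_axes = image.T / 255.0  # shape (3, 6, 4): rows and columns swapped
```

For square 512x512 images both results have the same shape, so a swapped layout would not raise an error; it would only feed transposed images to the network.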
Hi there,
I have modified the code so that it can run on a single CPU.
In the file basic_model.py, comment out line 7 and change lines 136-137 as follows
Then, the following code runs without any errors; however, it produces a set of unrelated, random weights! Could you please help me load the weights?