ethancaballero / Improved-Dynamic-Memory-Networks-DMN-plus

Theano Implementation of DMN+ (Improved Dynamic Memory Networks) from the paper by Xiong, Merity, & Socher at MetaMind, http://arxiv.org/abs/1603.01417 (Dynamic Memory Networks for Visual and Textual Question Answering)

Lasagne issue while running on GPU #4

sa-j opened this issue 7 years ago (status: open)

sa-j commented 7 years ago

After following your instructions and installing the prerequisites for running DMN+, I get the following error:

    (keras-dmn)user1@dpl04:~/keras/Improved-Dynamic-Memory-Networks-DMN-plus$ python main.py --network dmn_tied --mode train --babi_id 1

    Using gpu device 2: GeForce GTX TITAN X (CNMeM is enabled with initial size: 98.0% of memory, CuDNN not available)
    ==> parsing input arguments
    ==> Loading test from /home/IAIS/user1/keras/Improved-Dynamic-Memory-Networks-DMN-plus/data/tasks_1-20_v1-2/en-10k/qa1_single-supporting-fact_train.txt
    ==> Loading test from /home/IAIS/user1/keras/Improved-Dynamic-Memory-Networks-DMN-plus/data/tasks_1-20_v1-2/en-10k/qa1_single-supporting-fact_test.txt
    ==> not using minibatch training in this mode
    ==> not used params in DMN class: ['shuffle', 'network', 'babi_id', 'batch_size', 'epochs', 'prefix', 'load_state', 'log_every', 'babi_test_id', 'save_every']
    ==> building input module
    ==> creating parameters for memory module
    ==> building episodic memory module (fixed number of steps: 3)
    ==> building answer module
    ==> collecting all parameters
    ==> building loss layer and computing updates
    Traceback (most recent call last):
      File "main.py", line 194, in <module>
        args, network_name, dmn = dmn_mid(args)
      File "main.py", line 84, in dmn_mid
        dmn = dmn_tied.DMN_tied(**args_dict)
      File "/home/IAIS/user1/keras/Improved-Dynamic-Memory-Networks-DMN-plus/dmn_tied.py", line 225, in __init__
        updates = lasagne.updates.adam(self.loss, self.params)
      File "/home/IAIS/user1/anaconda2/envs/keras-dmn/lib/python2.7/site-packages/lasagne/updates.py", line 583, in adam
        all_grads = get_or_compute_grads(loss_or_grads, params)
      File "/home/IAIS/user1/anaconda2/envs/keras-dmn/lib/python2.7/site-packages/lasagne/updates.py", line 114, in get_or_compute_grads
        raise ValueError("params must contain shared variables only. If it "
    ValueError: params must contain shared variables only. If it contains arbitrary parameter expressions, then lasagne.utils.collect_shared_vars() may help you.

I used your theanorc file, adjusting only the CUDA root. Thanks!

Vimos commented 6 years ago

I am also getting this error, and I am using:

In [1]: import theano
Using cuDNN version 7103 on context None
Mapped name None to device cuda: GeForce GTX 1080 Ti (0000:01:00.0)
In [2]: theano.__version__
Out[2]: u'1.0.1'
In [3]: import lasagne
In [4]: lasagne.__version__
Out[4]: '0.2.dev1'

The error originates from:

    if any(not isinstance(p, theano.compile.SharedVariable) for p in params):
        raise ValueError("params must contain shared variables only. If it "
                         "contains arbitrary parameter expressions, then "
                         "lasagne.utils.collect_shared_vars() may help you.")

I am trying to learn more about this problem.
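
As far as I understand it, the check fails because some entries of self.params are Theano expressions rather than shared variables. Here is a minimal sketch of the situation, with made-up variable names (not taken from the repo), assuming Theano and Lasagne are installed:

    # Minimal sketch with hypothetical names; requires Theano and Lasagne.
    import numpy as np
    import theano
    import lasagne

    W = theano.shared(np.ones((3, 3), dtype='float32'), name='W')
    b = theano.shared(np.zeros(3, dtype='float32'), name='b')

    # A parameter list that mixes a shared variable with an arbitrary
    # expression, which seems to be what self.params contains in dmn_tied.py.
    params = [W, b * 2.0]

    # Passing this list to lasagne.updates.adam raises the ValueError above,
    # because b * 2.0 is an expression, not a shared variable.
    # collect_shared_vars walks the expressions and returns only the shared
    # variables they depend on, i.e. [W, b].
    print(lasagne.utils.collect_shared_vars(params))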

After some hacking, the code now runs:

➜  Improved-Dynamic-Memory-Networks-DMN-plus git:(master) ✗ cat tmp       
diff --git a/dmn_tied.py b/dmn_tied.py
index a3241b4..848e436 100644
--- a/dmn_tied.py
+++ b/dmn_tied.py
@@ -220,10 +220,10 @@ class DMN_tied:
             self.loss_l2 = 0

         self.loss = self.loss_ce + self.loss_l2
-        
+
         #updates = lasagne.updates.adadelta(self.loss, self.params)
-        updates = lasagne.updates.adam(self.loss, self.params)
-        updates = lasagne.updates.adam(self.loss, self.params, learning_rate=0.0001, beta1=0.5) #from DCGAN paper
+        # updates = lasagne.updates.adam(self.loss, self.params)
+        updates = lasagne.updates.adam(self.loss, lasagne.utils.collect_shared_vars(self.params), learning_rate=0.0001, beta1=0.5) #from DCGAN paper
         #updates = lasagne.updates.adadelta(self.loss, self.params, learning_rate=0.0005)
         #updates = lasagne.updates.momentum(self.loss, self.params, learning_rate=0.0003)

@@ -439,7 +439,7 @@ class DMN_tied:
         with open(file_name, 'w') as save_file:
             pickle.dump(
                 obj = {
-                    'params' : [x.get_value() for x in self.params],
+                    'params' : [x.get_value() for x in lasagne.utils.collect_shared_vars(self.params)],
                     'epoch' : epoch, 
                     'gradient_value': (kwargs['gradient_value'] if 'gradient_value' in kwargs else 0)
                 },
@@ -629,7 +629,7 @@ class DMN_tied:
         input_mask = input_masks[batch_index]

         ret = theano_fn(inp, q, ans, input_mask)
-        param_norm = np.max([utils.get_norm(x.get_value()) for x in self.params])
+        param_norm = np.max([utils.get_norm(x.get_value()) for x in lasagne.utils.collect_shared_vars(self.params)])

         return {"prediction": np.array([ret[0]]),
                 "answers": np.array([ans]),

I am not sure if the modification is right; I will wait and see the result!
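
If it helps, a small sanity check (a hypothetical helper, not part of the repo) could confirm that collect_shared_vars does not silently drop any trainable weights:

    # Hypothetical sanity-check helper; requires Theano and Lasagne.
    import theano
    import lasagne

    def check_params(params):
        """Collect the shared variables behind `params` and report the counts."""
        collected = lasagne.utils.collect_shared_vars(params)
        assert all(isinstance(p, theano.compile.SharedVariable) for p in collected)
        print("%d param entries -> %d shared variables" % (len(params), len(collected)))
        return collected

Calling it on self.params before building the updates should show whether the number of collected shared variables matches what you expect for the model.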