IraKorshunova / folk-rnn

folk music modelling with LSTM
MIT License
341 stars, 68 forks

Exception: You are creating a TensorVariable with float64 dtype. #6

Closed: SeekPoint closed this issue 7 years ago

SeekPoint commented 7 years ago

rzai@rzai00:~/prj/folk-rnn$ cat ~/.theanorc
[global]
floatX = float32
device = gpu
warn_float64=ignore

[nvcc]
fastmath = True

rzai@rzai00:~/prj/folk-rnn$ CUDA_VISIBLE_DEVICES=1 python train_rnn.py config5 data/allabcworepeats_parsed
Using gpu device 0: GeForce GTX 1080 (CNMeM is disabled)
float32
config5-allabcworepeats_parsed-20161123-204529
vocabulary size: 12535
train_rnn.py:68: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  valid_idxs = rng.choice(np.arange(ntunes), nvalid_tunes, replace=False)
n tunes: 23636
n train tunes: 22484.0
n validation tunes: 1152.0
min, max length 54 2958
Building the model
Traceback (most recent call last):
  File "train_rnn.py", line 127, in <module>
    predictions = nn.layers.get_output(l_out)
  File "/usr/local/lib/python2.7/dist-packages/lasagne/layers/helper.py", line 185, in get_output
    all_outputs[layer] = layer.get_output_for(layer_inputs, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/lasagne/layers/recurrent.py", line 1050, in get_output_for
    strict=True)[0]
  File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan.py", line 792, in scan
    fake_nonseqs = [x.type() for x in non_seqs]
  File "/usr/local/lib/python2.7/dist-packages/theano/gof/type.py", line 323, in __call__
    return utils.add_tag_trace(self.make_variable(name))
  File "/usr/local/lib/python2.7/dist-packages/theano/tensor/type.py", line 401, in make_variable
    return self.Variable(self, name=name)
  File "/usr/local/lib/python2.7/dist-packages/theano/tensor/var.py", line 716, in __init__
    raise Exception(msg)
Exception: You are creating a TensorVariable with float64 dtype. You requested an action via the Theano flag warn_float64={ignore,warn,raise,pdb}.
rzai@rzai00:~/prj/folk-rnn$
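The VisibleDeprecationWarning in this log is separate from the exception. The tune counts are printed as floats (22484.0, 1152.0), so a float reaches rng.choice as the sample size. Below is a minimal sketch of computing the split with integers only. The 5% validation share and the batch size of 64 are assumptions that happen to reproduce the counts in the log; they are not a confirmed reading of train_rnn.py, and `split_counts` is a hypothetical helper.

```python
def split_counts(ntunes, validation_fraction=0.05, batch_size=64):
    """Return (ntrain, nvalid) as ints, with nvalid a whole number of batches.

    validation_fraction and batch_size are assumed values, not read from the repo.
    """
    # Truncate to an int first, then round down to a multiple of batch_size.
    nvalid = int(ntunes * validation_fraction) // batch_size * batch_size
    return ntunes - nvalid, nvalid


ntrain_tunes, nvalid_tunes = split_counts(23636)
print(ntrain_tunes, nvalid_tunes)
```

Passing an int such as `nvalid_tunes` as the sample size avoids the warning, since the size argument is then no longer a float.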

IraKorshunova commented 7 years ago

I cannot look into the code right now, but with warn_float64=ignore it should not raise this exception.

SeekPoint commented 7 years ago

rzai@rzai00:~/prj/folk-rnn$ cat ~/.theanorc
[global]
floatX = float32
device = gpu
warn_float64=ignore

[nvcc]
fastmath = True

I have already set warn_float64=ignore.

IraKorshunova commented 7 years ago

It's in the code: https://github.com/IraKorshunova/folk-rnn/blob/master/train_rnn.py#L17
I tried running train_rnn.py and didn't get this error. I am using Theano==0.7.0 and Lasagne==0.2.dev1.
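Theano documents a precedence order for configuration: an assignment to theano.config in a script wins over THEANO_FLAGS, which wins over ~/.theanorc. That would explain why the setting in ~/.theanorc had no effect here. The sketch below mimics that order without importing Theano. `effective_flag` is a hypothetical helper, and the value 'raise' for the in-code assignment is an assumption based on the exception, not something quoted from train_rnn.py.

```python
import configparser

THEANORC = """
[global]
floatX = float32
device = gpu
warn_float64=ignore

[nvcc]
fastmath = True
"""


def effective_flag(name, theanorc_text="", theano_flags="", in_code=None):
    """Resolve one [global] option following Theano's documented precedence.

    Returns (value, source). Illustration only; it never calls Theano.
    """
    # 1. An assignment such as theano.config.<name> = ... inside the script.
    if in_code and name in in_code:
        return in_code[name], "code"
    # 2. The THEANO_FLAGS environment variable: comma-separated key=value pairs.
    for item in filter(None, theano_flags.split(",")):
        key, _, value = item.partition("=")
        if key.strip() == name:
            return value.strip(), "THEANO_FLAGS"
    # 3. The [global] section of ~/.theanorc.
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive (floatX)
    parser.read_string(theanorc_text)
    if parser.has_option("global", name):
        return parser.get("global", name), ".theanorc"
    return None, "default"


print(effective_flag("warn_float64", THEANORC))
# Assumed in-code assignment; it overrides the file.
print(effective_flag("warn_float64", THEANORC, in_code={"warn_float64": "raise"}))
```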

SeekPoint commented 7 years ago

Thanks, it works now.

SeekPoint commented 7 years ago

Now I get an out-of-memory error on the GPU.

I am using a GTX 1080 with 8 GB.

Which GPU did you use?

IraKorshunova commented 7 years ago

I had 12 GB. Try the Theano flag 'allow_gc=True'.
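Theano flags such as allow_gc can be passed through the THEANO_FLAGS environment variable as comma-separated key=value pairs. The sketch below merges a flag into an existing environment. `with_theano_flags` is a hypothetical helper, not part of Theano, and it assumes the variable is set before the training process imports Theano.

```python
import os


def with_theano_flags(env, **flags):
    """Return a copy of env whose THEANO_FLAGS has the given flags merged in."""
    merged = {}
    # Keep any flags that are already set in the environment.
    for item in filter(None, env.get("THEANO_FLAGS", "").split(",")):
        key, _, value = item.partition("=")
        merged[key.strip()] = value.strip()
    # New values override existing ones with the same name.
    merged.update({key: str(value) for key, value in flags.items()})
    new_env = dict(env)
    new_env["THEANO_FLAGS"] = ",".join(
        "%s=%s" % pair for pair in sorted(merged.items())
    )
    return new_env


env = with_theano_flags(os.environ, allow_gc=True)
print(env["THEANO_FLAGS"])
```

The resulting dictionary can be handed to `subprocess.run([...], env=env)` to launch the training script. Exporting the variable in the shell before starting Python has the same effect.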

SeekPoint commented 7 years ago

Setting allow_gc=False does not work:

rzai@rzai00:~/prj/folk-rnn$ CUDA_VISIBLE_DEVICES=1 THEANO_FLAGS='allow_gc=False' python train_rnn.py config5 data/allabcworepeats_parsed
Using gpu device 0: GeForce GTX 1080 (CNMeM is disabled)
float32
config5-allabcworepeats_parsed-20161124-195010
vocabulary size: 12535
train_rnn.py:68: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  valid_idxs = rng.choice(np.arange(ntunes), nvalid_tunes, replace=False)
n tunes: 23636
n train tunes: 22484.0
n validation tunes: 1152.0
min, max length 54 2958
Building the model
/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan.py:1019: Warning: In the strict mode, all neccessary shared variables must be passed as a part of non_sequences
  'must be passed as a part of non_sequences', Warning)
number of parameters: 194480456
layer output shapes:    #params:     output shape:
InputLayer              0            (64, None)
EmbeddingLayer          157126225    (64, None, 12535)
InputLayer              0            (64, None)
LSTMLayer               26723328     (64, None, 512)
DropoutLayer            0            (64, None, 512)
LSTMLayer               2100224      (64, None, 512)
DropoutLayer            0            (64, None, 512)
LSTMLayer               2100224      (64, None, 512)
DropoutLayer            0            (64, None, 512)
ReshapeLayer            0            (None, 512)
DenseLayer              6430455      (None, 12535)
Train model
Error allocating 71696384 bytes of device memory (out of memory). Driver report 22216704 bytes free and 8507162624 bytes total
Traceback (most recent call last):
  File "train_rnn.py", line 202, in <module>
    train_loss = train(x_batch, mask_batch)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 618, in __call__
    storage_map=self.fn.storage_map)
  File "/usr/local/lib/python2.7/dist-packages/theano/gof/link.py", line 297, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 607, in __call__
    outputs = self.fn()
MemoryError: Error allocating 71696384 bytes of device memory (out of memory).
Apply node that caused the error: GpuAlloc{memset_0=True}(CudaNdarrayConstant{[[[ 0.]]]}, Elemwise{add,no_inplace}.0, Shape_i{0}.0, Shape_i{1}.0)
Toposort index: 195
Inputs types: [CudaNdarrayType(float32, (True, True, True)), TensorType(int64, scalar), TensorType(int64, scalar), TensorType(int64, scalar)]
Inputs shapes: [(1, 1, 1), (), (), ()]
Inputs strides: [(0, 0, 0), (), (), ()]
Inputs values: [<CudaNdarray object at 0x7f7cf24a1c70>, array(547), array(64), array(512)]
Outputs clients: [[GpuIncSubtensor{InplaceInc;int64::}(GpuAlloc{memset_0=True}.0, GpuElemwise{mul,no_inplace}.0, Constant{1})]]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
rzai@rzai00:~/prj/folk-rnn$
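The numbers in this log can be cross-checked with some arithmetic. The sketch below assumes every value is stored as float32 (4 bytes), and that the three integers under "Inputs values" are the time steps, batch size and hidden units of the buffer being allocated. The layer names with numeric suffixes are labels added here for illustration.

```python
BYTES_PER_FLOAT32 = 4

# Shape values reported under "Inputs values" for the failing GpuAlloc node.
steps, batch_size, hidden_units = 547, 64, 512
failed_alloc = steps * batch_size * hidden_units * BYTES_PER_FLOAT32
print(failed_alloc)  # compare with the byte count in the MemoryError

# Parameter counts copied from the layer table in the log.
params = {
    "EmbeddingLayer": 157126225,
    "LSTMLayer_1": 26723328,
    "LSTMLayer_2": 2100224,
    "LSTMLayer_3": 2100224,
    "DenseLayer": 6430455,
}
total_params = sum(params.values())
param_mib = total_params * BYTES_PER_FLOAT32 / 2.0**20
print(total_params, round(param_mib))

# One float32 tensor of shape (batch, steps, vocabulary),
# for example the output of the embedding layer for this batch.
vocabulary = 12535
vocab_tensor_gib = batch_size * steps * vocabulary * BYTES_PER_FLOAT32 / 2.0**30
print(round(vocab_tensor_gib, 2))
```

Under these assumptions the failed request is exactly one (547, 64, 512) float32 buffer, the parameters alone take roughly 742 MiB, and a single tensor with a vocabulary-sized last axis is about 1.6 GiB. Several tensors of that size may be alive during a training step, which would be consistent with an 8 GB card filling up. The memory needed scales with the vocabulary size of 12535, and the embedding layer's parameter count is exactly that value squared.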

IraKorshunova commented 7 years ago

And what happens with allow_gc=True?

SeekPoint commented 7 years ago

allow_gc=True is the default. I tried it as well, and it does not work either.

IraKorshunova commented 7 years ago

Maybe it's because you're creating float64 variables. For me, training takes about 1 GB of GPU memory. If you don't find a solution, you can make the model or the batch size smaller.
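Both suggestions can be put in rough numbers, because the size of a dense tensor is the product of its dimensions times the bytes per value. The sketch below uses a hypothetical `tensor_bytes` helper. The 547 time steps and the vocabulary of 12535 come from the log earlier in this thread, and treating 547 as the padded length of the failing batch is an assumption.

```python
def tensor_bytes(shape, bytes_per_value=4):
    """Bytes needed for one dense tensor; 4 bytes per value means float32."""
    total = bytes_per_value
    for dim in shape:
        total *= dim
    return total


STEPS, VOCABULARY = 547, 12535  # taken from the log; see the assumptions above

# Halving the batch size halves every (batch, steps, vocabulary) tensor.
for batch_size in (64, 32, 16):
    gib = tensor_bytes((batch_size, STEPS, VOCABULARY)) / 2.0**30
    print("batch %d: %.2f GiB as float32" % (batch_size, gib))

# The same tensor stored as float64 (8 bytes per value) is twice as large.
float64_gib = tensor_bytes((64, STEPS, VOCABULARY), bytes_per_value=8) / 2.0**30
print("batch 64: %.2f GiB as float64" % float64_gib)
```

This only estimates a single tensor. The total used during training also depends on how many such tensors Theano keeps alive at once, which is not modelled here.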

SeekPoint commented 7 years ago

OK, I will try.