MTG / DeepConvSep

Deep Convolutional Neural Networks for Musical Source Separation
GNU Affero General Public License v3.0

TypeError when running trainCNN, please help #1

Closed ghost closed 7 years ago

ghost commented 7 years ago

OK, I have Windows 7 Ultimate 64-bit with Service Pack 1 installed. I have Visual Studio 2013 Community with Update 5 installed, and I have every requirement you listed in your readme (you didn't specify the exact version of each one, so I assumed at least theano 0.8.2 and lasagne 0.2dev1). numpy, scipy, climate, etc. are all standard installs.

My theano installation works; I tried it by itself, so the problem is not there. The nvcc compiler also works, so everything is linked and working.

in terms of your framework here, i can manage to separate a mixture using pre-trained pkl you guys provided, without any errors. i can also manage to do the compute_features option of dsd100 (im using dsd100subset 120mb package, not the full dsd100).

compute_features generates a warning about non-data chunks in the wav files. I'm not sure how the wav files were generated, and I only mention it in case it turns out to be a problem. It does work, though: I get .data and .shape files in the transform folder.

however, only thing i can't get to work, is the dsd100 trainCNN. i get the following error:

Using gpu device 0: GeForce GTX 770 (CNMeM is enabled with initial size: 70.0% of memory, cuDNN 5005)
I 2017-02-21 21:49:32 trainer:433 Maximum: 0.634328
I 2017-02-21 21:49:32 trainer:434 Mean: 0.003356
I 2017-02-21 21:49:32 trainer:435 Standard dev: 0.013143
I 2017-02-21 21:49:32 trainer:163 Building Autoencoder
Traceback (most recent call last):
  File "C:\Python27\lib\runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Python27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\DeepConvSep\examples\dsd100\trainCNN.py", line 444, in <module>
    train_errs=train_auto(train=ld1,fun=build_ca,transform=tt,outdir=db+'output/'+model+"/",testdir=db+'Mixtures/',model=db+"models/"+"model"+model+".pkl",num_epochs=nepochs,scale_factor=scale_factor)
  File "C:\DeepConvSep\examples\dsd100\trainCNN.py", line 173, in train_auto
    network2 = fun(input_var=input_var2,batch_size=train.batch_size,time_context=train.time_context,feat_size=train.input_size)
  File "C:\DeepConvSep\examples\dsd100\trainCNN.py", line 93, in build_ca
    l_conv1 = lasagne.layers.Conv2DLayer(l_in_1, num_filters=50, filter_size=(1, feat_size),stride=(1,1), pad='valid', nonlinearity=None)
  File "C:\Python27\lib\site-packages\lasagne\layers\conv.py", line 599, in __init__
    **kwargs)
  File "C:\Python27\lib\site-packages\lasagne\layers\conv.py", line 282, in __init__
    self.filter_size = as_tuple(filter_size, n, int)
  File "C:\Python27\lib\site-packages\lasagne\utils.py", line 196, in as_tuple
    "of {0}, got {1} instead".format(t.__name__, x))
TypeError: expected a single value or an iterable of int, got (1, 513L) instead

C:\DeepConvSep>

I am really not sure what that means, seems to be either a problem in your code or something else on my end, but what could it be? thanks a lot for amazing source code guys, i hope you can help me with my problem :)

nkundiushuti commented 7 years ago

I suspect it is something related to the generated data. Could you check a shape file in /transform/t1 and see what numbers you got there? (It should be: 5, something, 513.)

Also, at line 435 in trainCNN.py, could you write

print ld1.input_size
inputs, target = ld1()
print inputs.shape
print target.shape

and run it and tell me what it prints?

ghost commented 7 years ago

055 - Angels In Amplifiers - I'm Alright_0_m.shape

5 2586 513

055 - Angels In Amplifiers - I'm Alright_1_m.shape

5 1212 513

081 - Patrick Talbot - Set Me Free_0_m.shape

5 2586 513

081 - Patrick Talbot - Set Me Free_1_m.shape

5 1040 513

I am sorry, I am not expert in python code implementation, I know basic stuff, but im not sure I can put those print lines correctly, they give me indentation errors, so not sure how to exactly put them in :(

I hope you can help me out.
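For readers hitting the same indentation errors: each statement goes on its own line, and the added lines must start at the same column as the code around them. Below is a minimal sketch using a hypothetical `FakeLoader` stand-in for `ld1` (the real loader class is not shown in this thread). The single-argument `print(...)` form used here runs under both Python 2.7 and Python 3.

```python
from collections import namedtuple

# Hypothetical stand-in for an array: only the .shape attribute matters here.
Batch = namedtuple("Batch", ["shape"])


class FakeLoader(object):
    """Hypothetical stand-in for ld1; not the DeepConvSep loader."""

    input_size = 513

    def __call__(self):
        # Shapes taken from the output reported later in this thread.
        return Batch((32, 30, 513)), Batch((32, 30, 2052))


ld1 = FakeLoader()

# The debug statements, one per line, all at the same indentation
# as the neighbouring code (here: no indentation at all).
print(ld1.input_size)
inputs, target = ld1()
print(inputs.shape)
print(target.shape)
```

If the surrounding lines in the real script are indented, the inserted lines need exactly the same leading whitespace, without mixing tabs and spaces.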

nkundiushuti commented 7 years ago

The shape file looks OK. What you can do is replace feat_size with 513 in line 198 of trainCNN.py.

ghost commented 7 years ago

line 198?

alpha_component = alpha*lasagne.objectives.squared_error(vocals,target_var2[:,1:2,:,:])

nkundiushuti commented 7 years ago

sorry, line 93

l_conv1 = lasagne.layers.Conv2DLayer(l_in_1, num_filters=50, filter_size=(1, feat_size),stride=(1,1), pad='valid', nonlinearity=None)

needs to be

l_conv1 = lasagne.layers.Conv2DLayer(l_in_1, num_filters=50, filter_size=(1, 513),stride=(1,1), pad='valid', nonlinearity=None)

ghost commented 7 years ago

Nice, it seems I managed to get further now, but I have a different error:

I 2017-02-22 04:00:18 trainer:433 Maximum: 0.634328
I 2017-02-22 04:00:18 trainer:434 Mean: 0.003356
I 2017-02-22 04:00:18 trainer:435 Standard dev: 0.013143
I 2017-02-22 04:00:18 trainer:163 Building Autoencoder
Traceback (most recent call last):
  File "C:\Python27\lib\runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Python27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\DeepConvSep\examples\dsd100\trainCNN.py", line 444, in <module>
    train_errs=train_auto(train=ld1,fun=build_ca,transform=tt,outdir=db+'output/'+model+"/",testdir=db+'Mixtures/',model=db+"models/"+"model"+model+".pkl",num_epochs=nepochs,scale_factor=scale_factor)
  File "C:\DeepConvSep\examples\dsd100\trainCNN.py", line 173, in train_auto
    network2 = fun(input_var=input_var2,batch_size=train.batch_size,time_context=train.time_context,feat_size=train.input_size)
  File "C:\DeepConvSep\examples\dsd100\trainCNN.py", line 105, in build_ca
    l_reshape1 = lasagne.layers.ReshapeLayer(l_fc11,(batch_size,l_conv2.output_shape[1],l_conv2.output_shape[2], l_conv2.output_shape[3]))
  File "C:\Python27\lib\site-packages\lasagne\layers\shape.py", line 121, in __init__
    raise ValueError("shape must be a tuple of int and/or [int]")
ValueError: shape must be a tuple of int and/or [int]

so what do I do here? thanks!

nkundiushuti commented 7 years ago

That's weird. I would replace batch_size with 32 or write int(batch_size). If you still get the error, then maybe you need to install the latest version of lasagne from github.

ghost commented 7 years ago

im pretty sure I have the latest Lasagne version from github right now, but i will check later also.

for now, you say " I would replace batch_size with 32 or write int(batch_size)" can you tell me exactly what lines i need to change this in?

thanks!

nkundiushuti commented 7 years ago

You need to see where you have the error and go to that line. For instance, for the error you mentioned above:

  File "C:\DeepConvSep\examples\dsd100\trainCNN.py", line 105, in build_ca
    l_reshape1 = lasagne.layers.ReshapeLayer(l_fc11,(batch_size,l_conv2.output_shape[1],l_conv2.output_shape[2], l_conv2.output_shape[3]))

So, line 105.

ghost commented 7 years ago

Same error, tried both with 32 and int(batch_size) at line 105.

Before I do any upgrades or downgrades to my python packages, can you tell me the exact version numbers of your Theano, Lasagne and maybe other packages related too?

Perhaps, I need another version. If not, I really don't understand what is wrong at line 105.

thanks!

nkundiushuti commented 7 years ago

I tested everything with the latest version of lasagne from github. Here's what you can do: download lasagne from github and copy the lasagne folder (the one containing the library: https://github.com/Lasagne/Lasagne/tree/master/lasagne) into the deepconvsep folder.

ghost commented 7 years ago

I am not sure if I did it correctly, but I don't see any difference (or reason to do that?). I downloaded the repo, took only the lasagne folder from it and pasted it in the DeepConvSep folder and did nothing more, just like you said. Did the trainCNN again and same error as before at line 105.

I also updated my theano to the dev version and scipy to 0.18.1, and I still get the same error. However, theano dev 0.9.0rc1 says the libgpuarray back end will be needed eventually. Not sure if installing it is going to help at all. Some of these libraries are so complicated to install, and their instructions are hard to follow.

Aside from that, do you have any other suggestions? I will try to see if I can fix the problem by myself somehow, but I was very excited when you fixed my first error, but now this error is even worse it seems.

thanks!

nkundiushuti commented 7 years ago

I will try to replicate the error on a windows computer using a similar setup. I couldn't get these errors on ubuntu and mac os systems. It would be useful if you could print some output, like print l_conv2.output_shape, before the line that has the error.

ghost commented 7 years ago

I wish I was better at Python so I can help here, I know the print function, just not sure how exactly to insert it into the code. Can you show me for example where to put it between what lines and how exactly? You mentioned different operating systems, but that would be just too weird if that was the problem, then again, who knows.

I really hope someone can help me on this, I am very interested in this project, but I would need to make it work before I do anything else with it. I hope you can manage to reproduce the error on your side and see what causes it. Thanks for your time and help!

ghost commented 7 years ago

This is probably not something important, I am no expert in neural nets or lasagne/theano, but I noticed something interesting from the lasagne documentation when it comes to the ReshapeLayer:

http://lasagne.readthedocs.io/en/latest/modules/layers/shape.html

especially this part:

l_in = InputLayer((32, 100, 20))
l1 = ReshapeLayer(l_in, ((32, 50, 40)))

If I compare that example to the code in trainCNN, I noticed that the input layer has a multiplication while the reshape layer has just elements of that multiplication:

l_fc11=lasagne.layers.DenseLayer(l_fc,l_conv2.output_shape[1]*l_conv2.output_shape[2]*l_conv2.output_shape[3])
l_reshape1 = lasagne.layers.ReshapeLayer(l_fc11,(batch_size,l_conv2.output_shape[1],l_conv2.output_shape[2], l_conv2.output_shape[3]))

Is that something maybe that I have to change to make it work or is that ok?
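The two lines look consistent with each other: the dense layer produces one flat vector per example whose length is the product of the three dimensions, and the reshape layer unfolds that vector back into those dimensions, so the element count is the same on both sides. A small sketch of the arithmetic, assuming the shape (32, 50, 16, 1) that is printed further down in this thread:

```python
from functools import reduce
from operator import mul

batch_size = 32
conv_shape = (32, 50, 16, 1)  # l_conv2.output_shape as printed in the thread

# Units requested from the dense layer: product of the non-batch dimensions.
dense_units = reduce(mul, conv_shape[1:], 1)

# Target shape handed to the reshape layer.
target_shape = (batch_size,) + conv_shape[1:]

# A reshape is only valid when both sides hold the same number of elements.
elements_in = batch_size * dense_units
elements_out = reduce(mul, target_shape, 1)

print(dense_units)                  # 800
print(elements_in == elements_out)  # True
```

So the multiplication by itself is not the source of the ValueError; the message complains about the type of the entries, not about their values.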

ghost commented 7 years ago

OK, so I finally managed to do the printing function correctly now. Going back at what you said:

print ld1.input_size
inputs, target = ld1()
print inputs.shape
print target.shape

I managed to get this:

513
(32L, 30L, 513L)
(32L, 30L, 2052L)

Also, that was what you suggested before the ValueError. For the ValueError, I am getting help from another user on the Lasagne google group page and he told me to print right before the ReshapeLayer using this line:

print((batch_size,l_conv2.output_shape[1],l_conv2.output_shape[2], l_conv2.output_shape[3]))

and I get this result from the printing: (32, 50, 16, 1L)

So what is wrong? is it because the last entry is 1L? I hope you can help also, thanks! :)

ghost commented 7 years ago

In order to maybe help solving this, this is the topic I created on the Lasagne Google Group, so maybe you can figure out some information from that: https://groups.google.com/forum/#!topic/lasagne-users/cpVh3Nd4RJA

thanks!

nkundiushuti commented 7 years ago

hey! thank you for working on this.

Reading the message board, it looks like a type conversion error which arises in 32-bit windows. As Jan says on the message board, you should wrap all the layer sizes in int(), including the input_shape:

input_shape=(int(batch_size),1,int(time_context),int(feat_size))
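A sketch of that suggestion, written so that it also runs on Python 3, where `long` no longer exists. `LongLike` is a hypothetical stand-in for a value such as `513L`: it converts to an integer but is not an instance of `int`. `is_int_tuple` only imitates the kind of strict check that the tracebacks point at; it is not Lasagne's code.

```python
class LongLike(object):
    """Hypothetical stand-in for a Python 2 long such as 513L."""

    def __init__(self, value):
        self.value = value

    def __int__(self):
        return self.value


def is_int_tuple(shape):
    """Imitation of a strict check: every entry must be a plain int."""
    return all(isinstance(v, int) for v in shape)


def to_int_shape(shape):
    """Cast every entry with int(), as suggested in the thread."""
    return tuple(int(v) for v in shape)


batch_size, time_context, feat_size = 32, LongLike(30), LongLike(513)

raw_shape = (batch_size, 1, time_context, feat_size)
fixed_shape = to_int_shape(raw_shape)

print(is_int_tuple(raw_shape))    # False
print(is_int_tuple(fixed_shape))  # True
print(fixed_shape)                # (32, 1, 30, 513)
```

This would also explain why changing only batch_size at line 105 did not help: the other entries of the tuple, such as the 1L printed earlier, were presumably still not plain ints.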

ghost commented 7 years ago

OMG!!! It finally works! Thanks a lot! However, I still get an error:

I 2017-02-25 05:10:56 trainer:290 Separating
Traceback (most recent call last):
  File "C:\Python27\lib\runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Python27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\DeepConvSep\examples\dsd100\trainCNN.py", line 449, in <module>
    train_errs=train_auto(train=ld1,fun=build_ca,transform=tt,outdir=db+'output/'+model+"/",testdir=db+'Mixtures/',model=db+"models/"+"model"+model+".pkl",num_epochs=nepochs,scale_factor=scale_factor)
  File "C:\DeepConvSep\examples\dsd100\trainCNN.py", line 292, in train_auto
    dev_directory = os.listdir(os.path.join(testdir,"Dev"))
WindowsError: [Error 3] The system cannot find the path specified: 'DSD100Mixtures/Dev/.'

It seems to be separating something from a folder I don't have. However, I don't want to separate, unless this is part of something for the pkl? Because I already have the pkl generated before this separation step. I tried separation with the included dsd100 separation code and the generated model and everything is working! I am so happy. So, is the separation within trainCNN needed? How to skip it?

You mentioned windows 32-bit? But I have 64-bit, but anyway, doesn't matter, as long as it works now!

Finally, my question is, why is the pkl so small 1.8MB? Is this normal? I do have only 2 songs I used for the training, but is that the reason or it's going to be always small?

Thanks a lot!
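The WindowsError above comes from os.listdir being called on a directory that does not exist. One way to make such a step fail softly is to check the path first. A minimal sketch; `list_dev_tracks` is a hypothetical helper and not part of DeepConvSep:

```python
import os


def list_dev_tracks(testdir):
    """Return the entries of <testdir>/Dev, or an empty list if it is missing."""
    dev_path = os.path.join(testdir, "Dev")
    if not os.path.isdir(dev_path):
        print("Skipping separation, directory not found: " + dev_path)
        return []
    return sorted(os.listdir(dev_path))


# Example call with a directory that is assumed not to exist.
tracks = list_dev_tracks("no_such_dataset_root/Mixtures")
print(tracks)
```

With a guard like this, a missing test folder produces a message and an empty list instead of a traceback at the end of training.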

nkundiushuti commented 7 years ago

The pkl size depends on the number of parameters in your network. If you want to skip separation, there is a skip_sep parameter which you have to set to True when train_auto is called.
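To illustrate why the file size follows the architecture rather than the amount of training data, here is a rough parameter count for the one layer whose settings appear in this thread (50 filters of size 1 x 513 over a single input channel). It assumes one bias per filter and 32-bit floats, and it ignores pickle overhead; the other layers are not described here, so no total is given.

```python
def conv_params(num_filters, in_channels, filter_size):
    """Weights plus one bias per filter for a 2-D convolution layer."""
    height, width = filter_size
    weights = num_filters * in_channels * height * width
    biases = num_filters
    return weights + biases


BYTES_PER_FLOAT32 = 4

n_params = conv_params(num_filters=50, in_channels=1, filter_size=(1, 513))
print(n_params)                      # 25700
print(n_params * BYTES_PER_FLOAT32)  # 102800 bytes, about 0.1 MB

# Going the other way: a 1.8 MB file holds on the order of
# 1.8e6 / 4 = 450,000 float32 values, however many songs were used.
print(int(1.8e6 // BYTES_PER_FLOAT32))  # 450000
```

Training on more songs changes the values of these parameters, not how many of them there are, so the saved model would be expected to stay about the same size.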

thanks for using this :)