cnn out of memory #2

HenrikLovold commented 7 years ago

Hello,

I am attempting to train the parser with some embeddings I have created. The number of words is 5393907, and I am using 100 dimensions. The training oracle file is 110 MB of text, and the dev oracle is 17 MB. The embeddings file is 944 MB. Running the parser without my pretrained embeddings works smoothly.

The computer I am using has 378 GB of RAM and 40 processing units. I am running with --cnn-mem 200000 (200 GB) and still getting the "cnn out of memory" error followed by a core dump. What can I do to run the parser with my embeddings and treebank? I cannot see how 200 GB of RAM could be insufficient. Any help would be greatly appreciated.
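As a rough check on the numbers above: if 5393907 is the number of entries in the embeddings file, the pretrained lookup table alone holds 5393907 × 100 floats. The sketch below assumes 4-byte floats and that --cnn-mem is measured in MiB (2^20 bytes); it counts only the embedding values, not gradients or the parser's other parameters. The function names are illustrative.

```cpp
#include <cstdint>

// Bytes needed for a vocab_size x dim table of 32-bit floats.
// Assumption: parameters are stored as 4-byte floats; gradients and the
// parser's other parameters are not counted.
std::uint64_t embedding_bytes(std::uint64_t vocab_size, std::uint64_t dim) {
  return vocab_size * dim * sizeof(float);
}

// The same size in MiB (2^20 bytes), rounded up, for comparison with a
// --cnn-mem value (assumed to be in MiB).
std::uint64_t embedding_mib(std::uint64_t vocab_size, std::uint64_t dim) {
  const std::uint64_t mib = std::uint64_t{1} << 20;
  return (embedding_bytes(vocab_size, dim) + mib - 1) / mib;
}
```

With these inputs the table comes to 2,157,562,800 bytes, or about 2058 MiB, which is already more than a 2000 MB pool before anything else is allocated.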

Kind regards, Henrik H. Løvold, MSc student LTG group at the University of Oslo

miguelballesteros commented 7 years ago

You can use --cnn-mem 2000 (which would use 2000 MB)

./lstm-parse --cnn-mem 2000 ...

ikekonglp commented 7 years ago

Another possibility is a subtle bug in your code when declaring parameters. Check whether the parameter sizes are valid (e.g. for a parameter of size x * y, check that x and y have reasonable values while the code is running).

HenrikLovold commented 7 years ago

Thanks a lot for your quick replies. I am running the program as follows:

parser/lstm-parse --cnn-mem 2000 -T train01.txt -d dev01.txt --hidden_dim 64 --lstm_input_dim 32 -w sskip02.vec --pretrained_dim 100 --rel_dim 10 --action_dim 16 -t -P

I trained the embeddings myself earlier today. They are 100-dimensional skip-gram embeddings generated by word2vec. @ikekonglp, what do you mean by valid parameter sizes?
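A quick sanity check on the embeddings themselves: assuming sskip02.vec is in word2vec's text format, whose first line is `<vocab_size> <dim>`, reading that header shows the real vocabulary size and whether it matches --pretrained_dim. The struct and function below are hypothetical helpers, not part of lstm-parser.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Header of a word2vec text-format embeddings file: "<vocab_size> <dim>".
struct EmbeddingHeader {
  unsigned long long vocab_size = 0;
  unsigned long long dim = 0;
};

// Reads the first line of the stream and checks that it is a two-field
// header declaring `expected_dim` dimensions (the --pretrained_dim value).
// Returns false if the header is missing, malformed, or disagrees.
bool check_embedding_header(std::istream& in, unsigned long long expected_dim,
                            EmbeddingHeader& out) {
  std::string line;
  if (!std::getline(in, line)) return false;
  std::istringstream header(line);
  if (!(header >> out.vocab_size >> out.dim)) return false;
  std::string extra;
  if (header >> extra) return false;  // more than two fields: not a header
  return out.vocab_size > 0 && out.dim == expected_dim;
}
```

For example, opening the file with std::ifstream and calling check_embedding_header(file, 100, h) would report h.vocab_size, which is the figure that determines how large the pretrained lookup table becomes.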

ikekonglp commented 7 years ago

For example, when you write a line like this: https://github.com/clab/cnn/blob/master/examples/nlm.cc#L35

Expression C = parameter(cg, model.add_parameters({DIM, DIM*CONTEXT}));

check the values of DIM and DIM*CONTEXT, because they might not be initialized properly. In that case, the required memory can be unreasonably large.
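One way to act on this advice is to validate dimensions before declaring a parameter. `checked_param_bytes` below is a hypothetical helper, not part of cnn: it rejects zero dimensions, guards the product against overflow, and compares the resulting float storage against a byte budget. A negative dimension converted to an unsigned type wraps to a huge value, and an uninitialized one is often similarly large, so both would typically trip the budget check.

```cpp
#include <cstdint>
#include <initializer_list>
#include <stdexcept>

// Hypothetical helper: validate parameter dimensions such as {DIM, DIM*CONTEXT}
// and return the storage they need in bytes (assuming 4-byte floats).
// Throws if a dimension is zero or the total exceeds budget_bytes.
std::uint64_t checked_param_bytes(std::initializer_list<std::uint64_t> dims,
                                  std::uint64_t budget_bytes) {
  std::uint64_t count = 1;
  for (std::uint64_t d : dims) {
    if (d == 0) throw std::invalid_argument("parameter dimension is zero");
    // count * d <= budget_bytes, checked without overflowing.
    if (count > budget_bytes / d) throw std::length_error("parameter too large");
    count *= d;
  }
  if (count > budget_bytes / sizeof(float))
    throw std::length_error("parameter exceeds memory budget");
  return count * sizeof(float);
}
```

For example, calling checked_param_bytes({DIM, DIM*CONTEXT}, budget) just before the model.add_parameters line above would fail loudly on a bad dimension instead of exhausting the memory pool.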

HenrikLovold commented 7 years ago

@ikekonglp That might be it! I am running with a context of 10 words, but I never changed anything in the code before compiling. Where do I set these parameters? Do I have to rebuild the entire parser?

miguelballesteros commented 7 years ago

@henrik2706 You should first try to increase the memory with --cnn-mem; sometimes 2000 is not enough. See this (entire) thread in cnn, https://github.com/clab/cnn/issues/52, in case the bug in cnn is still present in the version you are using.

If you change the code, yes, you will have to run make again.

HenrikLovold commented 7 years ago

Hello again, I have now upgraded to cnn v2, and I'm running into compiler errors when compiling lstm-parse.cc. The errors I am getting are:
error: ‘LookupParameters’ does not name a type
error: ‘Parameters’ does not name a type

Is lstm-parser compatible with cnn v2 yet?

miguelballesteros commented 7 years ago

Hi @henrik2706, it is not, but you can see how they changed the code to allow more memory in the link I sent you.

HenrikLovold commented 7 years ago

Okay, final update. I managed to fix this problem by editing devices.cc and devices.h in cnn (master, not v2). I changed the type of the mb parameter in Device_CPU::Device_CPU from int to size_t. It appears this solved the integer overflow problem, and I can now allocate as much memory as I want with --cnn-mem.
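The overflow is easy to see with a little arithmetic. Assuming cnn converts the --cnn-mem value to bytes as mb * 2^20 (an assumption about devices.cc, which is not quoted in this thread), any value of 2048 or more exceeds INT_MAX on a platform with 32-bit int, so computing it through an int overflows; signed overflow is undefined behaviour in C++. Taking mb as std::size_t keeps the computation in 64 bits on a 64-bit build. The function names are illustrative only.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Bytes requested for `--cnn-mem mb`, computed safely in 64 bits.
// Assumption: the MB value is converted to bytes as mb * 2^20.
std::int64_t requested_bytes(std::int64_t mb) {
  return mb * (std::int64_t{1} << 20);
}

// True if that byte count still fits in an int, i.e. if computing it with
// an int parameter would not overflow (INT_MAX is 2^31 - 1 with 32-bit int).
bool fits_in_int(std::int64_t mb) {
  return requested_bytes(mb) <= INT_MAX;
}

// The fix described above: with a std::size_t parameter the multiplication
// is done in 64 bits on a typical 64-bit build, so large values are safe.
std::size_t device_bytes(std::size_t mb) {
  return mb * (std::size_t{1} << 20);
}
```

Under this assumption, 2000 still fits (2,097,152,000 bytes) while 2048 and above do not, which would be consistent with small values running out of memory and large values failing through overflow before the fix.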

Closing this now. Thanks a lot to @miguelballesteros and @ikekonglp for your help.

miguelballesteros commented 7 years ago

Great!

HenrikLovold commented 7 years ago

Just a follow-up question: I've rewritten the part of the devices class whose integer overflow caused the memory problem in lstm-parser. I have also added an early-stopping feature to lstm-parser that lets the user specify, on the command line, the number of iterations to run. This is really handy if you have to leave the computer while the parser is training, since you no longer have to send a SIGINT manually to save and exit.
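The patch itself is not shown in the thread, so the following is only a sketch of the idea under stated assumptions: an iteration limit checked alongside a SIGINT flag, with the model saved on either exit path. The --max_iter flag name, requested_stop, parse_max_iter, and train_until_done are hypothetical names, not lstm-parser's actual options or functions.

```cpp
#include <csignal>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Set from a SIGINT handler; install with std::signal(SIGINT, on_sigint).
volatile std::sig_atomic_t requested_stop = 0;
extern "C" void on_sigint(int) { requested_stop = 1; }

// Hypothetical option "--max_iter N"; 0 means no limit (train until SIGINT).
unsigned parse_max_iter(const std::vector<std::string>& args) {
  for (std::size_t i = 0; i + 1 < args.size(); ++i)
    if (args[i] == "--max_iter")
      return static_cast<unsigned>(std::stoul(args[i + 1]));
  return 0;
}

// Runs `step` once per iteration until the limit is reached or SIGINT
// arrives, then calls `save` exactly once. Returns the iterations run.
unsigned train_until_done(unsigned max_iter,
                          const std::function<void(unsigned)>& step,
                          const std::function<void()>& save) {
  unsigned iter = 0;
  while (!requested_stop && (max_iter == 0 || iter < max_iter)) {
    step(iter);
    ++iter;
  }
  save();
  return iter;
}
```

Keeping the save call after the loop means the limit and the SIGINT path share one exit, so an unattended run ends in the same state as a manual interrupt.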

Would you like me to make a commit to your repository so you can review these changes and possibly merge them into master?

clab / lstm-parser, issue #11: cnn out of memory #2