Open mbollmann opened 7 years ago
"tag_index out of range: 1" indicates that there is no column 1 in an intermediate file that is being generated (see below). Are you sure that your original data is in the correct format? There should be a tab between the input x and the output y. Moreover, all characters in x and all characters in y should be separated by an ordinary space.
The intermediate file that is generated for Marmot is stored in tmp/. This file should have 3 columns: column 0 is the input, column 1 is the output (this appears to be missing), and column 2 is the features.
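A quick way to rule out a formatting problem in the original data is to check every line against the description above. The following is only a minimal sketch, not part of the tool: the function names check_line and check_file are hypothetical, and it assumes one x/y pair per line, UTF-8 encoding, and that every space-separated token is a single character. If your data legitimately contains multi-character symbols (or combining characters), the last check will report false positives.

```python
def check_line(line):
    """Return a list of problems found in one training line (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 2:
        return [f"expected 2 tab-separated fields, found {len(fields)}"]

    problems = []
    for name, field in zip(("x", "y"), fields):
        if not field:
            problems.append(f"{name} is empty")
            continue
        for token in field.split(" "):
            if token == "":
                # An empty token means a leading, trailing, or doubled space.
                problems.append(f"{name} has a leading, trailing, or doubled space")
                break
            if len(token) > 1:
                # Assumption: every token should be exactly one character.
                problems.append(f"{name} has unseparated characters: {token!r}")
                break
    return problems


def check_file(path):
    """Print every problem in the file at `path`, prefixed with its line number."""
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            for problem in check_line(line):
                print(f"{path}:{number}: {problem}")
```

Calling check_file on the training file prints nothing if every line has exactly one tab and space-separated single characters on both sides.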
Okay, apparently this was my fault: the training file was in correct format, but I tried to re-start the training process in the middle due to a crash I got (the code silently assumes the existence of a tmp/ subdirectory or it will fail -- the same applies later on to a MODELS_cl/ subdirectory for saving the model). When I started it again from the beginning, I got no such error.
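For anyone else hitting this, creating both subdirectories before starting the training avoids the crash. A minimal sketch follows; the helper name ensure_dirs is hypothetical, and it assumes that tmp/ and MODELS_cl/ are expected relative to the directory from which the training script is run, which the thread does not state explicitly.

```python
import os


def ensure_dirs(base=".", names=("tmp", "MODELS_cl")):
    """Create the subdirectories under `base` if they do not exist yet."""
    created = []
    for name in names:
        path = os.path.join(base, name)
        # exist_ok=True makes the call safe to repeat on later runs.
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    ensure_dirs()
```

Running this once from the working directory, before launching the training, leaves existing directories and their contents untouched.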
Unfortunately the training fails due to memory issues now. By default, you allocate 120GB heap space to the Java process, which is a dangerous default setting IMO (it bogged down my entire system the first time I ran it). When I changed it to allocate less, Marmot just fails with "OutOfMemoryError". Not sure how to proceed apart from upgrading to huge amounts of RAM...
Good points. We should have documented the necessity of creating the respective subdirectories. I'll update the README in the upcoming days.
Concerning the memory: 120G was not a problem for our machines. Maybe you can find a smaller value that works for your machine and problem. Alternatively, you can use a smaller order and/or a smaller context size. Good orders for seq2seq problems seem to be up to 7, but you might already get a good system with orders 2-3.
I'll try your suggestions, thanks! (Or maybe try letting it run overnight when I'm not actively using the machine.)
When trying to use train_complex.sh on some of my own data, I'm getting this exception from Marmot which I have no idea how to debug:
Using marmot-2015-10-22.jar. This doesn't happen with the supplied Twitter sample data, FWIW.