Open AjayChoudary opened 7 years ago
Yes, the full dataset is pretty large, so you need a large infrastructure. But I'm going to push a flag that will let you specify a subset of the full dataset.
The program is killed by the Linux kernel when you run out of RAM. Without a GPU, you also consume more RAM because the network itself has to be held in main memory.
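To confirm that the kernel's out-of-memory killer is what stopped the process, one option is to search the kernel log for its kill messages. The sketch below is only an illustration: `find_oom_kills` is a hypothetical helper, not part of DeepQA, and the regular expression assumes the common "Killed process <pid> (<name>)" wording, which may differ between kernel versions.

```python
import re

# Assumption: the kernel logs OOM kills as "Killed process <pid> (<name>) ...".
# The exact wording can vary between kernel versions.
OOM_PATTERN = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(kernel_log):
    """Return (pid, process_name) pairs for OOM kills found in kernel log text."""
    return [(int(pid), name) for pid, name in OOM_PATTERN.findall(kernel_log)]
```

The text passed in would typically be the output of `dmesg` (reading it may require root on some systems). A match for the Python process shortly after the "Killed" message supports the out-of-memory explanation.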
I also had planned to add a function to only keep a fraction of the dataset but I never implemented it.
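A function of that kind could look roughly like the sketch below. It is not the DeepQA implementation: `keep_fraction` and its parameters are hypothetical names, and it assumes the training samples are already available as a Python list.

```python
import random


def keep_fraction(samples, fraction, seed=0):
    """Return a random subset containing roughly `fraction` of `samples`.

    Hypothetical helper: assumes `samples` is a list of training pairs.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not samples:
        return []
    # Keep at least one sample so that training can still start.
    count = max(1, int(len(samples) * fraction))
    # A seeded generator makes the subset reproducible between runs.
    return random.Random(seed).sample(samples, count)
```

Fixing the seed keeps the subset identical across runs, which makes it easier to compare memory usage between experiments.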
Another problem with my code is that all batches are generated at once, which isn't really efficient in terms of memory (the dataset is basically duplicated in memory). By using a Python generator instead, it should be possible to be much more efficient.
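A minimal sketch of the generator idea, assuming the samples are held in a list: `iter_batches` is a hypothetical name, not a function from the DeepQA code base. Each batch is created only when the training loop asks for it, so only one batch exists in memory in addition to the dataset itself.

```python
def iter_batches(samples, batch_size):
    """Lazily yield consecutive batches instead of building them all up front."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(samples), batch_size):
        # The slice is a small copy that can be freed once the batch is consumed.
        yield samples[start:start + batch_size]
```

A training loop would then iterate with `for batch in iter_batches(samples, batch_size):` rather than indexing into a precomputed list of batches.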
@AjayChoudary @Conchylicultor Here's a quickfix that restricts the size of the dataset used (sorry, it's in an unrelated PR):
https://github.com/Conchylicultor/DeepQA/pull/57/commits/3ac60103dd323eac7f838dd949b7266f42cc508b
I downloaded the Ubuntu corpus as mentioned in Readme.md and ran the command below, but the process gets killed during training without any error.
./main.py --corpus ubuntu --modelTag ubuntufull
Welcome to DeepQA v0.1 !
TensorFlow detected: v0.12.1
Training samples not found. Creating dataset...
Ubuntu dialogs subfolders: 100%|█████████████████████| 350/350 [1:49:50<00:00, 18.83s/it]
Extract conversations: 100%|█████████████████| 1852868/1852868 [1:28:19<00:00, 349.62it/s]
Saving dataset...
Loaded ubuntu: 891832 words, 6841047 QA
Model creation...
Initialize variables...
WARNING: No previous model found, starting from clean directory: /root/ajay/DeepQA-2/save/model-ubuntufull
Start training (press Ctrl+C to save and exit)...
----- Epoch 1/30 ; (lr=0.001) -----
Shuffling the dataset...
Killed
How can I debug this issue? There is no error or exception before the process gets killed. There is no issue when training on cornell/scotus. OS: Ubuntu 16.04. Hardware: vSphere VM with an 8-core CPU and no GPU.