HazyResearch / deepdive

DeepDive
deepdive.stanford.edu

Sampler running out of memory on EC2 244 GB memory instances #256

Closed: raphaelhoffmann closed this issue 8 years ago

raphaelhoffmann commented 9 years ago

Several of the Memex extractors run fine on raiders, but not on EC2's most powerful machines. See the error below. Is it possible to use memory more efficiently?

03:13:51 [sampler] INFO /home/ubuntu/stanford-memex/../deepdive/util/sampler-dw-linux gibbs
03:13:51 [sampler] INFO Executing: /home/ubuntu/stanford-memex/../deepdive/util/sampler-dw-linux gibbs -w /scratch/tmp/graph.weights -v /scratch/tmp/graph.variables -f /scratch/tmp/graph.factors -e /scratch/tmp/graph.edges -m /scratch/tmp/graph.meta -o /scratch/tmp -l 300 -s 1 -i 500 --alpha 0.1 -c 0
03:13:51 [sampler] INFO
03:13:51 [sampler] INFO #################MACHINE CONFIG#################
03:13:51 [sampler] INFO # # NUMA Node        : 2
03:13:51 [sampler] INFO # # Thread/NUMA Node : 16
03:13:51 [sampler] INFO ################################################
03:13:51 [sampler] INFO
03:13:51 [sampler] INFO #################GIBBS SAMPLING#################
03:13:51 [sampler] INFO # fg_file            : /scratch/tmp/graph.meta
03:13:51 [sampler] INFO # edge_file          : /scratch/tmp/graph.edges
03:13:51 [sampler] INFO # weight_file        : /scratch/tmp/graph.weights
03:13:51 [sampler] INFO # variable_file      : /scratch/tmp/graph.variables
03:13:51 [sampler] INFO # factor_file        : /scratch/tmp/graph.factors
03:13:51 [sampler] INFO # output_folder      : /scratch/tmp
03:13:51 [sampler] INFO # n_learning_epoch   : 300
03:13:51 [sampler] INFO # n_samples/l. epoch : 1
03:13:51 [sampler] INFO # n_inference_epoch  : 500
03:13:51 [sampler] INFO # stepsize           : 0.1
03:13:51 [sampler] INFO # decay              : 0.95
03:13:51 [sampler] INFO # regularization     : 0.01
03:13:51 [sampler] INFO ################################################
03:13:51 [sampler] INFO # IGNORE -s (n_samples/l. epoch). ALWAYS -s 1. #
03:13:51 [sampler] INFO # IGNORE -t (threads). ALWAYS USE ALL THREADS. #
03:13:51 [sampler] INFO ################################################
03:13:51 [sampler] INFO # nvar    : 16127117
03:13:51 [sampler] INFO # nfac    : 561588541
03:13:51 [sampler] INFO # nweight : 10399160
03:13:51 [sampler] INFO # nedge   : 561588541
03:13:51 [sampler] INFO ################################################
03:13:57 [sampler] INFO LOADING VARIABLES...
03:13:58 [sampler] INFO LOADED VARIABLES: #16127117
03:13:58 [sampler] INFO N_QUERY: #15677805
03:13:58 [sampler] INFO N_EVID : #449312
03:13:58 [sampler] INFO LOADING FACTORS
03:14:20 [taskManager] INFO Memory usage: 497/1963MB (max: 27159MB)
03:15:00 [sampler] INFO LOADED FACTORS: #561588541
03:15:00 [sampler] INFO LOADING WEIGHTS...
03:15:01 [sampler] INFO LOADED WEIGHTS: #10399160
03:15:01 [sampler] INFO LOADING EDGES...
03:15:20 [taskManager] INFO Memory usage: 497/1963MB (max: 27159MB)
03:15:25 [sampler] INFO LOADED EDGES: #561588541
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
sbt/sbt: line 1: 102808 Killed                 java $SBT_OPTS -jar `dirname $0`/sbt-launch.jar "$@"
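For scale, here is a rough back-of-envelope estimate of the loaded factor graph's footprint, using the counts reported in the log above. The per-record byte sizes and the per-NUMA-node replication factor are illustrative assumptions, not the DimmWitted sampler's actual data layout.

```python
# Back-of-envelope memory estimate for the factor graph in the log above.
# All per-record byte sizes below are assumptions for illustration only.

n_var    = 16_127_117
n_factor = 561_588_541
n_weight = 10_399_160
n_edge   = 561_588_541

BYTES_PER_VAR    = 64  # assumed: id, value, domain/cardinality, bookkeeping
BYTES_PER_FACTOR = 48  # assumed: id, function type, weight id, edge offsets
BYTES_PER_EDGE   = 24  # assumed: variable id, factor id, position/flags
BYTES_PER_WEIGHT = 16  # assumed: value plus an is-fixed flag

NUMA_COPIES = 2  # the log reports 2 NUMA nodes; if the graph is replicated
                 # per node (an assumption), resident memory roughly doubles

total_bytes = (n_var * BYTES_PER_VAR
               + n_factor * BYTES_PER_FACTOR
               + n_edge * BYTES_PER_EDGE
               + n_weight * BYTES_PER_WEIGHT) * NUMA_COPIES

print(f"~{total_bytes / 2**30:.0f} GiB")  # ~78 GiB with these assumptions
```

Even under these assumptions the graph alone is on the order of 80 GiB, and if the loader holds intermediate copies while building its in-memory representation, peak usage during loading could climb well past that and plausibly exhaust a 244 GB instance.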

raphaelhoffmann commented 9 years ago

I can point you to a running EC2 instance that has this issue, for debugging.

billgreenwald commented 8 years ago

I am running into this error too, on both the JDBC-removal branch and the normal branch. Has this been resolved, or is there a workaround?

alldefector commented 8 years ago

We shall have ooc / so sampler soon
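Assuming "ooc" here means out-of-core, the idea would be to stream the factor graph from disk in bounded chunks rather than materializing all ~561 million factors and edges in RAM. Below is a minimal sketch of that access pattern; the record layout (RECORD_FMT), the chunk size, and the field names are hypothetical and do not correspond to DeepDive's actual binary format.

```python
import struct

# Minimal out-of-core streaming sketch: read factor records in fixed-size
# chunks so memory stays bounded regardless of how large the file is.
# RECORD_FMT and the field meanings are hypothetical, not DeepDive's format.

RECORD_FMT    = "<qqq"                    # e.g. (factor_id, weight_id, n_edges)
RECORD_SIZE   = struct.calcsize(RECORD_FMT)
CHUNK_RECORDS = 1_000_000                 # ~24 MB per chunk with this layout

def stream_factors(path):
    """Yield factor records one at a time without loading the whole file."""
    with open(path, "rb") as f:
        while True:
            buf = f.read(RECORD_SIZE * CHUNK_RECORDS)
            if not buf:
                break
            usable = len(buf) - len(buf) % RECORD_SIZE
            for offset in range(0, usable, RECORD_SIZE):
                yield struct.unpack_from(RECORD_FMT, buf, offset)

# Usage (hypothetical path): iterate over factors with bounded memory.
# for factor_id, weight_id, n_edges in stream_factors("/scratch/tmp/graph.factors"):
#     ...
```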