
std::bad_alloc() in sampler #610

Open paidi opened 7 years ago

paidi commented 7 years ago

I am trying to run DeepDive and get the following error when the sampler executes. It looks like it could either be an internal error or an out-of-memory error (my machine has 64 GB of RAM). If it is the latter, is there a workaround that doesn't involve increasing the amount of available memory?

Thanks!

```
2016-12-21 10:46:10.453492 #################MACHINE CONFIG#################
2016-12-21 10:46:10.453509 # # NUMA Node        : 1
2016-12-21 10:46:10.453528 # # Thread/NUMA Node : 16
2016-12-21 10:46:10.453545 ################################################
2016-12-21 10:46:10.453563
2016-12-21 10:46:10.453580 #################GIBBS SAMPLING#################
2016-12-21 10:46:10.453598 # fg_file            : factorgraph/meta
2016-12-21 10:46:10.453616 # weight_file        : /dev/fd/63
2016-12-21 10:46:10.453634 # variable_file      : /dev/fd/62
2016-12-21 10:46:10.453651 # factor_file        : /dev/fd/61
2016-12-21 10:46:10.453669 # output_folder      : weights
2016-12-21 10:46:10.453685 # n_learning_epoch   : 1000
2016-12-21 10:46:10.453703 # n_samples/l. epoch : 1
2016-12-21 10:46:10.453720 # n_inference_epoch  : 1000
2016-12-21 10:46:10.453737 # stepsize           : 0.01
2016-12-21 10:46:10.453754 # decay              : 0.95
2016-12-21 10:46:10.453765 # regularization     : 0.01
2016-12-21 10:46:10.453794 ################################################
2016-12-21 10:46:10.453806 # IGNORE -s (n_samples/l. epoch). ALWAYS -s 1. #
2016-12-21 10:46:10.453815 # IGNORE -t (threads). ALWAYS USE ALL THREADS. #
2016-12-21 10:46:10.453825 ################################################
2016-12-21 10:46:10.453834 # nvar    : 14745098
2016-12-21 10:46:10.453844 # nfac    : 889500491
2016-12-21 10:46:10.453854 # nweight : 162212221
2016-12-21 10:46:10.453863 # nedge   : 889500491
2016-12-21 10:46:10.453873 ################################################
2016-12-21 10:46:39.942907 terminate called after throwing an instance of 'std::bad_alloc'
2016-12-21 10:46:39.942999   what():  std::bad_alloc
2016-12-21 10:46:40.410893 process/model/learning/run.sh: line 22: 76438 Aborted sampler-dw gibbs -w <(flatten factorgraph/weights) -v <(flatten factorgraph/variables) -f <(flatten factorgraph/factors) -m factorgraph/meta -o weights -l 1000 -s 1 -i 1000 --alpha 0.01 --sample_evidence
2016-12-21 10:46:40.420499 find: ‘pbzip2’ terminated by signal 13
```

alldefector commented 7 years ago

@paidi that does seem like OOM... The log suggests you were probably running an old version of DeepDive (v0.8). An upcoming release will introduce a factor graph partitioning feature, where a partitionable factor graph is processed piece by piece in memory. We haven't written up the documentation for this yet, but if you want to try it now from master (git clone, then make build; make install), the "census" example has this feature turned on:
https://github.com/HazyResearch/deepdive/blob/master/examples/census/app.ddlog#L23
https://github.com/HazyResearch/deepdive/blob/master/examples/census/deepdive.conf#L3

In short, you can enable partitioning by changing two places in your DD program:

  1. In app.ddlog, add @partition to the appropriate column (e.g., doc_id) of the variable relation.
  2. In deepdive.conf, add sampler.partitions: $NUM_PARTITIONS, where $NUM_PARTITIONS is the number of partitions.

Internally, DeepDive partitions the graph based on the hash value of the column marked with @partition.
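For reference, here is a minimal sketch of the two changes. The has_spouse relation, its columns, and the partition count of 8 are illustrative placeholders, not copied from the census example:

```
# app.ddlog: annotate the partition column of the variable relation
has_spouse?(
    @partition
    doc_id text,
    p1_id  text,
    p2_id  text
).
```

```
# deepdive.conf: number of pieces to split the factor graph into
sampler.partitions: 8
```

With this, each variable (and the factors touching it) should land in one of the 8 partitions according to the hash of doc_id, so only one partition's slice of the factor graph has to fit in memory at a time.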

paidi commented 7 years ago

Thanks for the quick response! I will try that today.