ZhitingHu / EEEL


Unable to run the training script due to Segfault #6

Open jackbergus opened 5 years ago

jackbergus commented 5 years ago

After forking your project and linking it against the most recent versions of gflags and glog, I was able to compile your code. The program starts, but it then crashes with a segfault. In particular, the segfault happens just after this log message in ee_engine.cpp:

     LOG(INFO) << "Segfault here.";

Below is the program's output up to the point where it stopped. Is there a way to solve this problem?

     Running ee_main
     I0501 21:12:26.627094 6525 ee_engine.cpp:75] number of category: 257
     I0501 21:12:26.629631 6525 ee_engine.cpp:81] number of entity: 8499
     I0501 21:12:26.631129 6525 ee_engine.cpp:138] Reading /media/giacomo/Data/Progetti/Alignment/EEEL/data/apple/entity2category.txt
     I0501 21:12:26.653790 6525 ee_engine.cpp:94] Reading hierarchy_id.txt
     I0501 21:12:26.654392 6525 ee_engine.cpp:112] Reading level.txt
     I0501 21:12:26.655297 6525 ee_engine.cpp:253] Reading /media/giacomo/Data/Progetti/Alignment/EEEL/data/apple/entity2ancestor.txt
     I0501 21:12:26.655759 6525 ee_engine.cpp:183] Reading /media/giacomo/Data/Progetti/Alignment/EEEL/data/apple/pair.txt
     I0501 21:12:26.663928 6525 ee_engine.cpp:211] number of training data: 12706
     I0501 21:12:26.664072 6525 ee_engine.cpp:133] Data reading done.
     I0501 21:12:26.664080 6525 ee_main.cpp:74] here
     I0501 21:12:26.664672 6525 ee_engine.cpp:356] Build Noise Distribution Done.
     I0501 21:12:26.716038 6525 ee_engine.cpp:491] Segfault here.
     ./train.sh: line 73: 6525 Segmentation Fault (core dump created) GLOG_logtostderr=0 GLOG_stderrthreshold=0 GLOG_log_dir=$log_dir GLOG_v=-1 GLOG_minloglevel=0 GLOG_vmodule="" $prog_path --dim_embedding $dim_embedding --distance_metric_mode $distance_metric_mode --num_iter $num_iter --eval_interval $eval_interval --num_iter_per_eval $num_iter_per_eval --batch_size $batch_size --solver_type $solver_type --learning_rate $learning_rate --num_neg_sample $num_neg_sample --num_epoch_on_batch $num_epoch_on_batch --num_iter_on_entity $num_iter_on_entity --num_iter_on_category $num_iter_on_category --dataset_path $dataset_path --output_file_prefix $output_dir --snapshot $snapshot

jackbergus commented 5 years ago

The problem is in the FindCommonAncestors method: it seems that the map entity_ancestor_weights_ is either not initialized, or some of the entities (still) have null values in it. A possible cause is that AddAncestorWeights is called only once. Is there a working version that you used for your paper?
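To illustrate the kind of guard that would turn this crash into a reportable error, here is a minimal sketch. It is not the EEEL code: the types AncestorWeights and AncestorTable and the functions LookupAncestors and FindCommonAncestorsChecked are hypothetical names, and the real type of entity_ancestor_weights_ may differ. The sketch assumes a table that maps an entity id to a pointer that may be null, and refuses to dereference missing or null entries.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <vector>

// Hypothetical stand-ins; the actual EEEL data structures may differ.
using AncestorWeights = std::map<int, float>;  // category id -> weight
using AncestorTable = std::map<int, std::unique_ptr<AncestorWeights>>;

// Returns the entity's weights, or nullptr if the entity has no entry
// or its entry was never filled in. find() does not modify the table.
const AncestorWeights* LookupAncestors(const AncestorTable& table,
                                       int entity_id) {
  auto it = table.find(entity_id);
  if (it == table.end()) return nullptr;
  return it->second.get();
}

// Collects the category ids present in both entities' ancestor maps.
// Returns false, leaving *common empty, when either entity is missing,
// instead of dereferencing a null pointer.
bool FindCommonAncestorsChecked(const AncestorTable& table, int a, int b,
                                std::vector<int>* common) {
  assert(common != nullptr);
  common->clear();
  const AncestorWeights* wa = LookupAncestors(table, a);
  const AncestorWeights* wb = LookupAncestors(table, b);
  if (wa == nullptr || wb == nullptr) return false;
  for (const auto& kv : *wa) {
    if (wb->count(kv.first) > 0) common->push_back(kv.first);
  }
  return true;
}
```

With a check like this, the caller could log the offending entity id when the function returns false, which would show whether the entries are missing entirely or present but null.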