NTMC-Community / MatchZoo

Facilitating the design, comparison and sharing of deep text matching models.
Apache License 2.0
3.84k stars 897 forks

run MatchZoo/examples/wikiqa$ bash run_dssm.sh failed #21

Closed SeekPoint closed 6 years ago

SeekPoint commented 6 years ago

mldl@mldlUB1604:~/ub16_prj/MatchZoo/examples/wikiqa$ bash run_dssm.sh Using TensorFlow backend. 2017-12-14 03:34:23.080444: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations. 2017-12-14 03:34:23.080467: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations. 2017-12-14 03:34:23.080490: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations. 2017-12-14 03:34:23.080496: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations. 2017-12-14 03:34:23.080514: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations. 
2017-12-14 03:34:23.169856: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2017-12-14 03:34:23.170205: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: name: GeForce GTX 950M major: 5 minor: 0 memoryClockRate (GHz) 1.124 pciBusID 0000:01:00.0 Total memory: 3.95GiB Free memory: 3.65GiB 2017-12-14 03:34:23.170236: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 2017-12-14 03:34:23.170242: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y 2017-12-14 03:34:23.170271: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 950M, pci bus id: 0000:01:00.0) { "inputs": { "test": { "phase": "EVAL", "input_type": "Triletter_ListGenerator", "batch_list": 10, "relation_file": "./data/WikiQA/relation_test.txt", "dtype": "dssm" }, "predict": { "phase": "PREDICT", "input_type": "Triletter_ListGenerator", "batch_list": 10, "relation_file": "./data/WikiQA/relation_test.txt", "dtype": "dssm" }, "train": { "relation_file": "./data/WikiQA/relation_train.txt", "input_type": "Triletter_PairGenerator", "batch_size": 100, "batch_per_iter": 5, "dtype": "dssm", "phase": "TRAIN", "query_per_iter": 50, "use_iter": false }, "share": { "vocab_size": 3314, "embed_size": 1, "target_mode": "ranking", "text1_corpus": "./data/WikiQA/corpus_preprocessed.txt", "text2_corpus": "./data/WikiQA/corpus_preprocessed.txt", "word_triletter_map_file": "./data/WikiQA/word_triletter_map.txt" }, "valid": { "phase": "EVAL", "input_type": "Triletter_ListGenerator", "batch_list": 10, "relation_file": "./data/WikiQA/relation_valid.txt", "dtype": "dssm" } }, "global": { "optimizer": "adam", "num_iters": 400, "save_weights_iters": 10, "learning_rate": 0.0001, "test_weights_iters": 400, "weights_file": 
"examples/wikiqa/weights/dssm.wikiqa.weights", "model_type": "PY", "display_interval": 10 }, "outputs": { "predict": { "save_format": "TREC", "save_path": "predict.test.wikiqa.txt" } }, "losses": [ { "object_name": "rank_hinge_loss", "object_params": { "margin": 1.0 } } ], "metrics": [ "ndcg@3", "ndcg@5", "map" ], "net_name": "DSSM", "model": { "model_py": "dssm.DSSM", "setting": { "dropout_rate": 0.9, "hidden_sizes": [ 300 ] }, "model_path": "./matchzoo/models/" } } [Embedding] Embedding Load Done. [Input] Process Input Tags. [u'train'] in TRAIN, [u'test', u'valid'] in EVAL. [./data/WikiQA/corpus_preprocessed.txt] Data size: 24106 [Dataset] 1 Dataset Load Done. {u'relation_file': u'./data/WikiQA/relation_train.txt', u'vocab_size': 3314, u'embed_size': 1, u'target_mode': u'ranking', u'input_type': u'Triletter_PairGenerator', u'text1_corpus': u'./data/WikiQA/corpus_preprocessed.txt', u'batch_size': 100, u'batch_per_iter': 5, u'text2_corpus': u'./data/WikiQA/corpus_preprocessed.txt', u'word_triletter_map_file': u'./data/WikiQA/word_triletter_map.txt', u'dtype': u'dssm', u'phase': u'TRAIN', 'embed': array([[-0.18291523], [-0.00574826], [-0.13887608], ..., [-0.17844775], [-0.1465386 ], [-0.13503003]], dtype=float32), u'query_per_iter': 50, u'use_iter': False} [./data/WikiQA/relation_train.txt] Instance size: 20360 Pair Instance Count: 8995 [Triletter_PairGenerator] init done {u'relation_file': u'./data/WikiQA/relation_test.txt', u'vocab_size': 3314, u'embed_size': 1, u'target_mode': u'ranking', u'input_type': u'Triletter_ListGenerator', u'batch_list': 10, u'text1_corpus': u'./data/WikiQA/corpus_preprocessed.txt', u'text2_corpus': u'./data/WikiQA/corpus_preprocessed.txt', u'word_triletter_map_file': u'./data/WikiQA/word_triletter_map.txt', u'dtype': u'dssm', u'phase': u'EVAL', 'embed': array([[-0.18291523], [-0.00574826], [-0.13887608], ..., [-0.17844775], [-0.1465386 ], [-0.13503003]], dtype=float32)} [./data/WikiQA/relation_test.txt] Instance size: 2341 List Instance 
Count: 237 [Triletter_ListGenerator] init done {u'relation_file': u'./data/WikiQA/relation_valid.txt', u'vocab_size': 3314, u'embed_size': 1, u'target_mode': u'ranking', u'input_type': u'Triletter_ListGenerator', u'batch_list': 10, u'text1_corpus': u'./data/WikiQA/corpus_preprocessed.txt', u'text2_corpus': u'./data/WikiQA/corpus_preprocessed.txt', u'word_triletter_map_file': u'./data/WikiQA/word_triletter_map.txt', u'dtype': u'dssm', u'phase': u'EVAL', 'embed': array([[-0.18291523], [-0.00574826], [-0.13887608], ..., [-0.17844775], [-0.1465386 ], [-0.13503003]], dtype=float32)} [./data/WikiQA/relation_valid.txt] Instance size: 1126 List Instance Count: 122 [Triletter_ListGenerator] init done [DSSM] init done [layer]: Input [shape]: [None, 3314] [Memory] Total Memory Use: 294.5273 MB Resident: 301596 Shared: 0 UnshareData: 0 UnshareStack: 0 [layer]: Input [shape]: [None, 3314] [Memory] Total Memory Use: 294.5273 MB Resident: 301596 Shared: 0 UnshareData: 0 UnshareStack: 0 [layer]: MLP [shape]: [None, 300] [Memory] Total Memory Use: 295.1914 MB Resident: 302276 Shared: 0 UnshareData: 0 UnshareStack: 0 [layer]: MLP [shape]: [None, 300] [Memory] Total Memory Use: 295.1914 MB Resident: 302276 Shared: 0 UnshareData: 0 UnshareStack: 0 [layer]: Dot [shape]: [None, 1] [Memory] Total Memory Use: 295.1914 MB Resident: 302276 Shared: 0 UnshareData: 0 UnshareStack: 0 [Model] Model Compile Done. [12-14-2017 03:34:23] [Train:train] Traceback (most recent call last): File "matchzoo/main.py", line 328, in main(sys.argv) File "matchzoo/main.py", line 320, in main train(config) File "matchzoo/main.py", line 151, in train verbose = 0 File "/usr/local/lib/python2.7/dist-packages/keras/legacy/interfaces.py", line 87, in wrapper return func(*args, **kwargs) TypeError: fit_generator() got an unexpected keyword argument 'shuffle' Using TensorFlow backend. 
2017-12-14 03:34:25.341013: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations. 2017-12-14 03:34:25.341035: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations. 2017-12-14 03:34:25.341060: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations. 2017-12-14 03:34:25.341064: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations. 2017-12-14 03:34:25.341069: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations. 
2017-12-14 03:34:25.406950: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2017-12-14 03:34:25.407200: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: name: GeForce GTX 950M major: 5 minor: 0 memoryClockRate (GHz) 1.124 pciBusID 0000:01:00.0 Total memory: 3.95GiB Free memory: 3.65GiB 2017-12-14 03:34:25.407216: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 2017-12-14 03:34:25.407220: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y 2017-12-14 03:34:25.407230: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 950M, pci bus id: 0000:01:00.0) { "inputs": { "test": { "phase": "EVAL", "input_type": "Triletter_ListGenerator", "batch_list": 10, "relation_file": "./data/WikiQA/relation_test.txt", "dtype": "dssm" }, "predict": { "phase": "PREDICT", "input_type": "Triletter_ListGenerator", "batch_list": 10, "relation_file": "./data/WikiQA/relation_test.txt", "dtype": "dssm" }, "train": { "relation_file": "./data/WikiQA/relation_train.txt", "input_type": "Triletter_PairGenerator", "batch_size": 100, "batch_per_iter": 5, "dtype": "dssm", "phase": "TRAIN", "query_per_iter": 50, "use_iter": false }, "share": { "vocab_size": 3314, "embed_size": 1, "target_mode": "ranking", "text1_corpus": "./data/WikiQA/corpus_preprocessed.txt", "text2_corpus": "./data/WikiQA/corpus_preprocessed.txt", "word_triletter_map_file": "./data/WikiQA/word_triletter_map.txt" }, "valid": { "phase": "EVAL", "input_type": "Triletter_ListGenerator", "batch_list": 10, "relation_file": "./data/WikiQA/relation_valid.txt", "dtype": "dssm" } }, "global": { "optimizer": "adam", "num_iters": 400, "save_weights_iters": 10, "learning_rate": 0.0001, "test_weights_iters": 400, "weights_file": 
"examples/wikiqa/weights/dssm.wikiqa.weights", "model_type": "PY", "display_interval": 10 }, "outputs": { "predict": { "save_format": "TREC", "save_path": "predict.test.wikiqa.txt" } }, "losses": [ { "object_name": "rank_hinge_loss", "object_params": { "margin": 1.0 } } ], "metrics": [ "ndcg@3", "ndcg@5", "map" ], "net_name": "DSSM", "model": { "model_py": "dssm.DSSM", "setting": { "dropout_rate": 0.9, "hidden_sizes": [ 300 ] }, "model_path": "./matchzoo/models/" } } [Embedding] Embedding Load Done. [Input] Process Input Tags. [u'predict'] in PREDICT. [./data/WikiQA/corpus_preprocessed.txt] Data size: 24106 [Dataset] 1 Dataset Load Done. {u'relation_file': u'./data/WikiQA/relation_test.txt', u'vocab_size': 3314, u'embed_size': 1, u'target_mode': u'ranking', u'input_type': u'Triletter_ListGenerator', u'batch_list': 10, u'text1_corpus': u'./data/WikiQA/corpus_preprocessed.txt', u'text2_corpus': u'./data/WikiQA/corpus_preprocessed.txt', u'word_triletter_map_file': u'./data/WikiQA/word_triletter_map.txt', u'dtype': u'dssm', u'phase': u'PREDICT', 'embed': array([[-0.18291523], [-0.00574826], [-0.13887608], ..., [-0.17844775], [-0.1465386 ], [-0.13503003]], dtype=float32)} [./data/WikiQA/relation_test.txt] Instance size: 2341 List Instance Count: 237 [Triletter_ListGenerator] init done [DSSM] init done [layer]: Input [shape]: [None, 3314] [Memory] Total Memory Use: 289.7930 MB Resident: 296748 Shared: 0 UnshareData: 0 UnshareStack: 0 [layer]: Input [shape]: [None, 3314] [Memory] Total Memory Use: 289.7930 MB Resident: 296748 Shared: 0 UnshareData: 0 UnshareStack: 0 [layer]: MLP [shape]: [None, 300] [Memory] Total Memory Use: 290.1719 MB Resident: 297136 Shared: 0 UnshareData: 0 UnshareStack: 0 [layer]: MLP [shape]: [None, 300] [Memory] Total Memory Use: 290.1719 MB Resident: 297136 Shared: 0 UnshareData: 0 UnshareStack: 0 [layer]: Dot [shape]: [None, 1] [Memory] Total Memory Use: 290.4727 MB Resident: 297444 Shared: 0 UnshareData: 0 UnshareStack: 0 Traceback (most recent 
call last):
  File "matchzoo/main.py", line 328, in <module>
    main(sys.argv)
  File "matchzoo/main.py", line 322, in main
    predict(config)
  File "matchzoo/main.py", line 245, in predict
    model.load_weights(weights_file)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2566, in load_weights
    f = h5py.File(filepath, mode='r')
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/files.py", line 269, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/files.py", line 99, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 78, in h5py.h5f.open
IOError: Unable to open file (unable to open file: name = 'examples/wikiqa/weights/dssm.wikiqa.weights.400', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
mldl@mldlUB1604:~/ub16_prj/MatchZoo/examples/wikiqa$

faneshion commented 6 years ago

This happens because the "examples/wikiqa/weights/" directory does not exist. You can run "mkdir -p examples/wikiqa/weights" to fix it.
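For anyone who would rather guard against a missing weights directory in code than run mkdir by hand, here is a minimal sketch. The helper name ensure_parent_dir is hypothetical and not part of MatchZoo; it only uses the standard library, and it assumes the weights path is resolved relative to the directory the script is launched from.

```python
import os


def ensure_parent_dir(path):
    """Create the directory that should contain `path` if it is missing.

    Hypothetical helper (not part of MatchZoo). Returns the parent
    directory, or an empty string when `path` has no directory part.
    """
    parent = os.path.dirname(path)
    # An empty parent means the file lives in the current directory.
    if parent and not os.path.isdir(parent):
        os.makedirs(parent)
    return parent


# Usage, with the weights path taken from the config printed in the log:
# ensure_parent_dir("examples/wikiqa/weights/dssm.wikiqa.weights.400")
```

Note that this only creates the directory. If training never writes a weights file, loading it will still fail with the same IOError.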

SeekPoint commented 6 years ago

New error after creating that directory:

[layer]: Input [shape]: [None, 3314] [Memory] Total Memory Use: 161.0156 MB Resident: 168837120 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Input [shape]: [None, 3314] [Memory] Total Memory Use: 161.0312 MB Resident: 168853504 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: MLP [shape]: [None, 300] [Memory] Total Memory Use: 161.5273 MB Resident: 169373696 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: MLP [shape]: [None, 300] [Memory] Total Memory Use: 161.5547 MB Resident: 169402368 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Dot [shape]: [None, 1] [Memory] Total Memory Use: 161.8008 MB Resident: 169660416 Shared: 0 UnshareData: 0 UnshareStack: 0
[Model] Model Compile Done.
[12-14-2017 10:11:36] [Train:train]
Traceback (most recent call last):
  File "matchzoo/main.py", line 330, in <module>
    main(sys.argv)
  File "matchzoo/main.py", line 322, in main
    train(config)
  File "matchzoo/main.py", line 151, in train
    verbose = 0
  File "/Users/yike.ke/yike_prj/ve_tf1.0_py2/venv/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 87, in wrapper
    return func(*args, **kwargs)
TypeError: fit_generator() got an unexpected keyword argument 'shuffle'
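The TypeError says that the fit_generator() in the installed Keras does not accept the 'shuffle' keyword that matchzoo/main.py passes, which points to a version mismatch between Keras and MatchZoo. Installing a Keras release whose fit_generator() has that parameter, or removing the argument from the call, are the direct fixes. As an illustration only, the sketch below shows one generic way to drop keyword arguments a callable does not declare. It is written for Python 3 (the log above is from Python 2.7), the names call_with_supported_kwargs and old_fit_generator are hypothetical, and old_fit_generator is a stand-in rather than the real Keras function. Whether inspect.signature sees the true parameters through Keras's legacy wrapper is an assumption I have not verified.

```python
import inspect


def call_with_supported_kwargs(func, *args, **kwargs):
    """Call `func`, silently dropping keyword arguments it does not declare.

    Hypothetical helper for illustration; not part of Keras or MatchZoo.
    """
    params = inspect.signature(func).parameters
    # If func takes **kwargs itself, every keyword is acceptable.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return func(*args, **kwargs)
    supported = {k: v for k, v in kwargs.items() if k in params}
    return func(*args, **supported)


def old_fit_generator(generator, steps_per_epoch, epochs=1, verbose=1):
    """Stand-in for an older fit_generator() that lacks `shuffle`."""
    return {"steps_per_epoch": steps_per_epoch, "epochs": epochs, "verbose": verbose}


# Calling old_fit_generator(..., shuffle=False) directly raises TypeError;
# going through the helper drops the unsupported keyword instead.
result = call_with_supported_kwargs(
    old_fit_generator, iter([]), 5, epochs=1, shuffle=False, verbose=0
)
```

Dropping an argument this way hides the mismatch rather than resolving it, so the training behavior may differ from what the caller intended. Aligning the Keras version with what MatchZoo expects is the safer route.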