RichardHGL / WSDM2021_NSM

Improving Multi-hop Knowledge Base Question Answering by Learning Intermediate Supervision Signals. WSDM 2021.

FileNotFoundError: [Errno 2] No such file or directory: 'checkpoint/CWQ_student/../CWQ_teacher/CWQ_hybrid_teacher-final.ckpt' #18

Closed cdhx closed 2 years ago

cdhx commented 2 years ago

I ran run_CWQ.sh. The full log is below; I have downloaded all the data files.

Why does the script keep running after it hits an error?

How much memory does it need? My GPU has 24 GB, but it still runs out of memory.

Thx
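
The `$'\r': command not found` message and the `\r` at the end of the missing checkpoint path in the log below both suggest that `run_CWQ.sh` was saved with Windows-style (CRLF) line endings. A minimal sketch, assuming that is the cause, which converts a file to LF endings in place. The helper name `strip_carriage_returns` is hypothetical and not part of the NSM repository.

```python
from pathlib import Path


def strip_carriage_returns(path):
    """Convert CRLF line endings in `path` to LF, rewriting the file in place.

    Returns the number of CRLF sequences that were replaced.
    (Hypothetical helper for illustration; not part of the NSM code base.)
    """
    p = Path(path)
    data = p.read_bytes()
    # Count first so the file is only rewritten when something changes.
    count = data.count(b"\r\n")
    if count:
        p.write_bytes(data.replace(b"\r\n", b"\n"))
    return count
```

Calling `strip_carriage_returns("run_CWQ.sh")` from the repository root and then re-running the script should show whether the trailing `\r` in the checkpoint path goes away.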

(grailqa) xh@4210GPU:~/PycharmProject/NSM$ bash run_CWQ.sh
run_CWQ.sh: line 3: $'\r': command not found
2022-03-03 14:47:55,155 - root - INFO - PARAMETER----------
2022-03-03 14:47:55,155 - root - INFO - BATCH_SIZE=20
2022-03-03 14:47:55,156 - root - INFO - CHAR2ID=chars.txt
2022-03-03 14:47:55,156 - root - INFO - CHECKPOINT_DIR=checkpoint/CWQ_teacher/
2022-03-03 14:47:55,156 - root - INFO - CONSTRAIN_TYPE=js
2022-03-03 14:47:55,156 - root - INFO - DATA_FOLDER=dataset/CWQ/
2022-03-03 14:47:55,156 - root - INFO - DECAY_RATE=0.0
2022-03-03 14:47:55,156 - root - INFO - ENCODE_TYPE=True
2022-03-03 14:47:55,156 - root - INFO - ENCODER_TYPE=lstm
2022-03-03 14:47:55,156 - root - INFO - ENTITY2ID=entities.txt
2022-03-03 14:47:55,156 - root - INFO - ENTITY_DIM=50
2022-03-03 14:47:55,156 - root - INFO - ENTITY_EMB_FILE=None
2022-03-03 14:47:55,156 - root - INFO - ENTITY_KGE_FILE=None
2022-03-03 14:47:55,156 - root - INFO - ENTROPY_WEIGHT=0.0
2022-03-03 14:47:55,156 - root - INFO - EPS=0.95
2022-03-03 14:47:55,156 - root - INFO - EVAL_EVERY=2
2022-03-03 14:47:55,156 - root - INFO - EXPERIMENT_NAME=CWQ_hybrid_teacher
2022-03-03 14:47:55,156 - root - INFO - FACT_DROP=0
2022-03-03 14:47:55,156 - root - INFO - FACT_SCALE=3
2022-03-03 14:47:55,156 - root - INFO - FILTER_LABEL=False
2022-03-03 14:47:55,157 - root - INFO - FILTER_SUB=False
2022-03-03 14:47:55,157 - root - INFO - GRADIENT_CLIP=1.0
2022-03-03 14:47:55,157 - root - INFO - IS_EVAL=False
2022-03-03 14:47:55,157 - root - INFO - KG_DIM=100
2022-03-03 14:47:55,157 - root - INFO - KGE_DIM=100
2022-03-03 14:47:55,157 - root - INFO - LABEL_F1=0.5
2022-03-03 14:47:55,157 - root - INFO - LABEL_FILE=None
2022-03-03 14:47:55,157 - root - INFO - LABEL_SMOOTH=0.1
2022-03-03 14:47:55,157 - root - INFO - LAMBDA_BACK=0.1
2022-03-03 14:47:55,157 - root - INFO - LAMBDA_CONSTRAIN=0.01
2022-03-03 14:47:55,157 - root - INFO - LAMBDA_LABEL=0.01
2022-03-03 14:47:55,157 - root - INFO - LINEAR_DROPOUT=0.2
2022-03-03 14:47:55,157 - root - INFO - LOAD_EXPERIMENT=../pretrain/CWQ_nsm-final.ckpt
2022-03-03 14:47:55,157 - root - INFO - LOAD_PRETRAIN=None
2022-03-03 14:47:55,157 - root - INFO - LOG_LEVEL=info
2022-03-03 14:47:55,157 - root - INFO - LOSS_TYPE=kl
2022-03-03 14:47:55,157 - root - INFO - LR=0.0005
2022-03-03 14:47:55,157 - root - INFO - LR_SCHEDULE=False
2022-03-03 14:47:55,158 - root - INFO - LSTM_DROPOUT=0.3
2022-03-03 14:47:55,158 - root - INFO - MODE=teacher
2022-03-03 14:47:55,158 - root - INFO - MODEL_NAME=gnn
2022-03-03 14:47:55,158 - root - INFO - NAME=webqsp
2022-03-03 14:47:55,158 - root - INFO - NUM_EPOCH=70
2022-03-03 14:47:55,158 - root - INFO - NUM_LAYER=1
2022-03-03 14:47:55,158 - root - INFO - NUM_STEP=4
2022-03-03 14:47:55,158 - root - INFO - PRETRAINED_ENTITY_KGE_FILE=entity_emb_100d.npy
2022-03-03 14:47:55,158 - root - INFO - Q_TYPE=seq
2022-03-03 14:47:55,158 - root - INFO - REASON_KB=True
2022-03-03 14:47:55,158 - root - INFO - REL_WORD_IDS=rel_word_idx.npy
2022-03-03 14:47:55,158 - root - INFO - RELATION2ID=relations.txt
2022-03-03 14:47:55,158 - root - INFO - RELATION_EMB_FILE=None
2022-03-03 14:47:55,158 - root - INFO - RELATION_KGE_FILE=None
2022-03-03 14:47:55,158 - root - INFO - SEED=19960626
2022-03-03 14:47:55,158 - root - INFO - SHARE_EMBEDDING=False
2022-03-03 14:47:55,158 - root - INFO - SHARE_ENCODER=False
2022-03-03 14:47:55,158 - root - INFO - SHARE_INSTRUCTION=False
2022-03-03 14:47:55,158 - root - INFO - TEACHER_TYPE=hybrid
2022-03-03 14:47:55,159 - root - INFO - TEST_BATCH_SIZE=40
2022-03-03 14:47:55,159 - root - INFO - TRAIN_KL=False
2022-03-03 14:47:55,159 - root - INFO - TREE_SOFT=False
2022-03-03 14:47:55,159 - root - INFO - USE_CUDA=True
2022-03-03 14:47:55,159 - root - INFO - USE_INVERSE_RELATION=False
2022-03-03 14:47:55,159 - root - INFO - USE_LABEL=False
2022-03-03 14:47:55,159 - root - INFO - USE_SELF_LOOP=True
2022-03-03 14:47:55,159 - root - INFO - WORD2ID=vocab_new.txt
2022-03-03 14:47:55,159 - root - INFO - WORD_DIM=300
2022-03-03 14:47:55,159 - root - INFO - WORD_EMB_FILE=word_emb_300d.npy
2022-03-03 14:47:55,159 - root - INFO - -------------------
building word index ...
Entity: 2429346, Relation in KB: 6649, Relation in use: 6650
loading data from dataset/CWQ/train_simple.json
27639it [02:00, 229.81it/s]
skip {13194, 9931, 17485, 10670, 17373, 1113, 21468, 509}
max_facts:  34098
converting global to local entity index ...
100%|██████████████████████████████████████████████████████████████████████| 27631/27631 [00:06<00:00, 4224.13it/
avg local entity:  1297.9829539285586
max local entity:  2001
preparing dep ...
100%|█████████████████████████████████████████████████████████████████████| 27631/27631 [00:02<00:00, 12304.01it/
preparing data ...
100%|███████████████████████████████████████████████████████████████████████| 27631/27631 [01:29<00:00, 307.72it/
27631 cases in total, 0 cases without query entity, 14953 cases with single query entity, 12678 cases with multiple query entities
building word index ...
Entity: 2429346, Relation in KB: 6649, Relation in use: 6650
loading data from dataset/CWQ/dev_simple.json
3519it [00:06, 506.21it/s]
skip set()
max_facts:  32496
converting global to local entity index ...
100%|████████████████████████████████████████████████████████████████████████| 3519/3519 [00:00<00:00, 4223.77it/
avg local entity:  1338.1057118499573
max local entity:  2001
preparing dep ...
100%|███████████████████████████████████████████████████████████████████████| 3519/3519 [00:00<00:00, 12308.09it/
preparing data ...
100%|█████████████████████████████████████████████████████████████████████████| 3519/3519 [00:11<00:00, 298.42it/
3519 cases in total, 0 cases without query entity, 1794 cases with single query entity, 1725 cases with multiple query entities
building word index ...
Entity: 2429346, Relation in KB: 6649, Relation in use: 6650
loading data from dataset/CWQ/test_simple.json
3531it [00:23, 148.57it/s]
skip set()
max_facts:  34098
converting global to local entity index ...
100%|████████████████████████████████████████████████████████████████████████| 3531/3531 [00:00<00:00, 4302.08it/
avg local entity:  1337.5734919286322
max local entity:  2001
preparing dep ...
100%|███████████████████████████████████████████████████████████████████████| 3531/3531 [00:00<00:00, 12662.46it/
preparing data ...
100%|█████████████████████████████████████████████████████████████████████████| 3531/3531 [00:11<00:00, 298.49it/
3531 cases in total, 0 cases without query entity, 1829 cases with single query entity, 1702 cases with multiple query entities
2022-03-03 14:52:33,432 - root - INFO - Building Agent.
Entity: 2429346, Relation: 6650, Word: 20049
Entity: 2429346, Relation: 6650, Word: 20049
2022-03-03 15:03:42,419 - root - INFO - Architecture: TeacherAgent_hybrid(
  (model): HybridModel(
    (relation_embedding): Embedding(6650, 200)
    (word_embedding): Embedding(20050, 300, padding_idx=20049)
    (entity_linear): Linear(in_features=100, out_features=50, bias=True)
    (relation_linear): Linear(in_features=200, out_features=50, bias=True)
    (lstm_drop): Dropout(p=0.3, inplace=False)
    (linear_drop): Dropout(p=0.2, inplace=False)
    (type_layer): TypeLayer(
      (linear_drop): Dropout(p=0.2, inplace=False)
      (kb_self_linear): Linear(in_features=50, out_features=50, bias=True)
    )
    (kld_loss): KLDivLoss()
    (bce_loss_logits): BCEWithLogitsLoss()
    (mse_loss): MSELoss()
    (instruction): LSTMInstruction(
      (lstm_drop): Dropout(p=0.3, inplace=False)
      (linear_drop): Dropout(p=0.2, inplace=False)
      (word_embedding): Embedding(20050, 300, padding_idx=20049)
      (node_encoder): LSTM(300, 50, batch_first=True)
      (cq_linear): Linear(in_features=100, out_features=50, bias=True)
      (ca_linear): Linear(in_features=50, out_features=1, bias=True)
      (question_linear0): Linear(in_features=50, out_features=50, bias=True)
      (question_linear1): Linear(in_features=50, out_features=50, bias=True)
      (question_linear2): Linear(in_features=50, out_features=50, bias=True)
      (question_linear3): Linear(in_features=50, out_features=50, bias=True)
    )
    (reasoning): GNNReasoning(
      (lstm_drop): Dropout(p=0.3, inplace=False)
      (linear_drop): Dropout(p=0.2, inplace=False)
      (softmax_d1): Softmax(dim=1)
      (score_func): Linear(in_features=50, out_features=1, bias=True)
      (rel_linear0): Linear(in_features=50, out_features=50, bias=True)
      (e2e_linear0): Linear(in_features=100, out_features=50, bias=True)
      (rel_linear1): Linear(in_features=50, out_features=50, bias=True)
      (e2e_linear1): Linear(in_features=100, out_features=50, bias=True)
      (rel_linear2): Linear(in_features=50, out_features=50, bias=True)
      (e2e_linear2): Linear(in_features=100, out_features=50, bias=True)
      (rel_linear3): Linear(in_features=50, out_features=50, bias=True)
      (e2e_linear3): Linear(in_features=100, out_features=50, bias=True)
    )
    (back_reasoning): GNNBackwardReasoning(
      (lstm_drop): Dropout(p=0.3, inplace=False)
      (linear_drop): Dropout(p=0.2, inplace=False)
      (softmax_d1): Softmax(dim=1)
      (score_func): Linear(in_features=50, out_features=1, bias=True)
      (rel_linear0): Linear(in_features=50, out_features=50, bias=True)
      (e2e_linear0): Linear(in_features=100, out_features=50, bias=True)
      (rel_linear1): Linear(in_features=50, out_features=50, bias=True)
      (e2e_linear1): Linear(in_features=100, out_features=50, bias=True)
      (rel_linear2): Linear(in_features=50, out_features=50, bias=True)
      (e2e_linear2): Linear(in_features=100, out_features=50, bias=True)
      (rel_linear3): Linear(in_features=50, out_features=50, bias=True)
      (e2e_linear3): Linear(in_features=100, out_features=50, bias=True)
    )
    (constraint_loss): MSELoss()
    (kld_loss_1): KLDivLoss()
  )
)
2022-03-03 15:03:42,420 - root - INFO - Agent params: 7509253.0
Load ckpt from checkpoint/CWQ_teacher/../pretrain/CWQ_nsm-final.ckpt
Traceback (most recent call last):
  File "main_teacher.py", line 132, in <module>
    main()
  File "main_teacher.py", line 116, in main
    trainer = Trainer_hybrid(args=vars(args), logger=logger)
  File "/home2/xh/PycharmProject/NSM/NSM/train/trainer_hybrid.py", line 45, in __init__
    self.load_pretrain()
  File "/home2/xh/PycharmProject/NSM/NSM/train/trainer_hybrid.py", line 70, in load_pretrain
    self.load_ckpt(ckpt_path)
  File "/home2/xh/PycharmProject/NSM/NSM/train/trainer_hybrid.py", line 185, in load_ckpt
    checkpoint = torch.load(filename)
  File "/home2/xh/.conda/envs/grailqa/lib/python3.6/site-packages/torch/serialization.py", line 419, in load
    f = open(f, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'checkpoint/CWQ_teacher/../pretrain/CWQ_nsm-final.ckpt\r'
2022-03-03 15:04:43,154 - root - INFO - PARAMETER----------
2022-03-03 15:04:43,154 - root - INFO - BATCH_SIZE=20
2022-03-03 15:04:43,154 - root - INFO - CHAR2ID=chars.txt
2022-03-03 15:04:43,154 - root - INFO - CHECKPOINT_DIR=checkpoint/CWQ_student/
2022-03-03 15:04:43,154 - root - INFO - CONSTRAIN_TYPE=js
2022-03-03 15:04:43,154 - root - INFO - DATA_FOLDER=dataset/CWQ/
2022-03-03 15:04:43,154 - root - INFO - DECAY_RATE=0.0
2022-03-03 15:04:43,154 - root - INFO - ENCODE_TYPE=True
2022-03-03 15:04:43,155 - root - INFO - ENCODER_TYPE=lstm
2022-03-03 15:04:43,155 - root - INFO - ENTITY2ID=entities.txt
2022-03-03 15:04:43,155 - root - INFO - ENTITY_DIM=50
2022-03-03 15:04:43,155 - root - INFO - ENTITY_EMB_FILE=None
2022-03-03 15:04:43,155 - root - INFO - ENTITY_KGE_FILE=None
2022-03-03 15:04:43,155 - root - INFO - ENTROPY_WEIGHT=0.0
2022-03-03 15:04:43,155 - root - INFO - EPS=0.95
2022-03-03 15:04:43,155 - root - INFO - EVAL_EVERY=2
2022-03-03 15:04:43,155 - root - INFO - EXPERIMENT_NAME=CWQ_hybrid_student
2022-03-03 15:04:43,155 - root - INFO - FACT_DROP=0
2022-03-03 15:04:43,155 - root - INFO - FACT_SCALE=3
2022-03-03 15:04:43,155 - root - INFO - FILTER_LABEL=False
2022-03-03 15:04:43,155 - root - INFO - FILTER_SUB=False
2022-03-03 15:04:43,155 - root - INFO - GRADIENT_CLIP=1.0
2022-03-03 15:04:43,155 - root - INFO - IS_EVAL=False
2022-03-03 15:04:43,155 - root - INFO - KG_DIM=100
2022-03-03 15:04:43,155 - root - INFO - KGE_DIM=100
2022-03-03 15:04:43,155 - root - INFO - LABEL_F1=0.5
2022-03-03 15:04:43,155 - root - INFO - LABEL_FILE=None
2022-03-03 15:04:43,156 - root - INFO - LABEL_SMOOTH=0.1
2022-03-03 15:04:43,156 - root - INFO - LAMBDA_BACK=0.01
2022-03-03 15:04:43,156 - root - INFO - LAMBDA_CONSTRAIN=0.1
2022-03-03 15:04:43,156 - root - INFO - LAMBDA_LABEL=0.05
2022-03-03 15:04:43,156 - root - INFO - LINEAR_DROPOUT=0.2
2022-03-03 15:04:43,156 - root - INFO - LOAD_CKPT_FILE=None
2022-03-03 15:04:43,156 - root - INFO - LOAD_EXPERIMENT=None
2022-03-03 15:04:43,156 - root - INFO - LOAD_TEACHER=../CWQ_teacher/CWQ_hybrid_teacher-final.ckpt
2022-03-03 15:04:43,156 - root - INFO - LOG_LEVEL=info
2022-03-03 15:04:43,156 - root - INFO - LOSS_TYPE=kl
2022-03-03 15:04:43,156 - root - INFO - LR=0.0005
2022-03-03 15:04:43,156 - root - INFO - LR_SCHEDULE=False
2022-03-03 15:04:43,156 - root - INFO - LSTM_DROPOUT=0.3
2022-03-03 15:04:43,156 - root - INFO - MODE=teacher
2022-03-03 15:04:43,156 - root - INFO - MODEL_NAME=gnn
2022-03-03 15:04:43,156 - root - INFO - NAME=webqsp
2022-03-03 15:04:43,156 - root - INFO - NUM_EPOCH=100
2022-03-03 15:04:43,156 - root - INFO - NUM_LAYER=1
2022-03-03 15:04:43,157 - root - INFO - NUM_STEP=4
2022-03-03 15:04:43,157 - root - INFO - PRETRAINED_ENTITY_KGE_FILE=entity_emb_100d.npy
2022-03-03 15:04:43,157 - root - INFO - Q_TYPE=seq
2022-03-03 15:04:43,157 - root - INFO - REASON_KB=True
2022-03-03 15:04:43,157 - root - INFO - REL_WORD_IDS=rel_word_idx.npy
2022-03-03 15:04:43,157 - root - INFO - RELATION2ID=relations.txt
2022-03-03 15:04:43,157 - root - INFO - RELATION_EMB_FILE=None
2022-03-03 15:04:43,157 - root - INFO - RELATION_KGE_FILE=None
2022-03-03 15:04:43,157 - root - INFO - SEED=19960626
2022-03-03 15:04:43,157 - root - INFO - SHARE_EMBEDDING=False
2022-03-03 15:04:43,157 - root - INFO - SHARE_ENCODER=False
2022-03-03 15:04:43,157 - root - INFO - SHARE_INSTRUCTION=False
2022-03-03 15:04:43,157 - root - INFO - TEACHER_MODEL=gnn
2022-03-03 15:04:43,157 - root - INFO - TEACHER_TYPE=hybrid
2022-03-03 15:04:43,157 - root - INFO - TEST_BATCH_SIZE=40
2022-03-03 15:04:43,157 - root - INFO - TRAIN_KL=False
2022-03-03 15:04:43,157 - root - INFO - TREE_SOFT=False
2022-03-03 15:04:43,157 - root - INFO - USE_CUDA=True
2022-03-03 15:04:43,157 - root - INFO - USE_INVERSE_RELATION=False
2022-03-03 15:04:43,158 - root - INFO - USE_LABEL=False
2022-03-03 15:04:43,158 - root - INFO - USE_SELF_LOOP=True
2022-03-03 15:04:43,158 - root - INFO - WORD2ID=vocab_new.txt
2022-03-03 15:04:43,158 - root - INFO - WORD_DIM=300
2022-03-03 15:04:43,158 - root - INFO - WORD_EMB_FILE=word_emb_300d.npy
2022-03-03 15:04:43,158 - root - INFO - -------------------
building word index ...
Entity: 2429346, Relation in KB: 6649, Relation in use: 6650
loading data from dataset/CWQ/train_simple.json
27639it [01:52, 245.21it/s]
skip {13194, 9931, 17485, 10670, 17373, 1113, 21468, 509}
max_facts:  34098
converting global to local entity index ...
100%|█████████████████████████████████████████████████████████████████████████████████████| 27631/27631 [00:06<00
avg local entity:  1297.9829539285586
max local entity:  2001
preparing dep ...
100%|████████████████████████████████████████████████████████████████████████████████████| 27631/27631 [00:02<00:
preparing data ...
100%|██████████████████████████████████████████████████████████████████████████████████████| 27631/27631 [01:30<0
27631 cases in total, 0 cases without query entity, 14953 cases with single query entity, 12678 cases with multip
building word index ...
Entity: 2429346, Relation in KB: 6649, Relation in use: 6650
loading data from dataset/CWQ/dev_simple.json
3519it [00:07, 479.39it/s]
skip set()
max_facts:  32496
converting global to local entity index ...
100%|███████████████████████████████████████████████████████████████████████████████████████| 3519/3519 [00:00<00
avg local entity:  1338.1057118499573
max local entity:  2001
preparing dep ...
100%|██████████████████████████████████████████████████████████████████████████████████████| 3519/3519 [00:00<00:
preparing data ...
100%|████████████████████████████████████████████████████████████████████████████████████████| 3519/3519 [00:11<0
3519 cases in total, 0 cases without query entity, 1794 cases with single query entity, 1725 cases with multiple
building word index ...
Entity: 2429346, Relation in KB: 6649, Relation in use: 6650
loading data from dataset/CWQ/test_simple.json
3531it [00:25, 136.89it/s]
skip set()
max_facts:  34098
converting global to local entity index ...
100%|███████████████████████████████████████████████████████████████████████████████████████| 3531/3531 [00:00<00
avg local entity:  1337.5734919286322
max local entity:  2001
preparing dep ...
100%|██████████████████████████████████████████████████████████████████████████████████████| 3531/3531 [00:00<00:
preparing data ...
100%|████████████████████████████████████████████████████████████████████████████████████████| 3531/3531 [00:11<0
3531 cases in total, 0 cases without query entity, 1829 cases with single query entity, 1702 cases with multiple
2022-03-03 15:09:16,247 - root - INFO - Building Agent.
Entity: 2429346, Relation: 6650, Word: 20049
Entity: 2429346, Relation: 6650, Word: 20049
Traceback (most recent call last):
  File "main_student.py", line 128, in <module>
    main()
  File "main_student.py", line 114, in main
    trainer = Trainer_KBQA(args=vars(args), logger=logger)
  File "/home2/xh/PycharmProject/NSM/NSM/train/trainer_student.py", line 49, in __init__
    len(self.word2id))
  File "/home2/xh/PycharmProject/NSM/NSM/train/init.py", line 34, in init_hybrid
    agent = TeacherAgent_hybrid(args, logger, num_entity, num_relation, num_word)
  File "/home2/xh/PycharmProject/NSM/NSM/Agent/TeacherAgent.py", line 20, in __init__
    self.model = HybridModel(args, num_entity, num_relation, num_word)
  File "/home2/xh/PycharmProject/NSM/NSM/Model/hybrid_model.py", line 32, in __init__
    self.to(self.device)
  File "/home2/xh/.conda/envs/grailqa/lib/python3.6/site-packages/torch/nn/modules/module.py", line 426, in t
    return self._apply(convert)
  File "/home2/xh/.conda/envs/grailqa/lib/python3.6/site-packages/torch/nn/modules/module.py", line 202, in _
    module._apply(fn)
  File "/home2/xh/.conda/envs/grailqa/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in _
    param_applied = fn(param)
  File "/home2/xh/.conda/envs/grailqa/lib/python3.6/site-packages/torch/nn/modules/module.py", line 424, in c
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
RuntimeError: CUDA error: out of memory
run_CWQ.sh: line 7: $'\r': command not found
2022-03-03 15:23:33,666 - root - INFO - PARAMETER----------
2022-03-03 15:23:33,666 - root - INFO - BATCH_SIZE=20
2022-03-03 15:23:33,666 - root - INFO - CHAR2ID=chars.txt
2022-03-03 15:23:33,666 - root - INFO - CHECKPOINT_DIR=checkpoint/CWQ_teacher/
2022-03-03 15:23:33,666 - root - INFO - CONSTRAIN_TYPE=js
2022-03-03 15:23:33,666 - root - INFO - DATA_FOLDER=dataset/CWQ/
2022-03-03 15:23:33,666 - root - INFO - DECAY_RATE=0.0
2022-03-03 15:23:33,666 - root - INFO - ENCODE_TYPE=True
2022-03-03 15:23:33,666 - root - INFO - ENCODER_TYPE=lstm
2022-03-03 15:23:33,666 - root - INFO - ENTITY2ID=entities.txt
2022-03-03 15:23:33,666 - root - INFO - ENTITY_DIM=50
2022-03-03 15:23:33,667 - root - INFO - ENTITY_EMB_FILE=None
2022-03-03 15:23:33,667 - root - INFO - ENTITY_KGE_FILE=None
2022-03-03 15:23:33,667 - root - INFO - ENTROPY_WEIGHT=0.0
2022-03-03 15:23:33,667 - root - INFO - EPS=0.95
2022-03-03 15:23:33,667 - root - INFO - EVAL_EVERY=2
2022-03-03 15:23:33,667 - root - INFO - EXPERIMENT_NAME=CWQ_parallel_teacher
2022-03-03 15:23:33,667 - root - INFO - FACT_DROP=0
2022-03-03 15:23:33,667 - root - INFO - FACT_SCALE=3
2022-03-03 15:23:33,667 - root - INFO - FILTER_LABEL=False
2022-03-03 15:23:33,667 - root - INFO - FILTER_SUB=False
2022-03-03 15:23:33,667 - root - INFO - GRADIENT_CLIP=1.0
2022-03-03 15:23:33,667 - root - INFO - IS_EVAL=False
2022-03-03 15:23:33,667 - root - INFO - KG_DIM=100
2022-03-03 15:23:33,667 - root - INFO - KGE_DIM=100
2022-03-03 15:23:33,667 - root - INFO - LABEL_F1=0.5
2022-03-03 15:23:33,667 - root - INFO - LABEL_FILE=None
2022-03-03 15:23:33,668 - root - INFO - LABEL_SMOOTH=0.1
2022-03-03 15:23:33,668 - root - INFO - LAMBDA_BACK=0.1
2022-03-03 15:23:33,668 - root - INFO - LAMBDA_CONSTRAIN=0.01
2022-03-03 15:23:33,668 - root - INFO - LAMBDA_LABEL=0.01
2022-03-03 15:23:33,668 - root - INFO - LINEAR_DROPOUT=0.2
2022-03-03 15:23:33,668 - root - INFO - LOAD_EXPERIMENT=None
2022-03-03 15:23:33,668 - root - INFO - LOAD_PRETRAIN=../pretrain/CWQ_nsm-final.ckpt
2022-03-03 15:23:33,668 - root - INFO - LOG_LEVEL=info
2022-03-03 15:23:33,668 - root - INFO - LOSS_TYPE=kl
2022-03-03 15:23:33,668 - root - INFO - LR=0.0005
2022-03-03 15:23:33,668 - root - INFO - LR_SCHEDULE=False
2022-03-03 15:23:33,668 - root - INFO - LSTM_DROPOUT=0.3
2022-03-03 15:23:33,668 - root - INFO - MODE=teacher
2022-03-03 15:23:33,668 - root - INFO - MODEL_NAME=gnn
2022-03-03 15:23:33,668 - root - INFO - NAME=webqsp
2022-03-03 15:23:33,669 - root - INFO - NUM_EPOCH=30
2022-03-03 15:23:33,669 - root - INFO - NUM_LAYER=1
2022-03-03 15:23:33,669 - root - INFO - NUM_STEP=4
2022-03-03 15:23:33,669 - root - INFO - PRETRAINED_ENTITY_KGE_FILE=entity_emb_100d.npy
2022-03-03 15:23:33,669 - root - INFO - Q_TYPE=seq
2022-03-03 15:23:33,669 - root - INFO - REASON_KB=True
2022-03-03 15:23:33,669 - root - INFO - REL_WORD_IDS=rel_word_idx.npy
2022-03-03 15:23:33,669 - root - INFO - RELATION2ID=relations.txt
2022-03-03 15:23:33,669 - root - INFO - RELATION_EMB_FILE=None
2022-03-03 15:23:33,669 - root - INFO - RELATION_KGE_FILE=None
2022-03-03 15:23:33,669 - root - INFO - SEED=19960626
2022-03-03 15:23:33,669 - root - INFO - SHARE_EMBEDDING=False
2022-03-03 15:23:33,669 - root - INFO - SHARE_ENCODER=False
2022-03-03 15:23:33,669 - root - INFO - SHARE_INSTRUCTION=False
2022-03-03 15:23:33,669 - root - INFO - TEACHER_TYPE=parallel
2022-03-03 15:23:33,669 - root - INFO - TEST_BATCH_SIZE=40
2022-03-03 15:23:33,670 - root - INFO - TRAIN_KL=False
2022-03-03 15:23:33,670 - root - INFO - TREE_SOFT=False
2022-03-03 15:23:33,670 - root - INFO - USE_CUDA=True
2022-03-03 15:23:33,670 - root - INFO - USE_INVERSE_RELATION=False
2022-03-03 15:23:33,670 - root - INFO - USE_LABEL=False
2022-03-03 15:23:33,670 - root - INFO - USE_SELF_LOOP=True
2022-03-03 15:23:33,670 - root - INFO - WORD2ID=vocab_new.txt
2022-03-03 15:23:33,670 - root - INFO - WORD_DIM=300
2022-03-03 15:23:33,670 - root - INFO - WORD_EMB_FILE=word_emb_300d.npy
2022-03-03 15:23:33,670 - root - INFO - -------------------
building word index ...
Entity: 2429346, Relation in KB: 6649, Relation in use: 6650
loading data from dataset/CWQ/train_simple.json
27639it [01:46, 259.64it/s]
skip {13194, 9931, 17485, 10670, 17373, 1113, 21468, 509}
max_facts:  34098
converting global to local entity index ...
100%|█████████████████████████████████████████████████████████████████████████████████████| 27631/27631 [00:06<00
avg local entity:  1297.9829539285586
max local entity:  2001
preparing dep ...
100%|████████████████████████████████████████████████████████████████████████████████████| 27631/27631 [00:02<00:
preparing data ...
100%|██████████████████████████████████████████████████████████████████████████████████████| 27631/27631 [01:33<0
27631 cases in total, 0 cases without query entity, 14953 cases with single query entity, 12678 cases with multip
building word index ...
Entity: 2429346, Relation in KB: 6649, Relation in use: 6650
loading data from dataset/CWQ/dev_simple.json
3519it [00:07, 498.71it/s]
skip set()
max_facts:  32496
converting global to local entity index ...
100%|███████████████████████████████████████████████████████████████████████████████████████| 3519/3519 [00:00<00
avg local entity:  1338.1057118499573
max local entity:  2001
preparing dep ...
100%|██████████████████████████████████████████████████████████████████████████████████████| 3519/3519 [00:00<00:
preparing data ...
100%|████████████████████████████████████████████████████████████████████████████████████████| 3519/3519 [00:11<0
3519 cases in total, 0 cases without query entity, 1794 cases with single query entity, 1725 cases with multiple
building word index ...
Entity: 2429346, Relation in KB: 6649, Relation in use: 6650
loading data from dataset/CWQ/test_simple.json
3531it [00:24, 145.61it/s]
skip set()
max_facts:  34098
converting global to local entity index ...
100%|███████████████████████████████████████████████████████████████████████████████████████| 3531/3531 [00:00<00
avg local entity:  1337.5734919286322
max local entity:  2001
preparing dep ...
100%|██████████████████████████████████████████████████████████████████████████████████████| 3531/3531 [00:00<00:
preparing data ...
100%|████████████████████████████████████████████████████████████████████████████████████████| 3531/3531 [00:11<0
3531 cases in total, 0 cases without query entity, 1829 cases with single query entity, 1702 cases with multiple
2022-03-03 15:28:02,001 - root - INFO - Building Agent.
Entity: 2429346, Relation: 6650, Word: 20049
Entity: 2429346, Relation: 6650, Word: 20049
Entity: 2429346, Relation: 6650, Word: 20049
Traceback (most recent call last):
  File "main_teacher.py", line 132, in <module>
    main()
  File "main_teacher.py", line 114, in main
    trainer = Trainer_parallel(args=vars(args), logger=logger)
  File "/home2/xh/PycharmProject/NSM/NSM/train/trainer_parallel.py", line 41, in __init__
    len(self.word2id))
  File "/home2/xh/PycharmProject/NSM/NSM/train/init.py", line 23, in init_parallel
    agent = TeacherAgent_parallel(args, logger, num_entity, num_relation, num_word)
  File "/home2/xh/PycharmProject/NSM/NSM/Agent/TeacherAgent2.py", line 20, in __init__
    self.back_model = BackwardReasonModel(args, num_entity, num_relation, num_word, self.model)
  File "/home2/xh/PycharmProject/NSM/NSM/Model/backward_model.py", line 33, in __init__
    self.to(self.device)
  File "/home2/xh/.conda/envs/grailqa/lib/python3.6/site-packages/torch/nn/modules/module.py", line 426, in t
    return self._apply(convert)
  File "/home2/xh/.conda/envs/grailqa/lib/python3.6/site-packages/torch/nn/modules/module.py", line 202, in _
    module._apply(fn)
  File "/home2/xh/.conda/envs/grailqa/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in _
    param_applied = fn(param)
  File "/home2/xh/.conda/envs/grailqa/lib/python3.6/site-packages/torch/nn/modules/module.py", line 424, in c
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
RuntimeError: CUDA out of memory. Tried to allocate 24.00 MiB (GPU 0; 23.70 GiB total capacity; 33.62 MiB alreadyMiB free; 12.38 MiB cached)
2022-03-03 15:41:59,677 - root - INFO - PARAMETER----------
2022-03-03 15:41:59,677 - root - INFO - BATCH_SIZE=20
2022-03-03 15:41:59,677 - root - INFO - CHAR2ID=chars.txt
2022-03-03 15:41:59,678 - root - INFO - CHECKPOINT_DIR=checkpoint/CWQ_student/
2022-03-03 15:41:59,678 - root - INFO - CONSTRAIN_TYPE=js
2022-03-03 15:41:59,678 - root - INFO - DATA_FOLDER=dataset/CWQ/
2022-03-03 15:41:59,678 - root - INFO - DECAY_RATE=0.0
2022-03-03 15:41:59,678 - root - INFO - ENCODE_TYPE=True
2022-03-03 15:41:59,678 - root - INFO - ENCODER_TYPE=lstm
2022-03-03 15:41:59,678 - root - INFO - ENTITY2ID=entities.txt
2022-03-03 15:41:59,678 - root - INFO - ENTITY_DIM=50
2022-03-03 15:41:59,678 - root - INFO - ENTITY_EMB_FILE=None
2022-03-03 15:41:59,678 - root - INFO - ENTITY_KGE_FILE=None
2022-03-03 15:41:59,678 - root - INFO - ENTROPY_WEIGHT=0.0
2022-03-03 15:41:59,678 - root - INFO - EPS=0.95
2022-03-03 15:41:59,679 - root - INFO - EVAL_EVERY=2
2022-03-03 15:41:59,679 - root - INFO - EXPERIMENT_NAME=CWQ_parallel_student
2022-03-03 15:41:59,679 - root - INFO - FACT_DROP=0
2022-03-03 15:41:59,679 - root - INFO - FACT_SCALE=3
2022-03-03 15:41:59,679 - root - INFO - FILTER_LABEL=False
2022-03-03 15:41:59,679 - root - INFO - FILTER_SUB=False
2022-03-03 15:41:59,679 - root - INFO - GRADIENT_CLIP=1.0
2022-03-03 15:41:59,679 - root - INFO - IS_EVAL=False
2022-03-03 15:41:59,679 - root - INFO - KG_DIM=100
2022-03-03 15:41:59,679 - root - INFO - KGE_DIM=100
2022-03-03 15:41:59,679 - root - INFO - LABEL_F1=0.5
2022-03-03 15:41:59,679 - root - INFO - LABEL_FILE=None
2022-03-03 15:41:59,679 - root - INFO - LABEL_SMOOTH=0.1
2022-03-03 15:41:59,680 - root - INFO - LAMBDA_BACK=0.01
2022-03-03 15:41:59,680 - root - INFO - LAMBDA_CONSTRAIN=0.1
2022-03-03 15:41:59,680 - root - INFO - LAMBDA_LABEL=0.05
2022-03-03 15:41:59,680 - root - INFO - LINEAR_DROPOUT=0.2
2022-03-03 15:41:59,680 - root - INFO - LOAD_CKPT_FILE=None
2022-03-03 15:41:59,680 - root - INFO - LOAD_EXPERIMENT=None
2022-03-03 15:41:59,680 - root - INFO - LOAD_TEACHER=../CWQ_teacher/CWQ_parallel_teacher-final.ckpt
2022-03-03 15:41:59,680 - root - INFO - LOG_LEVEL=info
2022-03-03 15:41:59,680 - root - INFO - LOSS_TYPE=kl
2022-03-03 15:41:59,680 - root - INFO - LR=0.0005
2022-03-03 15:41:59,680 - root - INFO - LR_SCHEDULE=False
2022-03-03 15:41:59,680 - root - INFO - LSTM_DROPOUT=0.3
2022-03-03 15:41:59,680 - root - INFO - MODE=teacher
2022-03-03 15:41:59,681 - root - INFO - MODEL_NAME=gnn
2022-03-03 15:41:59,681 - root - INFO - NAME=webqsp
2022-03-03 15:41:59,681 - root - INFO - NUM_EPOCH=100
2022-03-03 15:41:59,681 - root - INFO - NUM_LAYER=1
2022-03-03 15:41:59,681 - root - INFO - NUM_STEP=4
2022-03-03 15:41:59,681 - root - INFO - PRETRAINED_ENTITY_KGE_FILE=entity_emb_100d.npy
2022-03-03 15:41:59,681 - root - INFO - Q_TYPE=seq
2022-03-03 15:41:59,681 - root - INFO - REASON_KB=True
2022-03-03 15:41:59,681 - root - INFO - REL_WORD_IDS=rel_word_idx.npy
2022-03-03 15:41:59,681 - root - INFO - RELATION2ID=relations.txt
2022-03-03 15:41:59,681 - root - INFO - RELATION_EMB_FILE=None
2022-03-03 15:41:59,682 - root - INFO - RELATION_KGE_FILE=None
2022-03-03 15:41:59,682 - root - INFO - SEED=19960626
2022-03-03 15:41:59,682 - root - INFO - SHARE_EMBEDDING=False
2022-03-03 15:41:59,682 - root - INFO - SHARE_ENCODER=False
2022-03-03 15:41:59,682 - root - INFO - SHARE_INSTRUCTION=False
2022-03-03 15:41:59,682 - root - INFO - TEACHER_MODEL=gnn
2022-03-03 15:41:59,682 - root - INFO - TEACHER_TYPE=parallel
2022-03-03 15:41:59,682 - root - INFO - TEST_BATCH_SIZE=40
2022-03-03 15:41:59,682 - root - INFO - TRAIN_KL=False
2022-03-03 15:41:59,682 - root - INFO - TREE_SOFT=False
2022-03-03 15:41:59,682 - root - INFO - USE_CUDA=True
2022-03-03 15:41:59,682 - root - INFO - USE_INVERSE_RELATION=False
2022-03-03 15:41:59,682 - root - INFO - USE_LABEL=False
2022-03-03 15:41:59,682 - root - INFO - USE_SELF_LOOP=True
2022-03-03 15:41:59,682 - root - INFO - WORD2ID=vocab_new.txt
2022-03-03 15:41:59,682 - root - INFO - WORD_DIM=300
2022-03-03 15:41:59,682 - root - INFO - WORD_EMB_FILE=word_emb_300d.npy
2022-03-03 15:41:59,682 - root - INFO - -------------------
building word index ...
Entity: 2429346, Relation in KB: 6649, Relation in use: 6650
loading data from dataset/CWQ/train_simple.json
27639it [02:06, 218.35it/s]
skip {13194, 9931, 17485, 10670, 17373, 1113, 21468, 509}
max_facts:  34098
converting global to local entity index ...
100%|█████████████████████████████████████████████████████████████████████████████████████| 27631/27631 [00:06<00
avg local entity:  1297.9829539285586
max local entity:  2001
preparing dep ...
100%|████████████████████████████████████████████████████████████████████████████████████| 27631/27631 [00:02<00:
preparing data ...
100%|██████████████████████████████████████████████████████████████████████████████████████| 27631/27631 [01:45<062.00it/s]
27631 cases in total, 0 cases without query entity, 14953 cases with single query entity, 12678 cases with multipy entities
building word index ...
Entity: 2429346, Relation in KB: 6649, Relation in use: 6650
loading data from dataset/CWQ/dev_simple.json
3519it [00:08, 411.36it/s]
skip set()
max_facts:  32496
converting global to local entity index ...
100%|█████████████████████████████████████████████████████████████████████████████| 3519/3519 [00:01<00:00, 2777.
avg local entity:  1338.1057118499573
max local entity:  2001
preparing dep ...
100%|█████████████████████████████████████████████████████████████████████████████| 3519/3519 [00:00<00:00, 9096.
preparing data ...
100%|██████████████████████████████████████████████████████████████████████████████| 3519/3519 [00:15<00:00, 220.
3519 cases in total, 0 cases without query entity, 1794 cases with single query entity, 1725 cases with multiple ntities
building word index ...
Entity: 2429346, Relation in KB: 6649, Relation in use: 6650
loading data from dataset/CWQ/test_simple.json
3531it [00:28, 124.11it/s]
skip set()
max_facts:  34098
converting global to local entity index ...
100%|█████████████████████████████████████████████████████████████████████████████| 3531/3531 [00:00<00:00, 4115.
avg local entity:  1337.5734919286322
max local entity:  2001
preparing dep ...
100%|████████████████████████████████████████████████████████████████████████████| 3531/3531 [00:00<00:00, 12829.
preparing data ...
100%|██████████████████████████████████████████████████████████████████████████████| 3531/3531 [00:16<00:00, 217.
3531 cases in total, 0 cases without query entity, 1829 cases with single query entity, 1702 cases with multiple ntities
2022-03-03 15:47:15,270 - root - INFO - Building Agent.
Entity: 2429346, Relation: 6650, Word: 20049
Entity: 2429346, Relation: 6650, Word: 20049
Traceback (most recent call last):
  File "main_student.py", line 128, in <module>
    main()
  File "main_student.py", line 114, in main
    trainer = Trainer_KBQA(args=vars(args), logger=logger)
  File "/home2/xh/PycharmProject/NSM/NSM/train/trainer_student.py", line 46, in __init__
    len(self.word2id))
  File "/home2/xh/PycharmProject/NSM/NSM/train/init.py", line 23, in init_parallel
    agent = TeacherAgent_parallel(args, logger, num_entity, num_relation, num_word)
  File "/home2/xh/PycharmProject/NSM/NSM/Agent/TeacherAgent2.py", line 19, in __init__
    self.model = ForwardReasonModel(args, num_entity, num_relation, num_word)
  File "/home2/xh/PycharmProject/NSM/NSM/Model/forward_model.py", line 33, in __init__
    self.to(self.device)
  File "/home2/xh/.conda/envs/grailqa/lib/python3.6/site-packages/torch/nn/modules/module.py", line 426, in t
    return self._apply(convert)
  File "/home2/xh/.conda/envs/grailqa/lib/python3.6/site-packages/torch/nn/modules/module.py", line 202, in _
    module._apply(fn)
  File "/home2/xh/.conda/envs/grailqa/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in _
    param_applied = fn(param)
  File "/home2/xh/.conda/envs/grailqa/lib/python3.6/site-packages/torch/nn/modules/module.py", line 424, in c
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
RuntimeError: CUDA error: out of memory

RichardHGL commented 2 years ago

Did you modify any hyperparameters or any part of the model? As far as I recall, this command runs on a single GPU with 12 GB of memory. You can try reducing batch_size and see how it works on your machine.
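
The traceback above fails inside self.to(self.device), i.e. while the model is being moved to the GPU and before any batch is processed, so a rough estimate of the parameter footprint may be useful next to the batch_size suggestion. The sketch below only does the arithmetic for one embedding table. The entity count comes from the log; the assumption (not confirmed in this thread) is that a table of that many rows with 100 dimensions is stored as float32 on the GPU. The helper name is made up for illustration.

```python
def embedding_bytes(num_rows, dim, bytes_per_value=4):
    """Size in bytes of a dense num_rows x dim table (float32 by default)."""
    return num_rows * dim * bytes_per_value


NUM_ENTITY = 2429346  # "Entity: 2429346" in the log
KGE_DIM = 100         # assumed from entity_emb_100d.npy / --kge_dim 100

size = embedding_bytes(NUM_ENTITY, KGE_DIM)
print(f"entity table: {size} bytes ({size / 1024 ** 3:.2f} GiB)")
```

If the estimate is far below the card's capacity, it may be worth checking with nvidia-smi whether another process is already holding memory on the same GPU.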

cdhx commented 2 years ago

Thanks for your reply! I did not change any parameters, and I will try again. Another question: why does the script continue running after hitting the error in the title? Does that error not matter?

RichardHGL commented 2 years ago

It matters. You should have checkpoint/CWQ_teacher/../pretrain/CWQ_nsm-final.ckpt, downloaded from Google Drive (place this checkpoint in the checkpoint/pretrain folder). Running the first command will then generate a teacher ckpt.
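
A quick way to see which checkpoint is missing before launching a stage is to normalize the "../" paths and test whether the files exist. This is a minimal sketch, not part of the repo: the two helper functions are hypothetical, and the paths are the ones quoted in this issue (the pretrain checkpoint above and the teacher checkpoint from the title).

```python
import os


def resolve_checkpoint(checkpoint_dir, relative_path):
    """Join a checkpoint dir with a relative path and collapse any '..' parts."""
    return os.path.normpath(os.path.join(checkpoint_dir, relative_path))


def missing_checkpoints(paths):
    """Return the paths that do not point at an existing file."""
    return [p for p in paths if not os.path.isfile(p)]


pretrain_ckpt = resolve_checkpoint(
    "checkpoint/CWQ_teacher/", "../pretrain/CWQ_nsm-final.ckpt")
teacher_ckpt = resolve_checkpoint(
    "checkpoint/CWQ_student/", "../CWQ_teacher/CWQ_hybrid_teacher-final.ckpt")

for path in missing_checkpoints([pretrain_ckpt, teacher_ckpt]):
    print("missing:", path)
```

Run it from the repository root, since both paths are relative to the working directory.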

cdhx commented 2 years ago

Here are the models in Google Drive (CWQ_report). Which one should I choose?

teacher:

CWQ_parallel_teacher_gnn_js-f1.ckpt  CWQ_teacher_fb_gnn_js_80epoch-final.ckpt

student:

CWQ_fb_student_0.01-final.ckpt  CWQ_t_gnn-parallel_s_gnn_js_100epoch-final.ckpt
CWQ_fb_student_0.01-h1.ckpt     CWQ_t_gnn-parallel_s_gnn_js_100epoch-h1.ckpt
CWQ_fb_student_0.01.log         CWQ_t_gnn-parallel_s_gnn_js_100epoch.log

RichardHGL commented 2 years ago

I checked again. You should first run the commented-out line:

CUDA_VISIBLE_DEVICES=0 python main_nsm.py --name CWQ --model_name gnn --data_folder /home/hegaole/data/KBQA/Freebase/CWQ/ --checkpoint_dir checkpoint/pretrain/ --batch_size 20 --test_batch_size 40 --num_step 4 --entity_dim 50 --word_dim 300 --kg_dim 100 --kge_dim 100 --eval_every 2 --experiment_name CWQ_nsm --eps 0.95 --num_epoch 100 --use_self_loop --lr 5e-4 --q_type seq --word_emb_file word_emb_300d.npy --reason_kb --encode_type --loss_type kl

Then checkpoint/CWQ_teacher/../pretrain/CWQ_nsm-final.ckpt will be generated.
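
On the earlier question of why the run continued after the FileNotFoundError: a shell script keeps executing its remaining commands after one of them fails unless it is told to stop (for example with set -e), so the student stage can start even though the teacher stage did not produce its checkpoint. Below is a rough sketch of running the stages in order and stopping at the first failure. The function and the stage commands are placeholders for illustration, not the repo's actual commands; substitute the real main_nsm.py / teacher / student invocations.

```python
import subprocess
import sys


def run_stages(stages):
    """Run (name, argv) pairs in order and stop at the first non-zero exit.

    Returns the names of the stages that completed successfully.
    """
    completed = []
    for name, argv in stages:
        result = subprocess.run(argv)
        if result.returncode != 0:
            print(f"stage {name!r} failed with exit code "
                  f"{result.returncode}; stopping")
            break
        completed.append(name)
    return completed


# Placeholder commands: the second one exits with status 1 to mimic a
# failed teacher stage, so the third stage is never started.
demo_stages = [
    ("pretrain", [sys.executable, "-c", "pass"]),
    ("teacher", [sys.executable, "-c", "import sys; sys.exit(1)"]),
    ("student", [sys.executable, "-c", "pass"]),
]
done = run_stages(demo_stages)
```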