LARS-research / AutoSF

Y. Zhang, Q. Yao, J. Kwok. Bilinear Scoring Function Search for Knowledge Graph Learning. TPAMI 2022

Error for multiprocessing #4

Closed Luckick closed 3 years ago

Luckick commented 3 years ago

Hi, I am using two 1080Ti GPUs for training and set --parrel to 2 (update: even 1 causes the same problem). However, I get the error below. I also notice "best_score = 0" on line 94 of train.py. Any idea how to fix this? Thanks!

B=4 Iter 1 sampled 5 candidate state for evaluate 12846 newID: 1 newID: 0 [2 3 0 1] 4 [3 2 0 1] 4 newID: 2 [0 1 2 3] 4 newID: 3 [1 3 2 0] 4 newID: 4 [0 2 1 3] 4


multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "~/.conda/envs/dgl-ke/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "~/AutoSF/train.py", line 66, in run_model
    best_mrr, best_str = model.train(train_data, tester_val, tester_tst)
  File "~/AutoSF/base_model.py", line 79, in train
    return best_mrr, best_str
UnboundLocalError: local variable 'best_str' referenced before assignment
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "~/AutoSF/train.py", line 155, in <module>
    scor = score.get()
  File "~/.conda/envs/dgl-ke/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
UnboundLocalError: local variable 'best_str' referenced before assignment
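The traceback pattern above typically arises when a variable is only assigned inside a conditional that never fires during the loop. A minimal sketch of that failure mode (this is an illustrative simplification, not the repo's actual `base_model.py`; the names `n_epoch` and `epoch_per_test` come from the argparse config quoted later in this thread):

```python
def train(n_epoch, epoch_per_test):
    """Simplified training loop: best_str is bound only when a
    periodic evaluation actually runs and improves the best MRR."""
    best_mrr = 0
    for epoch in range(1, n_epoch + 1):
        if epoch % epoch_per_test == 0:
            mrr = 0.5  # placeholder for a real validation MRR
            if mrr > best_mrr:
                best_mrr = mrr
                best_str = "serialized best model state"
    # If no evaluation ever ran, best_str was never bound:
    return best_mrr, best_str

try:
    train(n_epoch=100, epoch_per_test=250)  # evaluation never triggers
except UnboundLocalError as e:
    print(e)
```

With `n_epoch=100` and `epoch_per_test=250`, the `epoch % epoch_per_test == 0` branch never executes, so the `return` hits an unbound local, which is exactly the exception the worker process re-raises through `multiprocessing.pool`.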
yzhangee commented 3 years ago

Thanks for your interest.

Which dataset are you working on? It seems the model runs into some error and ends up with valid_mrr = 0, so best_str is never assigned a value.

Could you provide more details so we can check the problem?

Luckick commented 3 years ago

Thanks for the quick reply.

I am using 'KG_Data/FB15K237'.

parser.add_argument('--task_dir', type=str, default='KG_Data/FB15K237', help='the directory to dataset')
parser.add_argument('--optim', type=str, default='adagrad', help='optimization method')
parser.add_argument('--lamb', type=float, default=0.2, help='set weight decay value')
parser.add_argument('--decay_rate', type=float, default=1.0, help='set learning rate decay value')
parser.add_argument('--n_dim', type=int, default=32, help='set embedding dimension')
parser.add_argument('--parrel', type=int, default=1, help='set gpu #')
parser.add_argument('--lr', type=float, default=0.5, help='set learning rate')
parser.add_argument('--thres', type=float, default=0.0, help='threshold for early stopping')
parser.add_argument('--n_epoch', type=int, default=100, help='number of training epochs')
parser.add_argument('--n_batch', type=int, default=32, help='batch size')
parser.add_argument('--epoch_per_test', type=int, default=250, help='frequency of testing')
parser.add_argument('--test_batch_size', type=int, default=20, help='test batch size')
parser.add_argument('--filter', type=bool, default=True, help='whether do filter in testing')
parser.add_argument('--mode', type=str, default='search', help='which mode this code is running for')
parser.add_argument('--out_file_info', type=str, default='', help='extra string for the output file name')
yzhangee commented 3 years ago

parser.add_argument('--n_epoch', type=int, default=100, help='number of training epochs')
parser.add_argument('--epoch_per_test', type=int, default=250, help='frequency of testing')

Hi, the problem is here. You run only 100 epochs, but epoch_per_test is 250, so the model evaluation is never conducted and best_str is never assigned. You can lower epoch_per_test to 25 or increase n_epoch.
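The mismatch can be checked with simple integer arithmetic; a small helper (hypothetical, just to make the diagnosis concrete) counts how many times the periodic evaluation would fire:

```python
def n_evaluations(n_epoch, epoch_per_test):
    """Number of times an every-epoch_per_test evaluation fires
    within n_epoch training epochs."""
    return n_epoch // epoch_per_test

print(n_evaluations(100, 250))  # 0 -> evaluation never runs, best_str stays unbound
print(n_evaluations(100, 25))   # 4 -> suggested fix: evaluate every 25 epochs
```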

Luckick commented 3 years ago

Thanks! The problem is solved.