google-research / smore

"pure virtual method called" when running vec/train_mpnet_wikikg90m.sh on GPU #8

Open zhanglizhi15 opened 2 years ago

zhanglizhi15 commented 2 years ago

When I run vec/train_mpnet_wikikg90m.sh on GPU, training completes normally, but an error is reported at the very end, during shutdown. The full output is as follows:

2022-07-11 08:42:01,674 INFO ---------------------------------------------------------------------------------------------
2022-07-11 08:42:01,675 INFO Model Parameter Configuration:
2022-07-11 08:42:01,676 INFO Parameter relation_embedding.embedding: torch.Size([0, 200]), require_grad = True
2022-07-11 08:42:01,676 INFO Parameter entity_embedding.embedding: torch.Size([0, 200]), require_grad = True
2022-07-11 08:42:01,677 INFO Parameter center_net.layers: torch.Size([402, 200]), require_grad = True
2022-07-11 08:42:01,677 INFO Parameter feature_mod.entity_proj.weight: torch.Size([200, 768]), require_grad = True
2022-07-11 08:42:01,678 INFO Parameter feature_mod.entity_proj.bias: torch.Size([200]), require_grad = True
2022-07-11 08:42:01,678 INFO Parameter feature_mod.relation_proj.weight: torch.Size([200, 768]), require_grad = True
2022-07-11 08:42:01,678 INFO Parameter feature_mod.relation_proj.bias: torch.Size([200]), require_grad = True
2022-07-11 08:42:01,678 INFO Parameter Number: 388000
2022-07-11 08:42:01,679 INFO ---------------------------------------------------------------------------------------------
2022-07-11 08:42:01,679 INFO Geo: VecFeatured
2022-07-11 08:42:01,679 INFO Data Path: /gf3/home/zlz/data/knowledge_graphs/wikikg90m-v2
2022-07-11 08:42:01,679 INFO #entity: 91230610
2022-07-11 08:42:01,679 INFO #relation: 1387
2022-07-11 08:42:01,679 INFO #max steps: 1001
2022-07-11 08:42:01,680 INFO Evaluate unions using: DNF
2022-07-11 08:42:11,911 INFO Randomly Initializing VecFeatured Model...
2022-07-11 08:42:11,912 INFO tasks = 1p
2022-07-11 08:42:11,912 INFO init_step = 0
2022-07-11 08:42:11,912 INFO Training info:
2022-07-11 08:42:11,912 INFO 1p.-1p: infinite
2022-07-11 08:42:11,912 INFO Start Training...
2022-07-11 08:42:11,912 INFO learning_rate = 0
2022-07-11 08:42:11,912 INFO batch_size = 512
2022-07-11 08:42:11,912 INFO hidden_dim = 200
2022-07-11 08:42:11,912 INFO gamma = 10.000000
2022-07-11 08:42:11,913 INFO loading static entity+relation features from /gf3/home/zlz/data/knowledge_graphs/wikikg90m-v2/processed
2022-07-11 08:47:32,461 INFO [GPU 0] tasks: 1p.-1p
overwritting args.save_path
logging to ../logs/wikikg90m-v2/1p.-1p-1p/VecFeatured/g-10.0-mode-(feat-only-768,l2,)-adv-1.0-reg-1e-09-ngpu-0-os-(0,0,u,u,0,True,False)-dataset-(single,3000,e,True,before)-opt-(aggr,adagrad,cpu,False,5)-sharen-naive-lr_none/2022.07.11-08:42:01
r(e) r(e)
step: 1000, t_read: 0.00140, t_fwd: 0.00535, t_loss: 0.00207, t_opt: 0.00091: 100%|██████████| 1001/1001 [00:11<00:00, 90.85it/s]
pure virtual method called
terminate called without an active exception
/dat/zlz/anaconda3/envs/ZLZ/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 4 leaked semaphores to clean up at shutdown
len(cache))
2022-07-11 08:50:42,270 INFO Training finished!!

How can I fix this error? Thanks!