131250208 / TPlinker-joint-extraction

Where to modify the code for TPLinkerPlus in order to train and evaluate with multiple GPUs on a local machine? #51

Open jarork opened 3 years ago

jarork commented 3 years ago

In Evaluation, I tried commenting out the line: os.environ["CUDA_VISIBLE_DEVICES"] = str(config["device_num"])

I also added a new line, rel_extractor = nn.DataParallel(rel_extractor), before rel_extractor = rel_extractor.to(device)

The rest of the code in Evaluation remains unchanged.

It returns this error:

Traceback (most recent call last):
  File "evaluation.py", line 503, in <module>
    rel_extractor.load_state_dict(torch.load(model_state_path))
  File "/nlp_data/plb/conda/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1045, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DataParallel:
    Missing key(s) in state_dict: "module.encoder.embeddings.word_embeddings.weight", "module.encoder.embeddings.position_embeddings.weight", "module.encoder.embeddings.token_type_embeddings.weight", "module.encoder.embeddings.LayerNorm.weight", "module.encoder.embeddings.LayerNorm.bias", "module.encoder.encoder.layer.0.attention.self.query.weight", ......

Do you have any suggestions for how I can solve this? Thanks.

131250208 commented 3 years ago

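The keys may not match when you use multiple GPUs: nn.DataParallel prefixes every parameter name with "module.", while a checkpoint saved from an unwrapped model has no such prefix. Check whether the keys listed in the error are in the saved state_dict. If not, find the most similar ones and add some code to rename them so that they match.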
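
The key fix suggested above could look something like the sketch below. It is not code from the TPLinker repository: the helper names add_module_prefix and strip_module_prefix are made up for illustration, and it assumes the checkpoint was saved from an unwrapped model, so its keys lack the "module." prefix that the DataParallel wrapper expects. The helpers only rename dictionary keys, so they work on any mapping, including the one returned by torch.load.

```python
from collections import OrderedDict

# nn.DataParallel stores the wrapped model under the attribute "module",
# so every parameter name of the wrapper starts with this prefix.
PREFIX = "module."


def add_module_prefix(state_dict):
    """Return a copy whose keys all start with 'module.'.

    Hypothetical helper: use it to load a checkpoint saved from a plain
    model into a model wrapped in nn.DataParallel.
    """
    return OrderedDict(
        (key if key.startswith(PREFIX) else PREFIX + key, value)
        for key, value in state_dict.items()
    )


def strip_module_prefix(state_dict):
    """Return a copy with a leading 'module.' removed from each key.

    Hypothetical helper: use it to load a checkpoint saved from a
    DataParallel model into a plain, unwrapped model.
    """
    return OrderedDict(
        (key[len(PREFIX):] if key.startswith(PREFIX) else key, value)
        for key, value in state_dict.items()
    )


# Assumed usage in evaluation.py (not executed here, shown for context):
#   state = torch.load(model_state_path)
#   rel_extractor.load_state_dict(add_module_prefix(state))
```

Two alternatives that should avoid the renaming altogether: load the checkpoint into the bare model first and wrap it in nn.DataParallel afterwards, or call rel_extractor.module.load_state_dict(...) on the wrapped model.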