Unable to save checkpoint with TPLinkerPlus

jarork commented 3 years ago

Description

Checkpoints were saved successfully with my previous datasets and with NYT_star, which contain thousands of entities and relations. Last week, however, I applied TPLinkerPlus to a new Chinese dataset that contains no relations and whose texts are all shorter than 20 characters. The scores kept improving during training, but no checkpoint files appeared in the wandb folder.

My debugging

At first I thought wandb was the cause, so I switched to the default logger, but still no checkpoints were saved.
Next, I suspected my dataset, since it contains no relations, so I added two or three random relations to the training set. That changed nothing: still no checkpoint files were saved.
When I switched back to my previous datasets, checkpoints were saved normally as performance improved during training.

Training parameters

"hyper_parameters": {
    "batch_size": 28, # 32
    "epochs": 1000,
    "seed": 2333,
    "log_interval": 10,
    "max_seq_len": 80, # 128
    "sliding_len": 20, # "scheduler": "CAWR", # Step
    "ghm": False, # set True if you want to use GHM to adjust the weights of gradients, this will speed up the training process and might improve the results. (Note that ghm in current version is unstable now, may hurt the results)
    "tok_pair_sample_rate": 1, # (0, 1] How many percent of token pairs you want to sample for training, this would slow down the training if set to less than 1. It is only helpful when your GPU memory is not enough for the training.
},

jarork commented 3 years ago

I noticed that you use the relation F1 score as the final score (valid_f1). So when valid_f1 is compared with max_f1, ent_f1 is completely irrelevant, which is why no checkpoints are saved when I work on a pure NER task. Why do you use rel_f1 as valid_f1? Thanks.
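
To illustrate the behaviour described above, here is a minimal sketch. It assumes the training loop saves a checkpoint only when the relation F1 strictly exceeds the best value seen so far. The function name `saved_epochs_rel_only`, the `f1_to_save` threshold, and all score values are hypothetical and not copied from the repository.

```python
def saved_epochs_rel_only(epoch_scores, f1_to_save=0.0):
    """Return the epochs at which a checkpoint would be written.

    Assumption: only rel_f1 is compared with the running maximum,
    and the comparison is strict (>), so a constant 0.0 never wins.
    """
    max_f1 = 0.0
    saved = []
    for epoch, scores in enumerate(epoch_scores):
        valid_f1 = scores["rel_f1"]  # ent_f1 is ignored here
        if valid_f1 > max_f1:
            max_f1 = valid_f1
            if valid_f1 > f1_to_save:
                saved.append(epoch)
    return saved


# Hypothetical pure-NER run: entity F1 improves, relation F1 stays at 0.
ner_only = [
    {"ent_f1": 0.60, "rel_f1": 0.0},
    {"ent_f1": 0.75, "rel_f1": 0.0},
    {"ent_f1": 0.82, "rel_f1": 0.0},
]

# Hypothetical relation-extraction run for comparison.
with_relations = [
    {"ent_f1": 0.60, "rel_f1": 0.3},
    {"ent_f1": 0.62, "rel_f1": 0.2},
    {"ent_f1": 0.70, "rel_f1": 0.5},
]

print(saved_epochs_rel_only(ner_only))
print(saved_epochs_rel_only(with_relations))
```

Under these assumptions the pure-NER run never triggers a save, no matter how much ent_f1 improves, which matches what I observed.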

131250208 commented 3 years ago

Because this repository is mainly for relation extraction. If you want to use it for pure NER tasks, you will have to modify some of the code.
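
One possible shape for such a change, as a rough sketch only: make the score used for checkpoint selection configurable, so that a pure NER run can track ent_f1 instead of rel_f1. The function `saved_epochs` and the `score_key` parameter are hypothetical names for illustration and do not exist in the repository; the actual training script may be structured differently.

```python
def saved_epochs(epoch_scores, score_key="rel_f1"):
    """Return the epochs at which a checkpoint would be written.

    score_key is a hypothetical option: "rel_f1" for relation
    extraction, "ent_f1" for a pure NER task.
    """
    max_f1 = 0.0
    saved = []
    for epoch, scores in enumerate(epoch_scores):
        valid_f1 = scores[score_key]
        if valid_f1 > max_f1:  # save only on a strict improvement
            max_f1 = valid_f1
            saved.append(epoch)
    return saved


# Hypothetical validation scores from a dataset without relations.
ner_scores = [
    {"ent_f1": 0.60, "rel_f1": 0.0},
    {"ent_f1": 0.58, "rel_f1": 0.0},
    {"ent_f1": 0.82, "rel_f1": 0.0},
]

print(saved_epochs(ner_scores, score_key="ent_f1"))
print(saved_epochs(ner_scores, score_key="rel_f1"))
```

With `score_key="ent_f1"` the sketch saves whenever entity F1 improves, while the default still reproduces the relation-only behaviour.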

jarork commented 3 years ago

Thank you!

131250208 / TPlinker-joint-extraction

Unable to save checkpoint with TPLinkerPlus #50