wzhouad closed this issue 5 years ago.
Hi, may I ask what your running environment is (including hardware and software)? The performance of this model varies across environments. If your environment is the same as the one described in the Readme.md file, the current default settings should reproduce exactly the pretrained model we released here.
Hi, my Python version is 3.6.5, PyTorch is 1.1, and CUDA is 10.0. I'm using a GTX 1080 Ti. I trained another 5 times, and the mean F1 is around 67.5% (± 0.3%). I fully understand that software and hardware can lead to different performance, but I didn't expect such a large difference. Also, could you tell me the mean and std of the F1 score in your experiments? They are important for measuring the stability of the model and for a concrete comparison with other methods.
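Reporting the mean and std of F1 over several runs, as requested here, only needs the standard library. A minimal sketch; `summarize_f1` and `format_f1` are hypothetical helpers, not part of this repo, and the thread does not say whether its "±" figures are sample or population standard deviations (this sketch uses the sample version):

```python
import statistics
from typing import Sequence, Tuple


def summarize_f1(scores: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of F1 scores from repeated runs.

    Assumption: the sample (n - 1) standard deviation is wanted; use
    statistics.pstdev instead for the population version.
    """
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    return statistics.mean(scores), statistics.stdev(scores)


def format_f1(scores: Sequence[float]) -> str:
    """Format the summary in the 'mean% ± std%' style used in this thread."""
    mean, std = summarize_f1(scores)
    return f"{mean:.1f}% ± {std:.1f}%"
```

Pass in one F1 value (in percent) per training run, e.g. the five runs mentioned above.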
Sorry for the late reply.
Yes, I did test the model under settings similar to yours. The loss already differs in the first epoch (1.254588 vs. 1.24539). These minor differences accumulate and eventually lead to a different model (around 67.5%). For now, we haven't figured out the reason behind this. As for the model stats, I will update you later, since I'm kind of busy with visa stuff right now...
For the mean and std of the F1 score in my experiments, the stats are 68.2% ± 0.5%. Thank you for pointing out this issue! We deeply appreciate it.
Also, we will update this score in our paper for a fair and concrete comparison with other methods.
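One plausible (unconfirmed) contributor to a first-epoch loss gap like 1.254588 vs. 1.24539 is that floating-point addition is not associative: GPU kernels or library versions that reduce numbers in a different order can return slightly different results, and those differences then compound over training. The thread does not identify the actual cause; this plain-Python snippet only illustrates the general effect:

```python
import math

values = [0.1, 0.2, 0.3]

# Same three numbers, two different summation orders.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

print(left_to_right)                   # 0.6000000000000001
print(right_to_left)                   # 0.6
print(left_to_right == right_to_left)  # False

# math.fsum tracks the exact sum and rounds once, so its result
# does not depend on the order of the inputs.
print(math.fsum(values))               # 0.6
```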
Hi, I have run the training as well and got results similar to those reported by @wzhouad:
Final Score:
Precision (micro): 70.780%
Recall (micro): 63.308%
F1 (micro): 66.836%
OS: openSUSE Leap 15.0
GPUs: RTX 2080 Ti
CUDA version: 10
Python version: 3.6.8
Package Version
certifi 2019.6.16
cffi 1.12.3
mkl-fft 1.0.12
mkl-random 1.0.2
numpy 1.16.4
pip 19.1.1
pycparser 2.19
setuptools 41.0.1
torch 1.1.0
tqdm 4.32.2
wheel 0.33.4
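The micro-averaged scores above follow the usual TACRED convention, where the negative label `no_relation` is excluded: a prediction counts as "guessed" only if it is not `no_relation`, and a gold example counts toward recall only if its label is not `no_relation`. A minimal sketch of that computation, assuming string labels; `micro_prf` is a hypothetical helper, not this repo's scorer:

```python
from typing import Sequence, Tuple

NO_RELATION = "no_relation"


def micro_prf(gold: Sequence[str], pred: Sequence[str],
              negative: str = NO_RELATION) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 that ignore the negative label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # Correct: a non-negative prediction that matches the gold label.
    correct = sum(1 for g, p in zip(gold, pred) if p != negative and g == p)
    # Guessed: every non-negative prediction.
    guessed = sum(1 for p in pred if p != negative)
    # Actual: every non-negative gold label.
    actual = sum(1 for g in gold if g != negative)
    precision = correct / guessed if guessed else 0.0
    recall = correct / actual if actual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

As a sanity check, F1 is the harmonic mean of precision and recall: 2 × 70.780 × 63.308 / (70.780 + 63.308) ≈ 66.836, matching the reported F1.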
Hi @marchbnr,
As stated in the Readme of this repo, we can't guarantee its performance when you run it under very different software and hardware settings. We have also released the training log and the pretrained model.
For now, we may not be able to find the cause of this issue, since it involves too many variables (versions of GPU, CUDA, PyTorch, etc.).
Hi, I retrained the model on TACRED with 5 different random seeds. However, the average F1 score is 67.116 (± 0.121), which is much lower than the score reported in your paper. Is the default model config correct? Also, how large is the std in your experiments?
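When retraining with several seeds, it helps to seed every random number generator the training code touches. A minimal sketch, assuming the script uses Python's `random`, NumPy and PyTorch; `set_seed` is a hypothetical helper, not a function from this repo. Even with these settings, results can still differ across GPUs and CUDA/PyTorch versions, as noted earlier in this thread:

```python
import random


def set_seed(seed: int, deterministic: bool = True) -> None:
    """Seed Python, NumPy and PyTorch RNGs (NumPy/PyTorch only if installed)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
    except ImportError:
        return
    torch.manual_seed(seed)
    # Silently ignored when CUDA is not available.
    torch.cuda.manual_seed_all(seed)
    if deterministic:
        # Ask cuDNN for deterministic kernels and disable autotuning,
        # which may otherwise pick different algorithms between runs.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
```

For example, call `set_seed(seed)` once at the start of each of the five runs, before building the model and data loaders.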