Closed — awavefunction closed this issue 5 years ago
I find that the code is not deterministic, i.e. the results vary from run to run, probably due to well-known problems with random seeds in PyTorch. I can obtain deterministic results if I set `num_workers=1` and `device=cpu` with PyTorch version 1.2. With these settings, I obtain:

- TextCNN on RCV1, `hierarchical=true`: F1 = 0.731
- TextCNN on RCV1, `hierarchical=false`: F1 = 0.738
- TextRNN on RCV1, `hierarchical=true`: F1 = 0.785
- TextRNN on RCV1, `hierarchical=false`: F1 = 0.786

These numbers should be reproducible by anyone else who uses `num_workers=1` and `device=cpu`. But the results still differ from the paper.
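For reference, here is a minimal sketch of the seeding that is usually needed for deterministic PyTorch runs (not part of this repo; `seed_everything` and `worker_init_fn` are illustrative names). Even with all of this, `num_workers > 1` can still change batch ordering, which is consistent with needing `num_workers=1` above:

```python
import random
import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    """Seed every RNG a typical PyTorch training loop touches."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op on CPU-only machines
    # cuDNN selects convolution algorithms non-deterministically by default
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


def worker_init_fn(worker_id: int) -> None:
    """Give each DataLoader worker a reproducible NumPy seed."""
    np.random.seed(torch.initial_seed() % 2**32)
```

Pass `worker_init_fn=worker_init_fn` to the `DataLoader` if you do use multiple workers; the interleaving of batches across workers may still vary between runs.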
1. The RCV1 dataset has different versions for different tasks; maybe you are using a different one. In this paper, the train set has 23149 instances, while the test set has 781264 instances.
2. The best result for TextCNN is produced by using the public 300-dim pre-trained token embeddings (https://nlp.stanford.edu/projects/glove/), as in other papers. You can try them.
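For anyone trying the GloVe suggestion, here is a hedged sketch of building an embedding matrix from a GloVe-format text file (`load_glove` is an illustrative helper, not part of this repo; the fallback initialization for out-of-vocabulary tokens is one common choice, not necessarily what the paper used):

```python
import numpy as np


def load_glove(path: str, vocab: dict, dim: int = 300) -> np.ndarray:
    """Build a |vocab| x dim embedding matrix from a GloVe text file.

    Tokens not found in the GloVe file keep small random vectors.
    `vocab` maps token -> row index.
    """
    rng = np.random.RandomState(0)
    matrix = rng.normal(scale=0.1, size=(len(vocab), dim)).astype(np.float32)
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            token, values = parts[0], parts[1:]
            if token in vocab and len(values) == dim:
                matrix[vocab[token]] = np.asarray(values, dtype=np.float32)
    return matrix
```

The resulting matrix can then be used to initialize the model's token-embedding layer in place of random initialization.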
@amulder are you able to verify the results from their paper? We are facing the same problem reproducing the results with TextCNN and TextRCNN.
Thank you for making this code open and available to the community. It is easy to use. My issue is that I do not obtain the same results that you present in Table 4 of your paper.

When I train the TextCNN model on the RCV1 data set using the parameters you provide in `config/train.json`, I obtain a micro F1 score of 0.739. When I set `hierarchical=false`, I obtain a micro F1 score of 0.732. Your table shows a micro F1 score of 0.761 (hierarchical) and 0.737 (flat).

Similarly, when I train the TextRNN model using the default configuration file with `model_name=TextRNN`, I find a micro F1 score of 0.793 with hierarchical loss and a micro F1 score of 0.792 without hierarchical loss. Your table shows a micro F1 score of 0.789 (hierarchical) and 0.755 (flat).

Are you able to directly reproduce Table 4 with the configuration in this repo, or is your configuration different (and if so, can you share it)?
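For running the variants discussed above, a small helper like the following can derive a modified copy of `config/train.json`. This is a hypothetical sketch: it assumes the `model_name` and `hierarchical` keys sit at the top level of the JSON, which may not match the repository's actual config schema.

```python
import json


def make_variant(config_path: str, out_path: str,
                 model_name: str = "TextRNN",
                 hierarchical: bool = False) -> dict:
    """Write a copy of the training config with the two flags changed.

    NOTE: top-level key names are an assumption; adjust to the repo's
    real config layout (they may be nested under a sub-section).
    """
    with open(config_path, encoding="utf-8") as f:
        cfg = json.load(f)
    cfg["model_name"] = model_name
    cfg["hierarchical"] = hierarchical
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(cfg, f, indent=2)
    return cfg
```

This keeps the baseline `config/train.json` untouched, so each reported run can be tied to an exact config file.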