sixteen23333 opened this issue 1 year ago
Hello, I am trying to reproduce your experimental results. My environment is CUDA 10.2, Python 3.6.13, and tensorflow-gpu 1.14.0. I built the graph for the mr dataset and then trained on it by running:

python build_graph.py mr 3
python train.py --dataset mr

I did not modify any parameters or code. The output for the first 18 epochs is:

train start...
Epoch: 0001 train_loss= 0.78109 train_acc= 0.50766 val_loss= nan val_acc= 0.49296 test_acc= 0.50000 time= 4.65325
Epoch: 0002 train_loss= 0.69768 train_acc= 0.51000 val_loss= nan val_acc= 0.49296 test_acc= 0.50000 time= 4.59264
Epoch: 0003 train_loss= 0.69219 train_acc= 0.51532 val_loss= nan val_acc= 0.49296 test_acc= 0.50000 time= 3.52535
Epoch: 0004 train_loss= 0.67984 train_acc= 0.57706 val_loss= nan val_acc= 0.49296 test_acc= 0.50000 time= 4.47986
Epoch: 0005 train_loss= 0.67484 train_acc= 0.62191 val_loss= nan val_acc= 0.49296 test_acc= 0.50000 time= 3.65245
Epoch: 0006 train_loss= 0.65726 train_acc= 0.65552 val_loss= nan val_acc= 0.49296 test_acc= 0.50000 time= 4.75031
Epoch: 0007 train_loss= 0.64138 train_acc= 0.65411 val_loss= nan val_acc= 0.49296 test_acc= 0.50000 time= 3.60905
Epoch: 0008 train_loss= 0.62076 train_acc= 0.66239 val_loss= nan val_acc= 0.49296 test_acc= 0.50000 time= 4.50486
Epoch: 0009 train_loss= 0.58148 train_acc= 0.68662 val_loss= nan val_acc= 0.49296 test_acc= 0.50000 time= 3.65528
Epoch: 0010 train_loss= 0.57630 train_acc= 0.70600 val_loss= nan val_acc= 0.49296 test_acc= 0.50000 time= 4.41573
Epoch: 0011 train_loss= 0.55253 train_acc= 0.71851 val_loss= nan val_acc= 0.49296 test_acc= 0.50000 time= 3.80978
Epoch: 0012 train_loss= 0.56237 train_acc= 0.70475 val_loss= nan val_acc= 0.49296 test_acc= 0.50000 time= 4.39144
Epoch: 0013 train_loss= 0.55267 train_acc= 0.71397 val_loss= nan val_acc= 0.49296 test_acc= 0.50000 time= 3.73540
Epoch: 0014 train_loss= 0.53143 train_acc= 0.73304 val_loss= nan val_acc= 0.49296 test_acc= 0.50000 time= 4.23638
Epoch: 0015 train_loss= 0.53013 train_acc= 0.73492 val_loss= nan val_acc= 0.49296 test_acc= 0.50000 time= 3.73155
Epoch: 0016 train_loss= 0.51571 train_acc= 0.74195 val_loss= nan val_acc= 0.49296 test_acc= 0.50000 time= 4.02690
Epoch: 0017 train_loss= 0.51938 train_acc= 0.74023 val_loss= nan val_acc= 0.49296 test_acc= 0.50000 time= 3.91224
Epoch: 0018 train_loss= 0.50715 train_acc= 0.74805 val_loss= nan val_acc= 0.49296 test_acc= 0.50000 time= 3.93634
I'm confused that val_loss is always nan while val_acc and test_acc never change, even though train_loss keeps decreasing and train_acc keeps rising. I hope you can help me understand what is going on.
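One hypothesis that would produce exactly this pattern (a nan loss next to a frozen, roughly 50% accuracy) is that the logits of the evaluation pass are NaN. A NaN logit makes the cross-entropy NaN, while `np.argmax` returns the index of the first NaN, so every NaN row silently gets "predicted" as class 0 and the accuracy collapses to that class's share. The sketch below is only a NumPy illustration of that mechanism, not the repository's code: the masking scheme (weights rescaled by their mean) is an assumption modelled on GCN-style training scripts, and the split sizes (710 validation docs split 350/360, 3554 test docs split 1777/1777) are hypothetical numbers chosen because 350/710 rounds to the 0.49296 seen in the log.

```python
import numpy as np


def masked_softmax_cross_entropy(logits, labels, mask):
    """Mean softmax cross-entropy, weighted by a 0/1 node mask (GCN-style sketch)."""
    # Stable log-softmax; a NaN anywhere in a row makes that row's loss NaN.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_row = -(labels * log_probs).sum(axis=1)
    weights = mask.astype(np.float64)
    weights /= weights.mean()  # rescale so the masked rows average correctly (assumed scheme)
    return float((per_row * weights).mean())


def masked_accuracy(logits, labels, mask):
    """Fraction of masked rows whose argmax prediction matches the label."""
    # np.argmax returns the index of the first NaN, so an all-NaN row is
    # always "predicted" as class 0 instead of raising an error.
    correct = (np.argmax(logits, axis=1) == np.argmax(labels, axis=1)).astype(np.float64)
    weights = mask.astype(np.float64)
    weights /= weights.mean()
    return float((correct * weights).mean())


def one_hot(class_counts):
    """Build one-hot labels: class_counts[k] rows of class k, in order."""
    ys = np.repeat(np.arange(len(class_counts)), class_counts)
    return np.eye(len(class_counts))[ys]


# Hypothetical split sizes (assumption, not read from the repository).
val_labels = one_hot([350, 360])
test_labels = one_hot([1777, 1777])

# Simulate an evaluation pass whose logits are entirely NaN.
val_logits = np.full(val_labels.shape, np.nan)
test_logits = np.full(test_labels.shape, np.nan)
val_mask = np.ones(len(val_labels))
test_mask = np.ones(len(test_labels))

val_loss = masked_softmax_cross_entropy(val_logits, val_labels, val_mask)
val_acc = masked_accuracy(val_logits, val_labels, val_mask)
test_acc = masked_accuracy(test_logits, test_labels, test_mask)
print(f"val_loss= {val_loss:.5f} val_acc= {val_acc:.5f} test_acc= {test_acc:.5f}")
# prints: val_loss= nan val_acc= 0.49296 test_acc= 0.50000
```

If this hypothesis holds, the next step would be to check the evaluation-time outputs for non-finite values (for example with `np.isfinite` on the fetched logits) rather than the loss function itself.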