WikiChao / Ego-AV-Loc

[CVPR 2023] Egocentric Audio-Visual Object Localization

Checkpoint and training details #3

Open CleyLyChen opened 9 months ago

CleyLyChen commented 9 months ago

Hi! Great work! I am studying your work, but I cannot reproduce the performance reported in your paper. I noticed that gt_mask in NetWrapper is always a tensor with all elements set to 1. Could you share more training details and your model checkpoint?

WikiChao commented 9 months ago

Hi,

gt_mask is used for training the separation branch. There are usually two ways to set gt_masks: (1) a binary mask, where gt_masks indicates whether each time-frequency bin is dominated by one of the sounds or by the mixture, i.e. gt_masks = (spec_clean > 0.5 * spec).float(). Its values should not all be one unless spec_clean and spec are identical or the added sound is much quieter than the original one. You can also try different penalty terms, such as a binary loss (e.g. binary cross-entropy) or an L1/L2 loss; (2) a ratio mask, gt_masks = spec_clean / spec, is also commonly used, with the same loss terms.
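
To make the two mask types concrete, here is a minimal NumPy sketch (the repository itself works with PyTorch tensors, where the same expressions apply with `.float()` instead of `.astype`). The helper names `binary_mask`, `ratio_mask`, `bce_loss`, `l1_loss`, and `l2_loss` are hypothetical, and the `eps` in the ratio mask is an added assumption to avoid division by zero. It also shows why a mask can come out all ones: if `spec` is just `spec_clean` (no second sound mixed in), every bin with nonzero energy passes the `> 0.5 * spec` test.

```python
import numpy as np

def binary_mask(spec_clean, spec):
    # 1.0 where the target source carries more than half of the mixture
    # magnitude in that time-frequency bin, 0.0 elsewhere.
    return (spec_clean > 0.5 * spec).astype(np.float32)

def ratio_mask(spec_clean, spec, eps=1e-8):
    # Share of the mixture magnitude attributed to the target source;
    # eps (an assumption, not from the thread) guards against division by zero.
    return spec_clean / (spec + eps)

def bce_loss(pred, target, eps=1e-7):
    # Binary cross-entropy, suited to binary masks; pred is clipped into (0, 1).
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)))

def l1_loss(pred, target):
    return float(np.mean(np.abs(pred - target)))

def l2_loss(pred, target):
    return float(np.mean((pred - target) ** 2))

# Toy 2x2 magnitude spectrograms (frequency x time).
spec_clean = np.array([[2.0, 0.1], [1.0, 0.3]])
spec_other = np.array([[0.5, 1.0], [1.0, 3.0]])
# Adding magnitudes is only an approximation of mixing the waveforms.
spec_mix = spec_clean + spec_other

print(binary_mask(spec_clean, spec_mix))    # mixed: mask differs per bin
print(binary_mask(spec_clean, spec_clean))  # no mixing: every bin is 1
```

Which loss pairs best with which mask is a design choice; binary cross-entropy is the natural fit for the binary mask, while L1/L2 regress the continuous ratio mask directly.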

Please check whether you mixed in another sound at the beginning. If you don't need the separation branch, consider removing its loss term.
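
A hypothetical sketch of both options: mixing a second clip into the target audio (the "mix-and-separate" setup) so the separation target is non-trivial, or dropping the separation term from the objective. `make_mix`, `total_loss`, `use_separation`, and `sep_weight` are illustrative names, not the repository's API.

```python
import numpy as np

def make_mix(target_wav, other_wav, other_gain=1.0):
    # Mix-and-separate: add a second clip to the target waveform.
    # Both clips are trimmed to the shorter length before summing.
    n = min(len(target_wav), len(other_wav))
    return target_wav[:n] + other_gain * other_wav[:n]

def total_loss(loc_loss, sep_loss=None, use_separation=True, sep_weight=1.0):
    # Localization loss alone when the separation branch is disabled,
    # otherwise localization loss plus the weighted separation loss.
    if not use_separation or sep_loss is None:
        return loc_loss
    return loc_loss + sep_weight * sep_loss

mix = make_mix(np.ones(3), 2.0 * np.ones(5))
print(mix)
print(total_loss(1.0, 0.5, use_separation=False))
```

With `use_separation=False` the separation branch receives no gradient from the objective, which matches the suggestion to drop its loss term when the branch is not needed.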

CleyLyChen commented 9 months ago

Hi, I removed the separation branch and its loss term and trained the model on Epic-Kitchens-100, but CIoU@0.2 is only 15.2, CIoU@0.3 is 7, and CIoU@0.4 is 1.8. I don't know why. Could you share more details?
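
For reference when comparing numbers like these, a simplified sketch of how a CIoU@τ score is commonly computed, assuming (as in earlier sound-source localization benchmarks) that CIoU@τ is the percentage of test samples whose IoU between the binarized heatmap and the ground-truth mask reaches τ. This single-annotator IoU is a simplification and may differ from the repository's evaluation code; `ciou` and `ciou_at` are hypothetical names, and the median default threshold is one possible binarization choice.

```python
import numpy as np

def ciou(heatmap, gt_mask, threshold=None):
    # Binarize the predicted heatmap; by default use its median, so at most
    # half of the pixels are marked as "sounding".
    if threshold is None:
        threshold = np.median(heatmap)
    pred = heatmap > threshold
    gt = gt_mask.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # Guard against an empty union (no prediction and no ground truth).
    return float(inter) / float(union) if union > 0 else 0.0

def ciou_at(ious, tau):
    # Percentage of test samples whose IoU reaches the threshold tau.
    ious = np.asarray(ious, dtype=float)
    return 100.0 * float(np.mean(ious >= tau))

heatmap = np.array([[0.9, 0.8], [0.1, 0.2]])
gt = np.array([[1, 0], [0, 0]])
print(ciou(heatmap, gt))  # 0.5
```

The binarization threshold strongly affects the score, so results are only comparable when the same thresholding rule is used.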

WikiChao commented 9 months ago

I uploaded a data-filtering script that helps remove silent video clips from the training set. Because the training set is built from an action recognition benchmark, it contains many silent videos, which hurt training. We tried a simple way to remove some of them (please refer to the code), but there may be better approaches to explore.
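
The uploaded script itself is not reproduced in this thread. As an illustration of the general idea only, here is a hypothetical energy-based filter that drops clips whose overall RMS level falls below a threshold. It assumes decoded mono float waveforms in [-1, 1]; `rms_dbfs`, `filter_silent`, and the -50 dBFS default are assumptions, not values from the author's code.

```python
import numpy as np

def rms_dbfs(wav, eps=1e-10):
    # Root-mean-square level in dB relative to full scale (1.0);
    # eps avoids log10(0) for all-zero clips.
    rms = np.sqrt(np.mean(np.square(np.asarray(wav, dtype=np.float64))))
    return 20.0 * np.log10(rms + eps)

def filter_silent(clips, threshold_db=-50.0):
    # clips: mapping from clip id to a mono float waveform in [-1, 1].
    # Keep only clips whose overall level exceeds threshold_db.
    return [clip_id for clip_id, wav in clips.items() if rms_dbfs(wav) > threshold_db]

clips = {
    "silent": np.zeros(1000),
    "tone": 0.5 * np.sin(np.linspace(0, 100, 1000)),
}
print(filter_silent(clips))
```

A whole-clip RMS can keep clips that are mostly silent with one short sound; a frame-wise criterion (e.g. the fraction of frames above the threshold) is one of the "better ways" one might explore.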

Early stopping may also be a useful trick, since the model is initialized from a pre-trained vision network. You can check results at early epochs/steps to decide whether training is on the right track.
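
A minimal early-stopping sketch, assuming a validation metric where higher is better (e.g. CIoU@0.3) is evaluated once per epoch. `EarlyStopper`, `patience`, and `min_delta` are hypothetical names, not part of the repository.

```python
class EarlyStopper:
    """Signal a stop when the validation metric has not improved for
    `patience` consecutive evaluations. Higher metric values are better."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, metric):
        # Record a new best, or count another epoch without improvement.
        if metric > self.best + self.min_delta:
            self.best = metric
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In practice one would also save a checkpoint whenever `best_epoch` changes, so the model from the best early epoch can be restored after stopping.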

CleyLyChen commented 9 months ago

Hi, I still have some questions about training.

  1. I removed the silent video clips with your script; only 9411 training clips and 2898 test clips are left.
  2. I trained your model for 40 epochs on the filtered training set. The best epoch reaches CIoU@0.2 = 18.26, CIoU@0.3 = 9.21, and CIoU@0.4 = 3.86 (the threshold is the median of the heatmap). Here is my training log; the batch size is 16, and I removed err.mean() during training.
    Epoch: [1][0/409], Time: 9.45, Data: 5.52, lr_sound: 0.001, lr_frame: 0.0001, loss: 4.1758
    Epoch: [1][20/409], Time: 3.75, Data: 0.32, lr_sound: 0.001, lr_frame: 0.0001, loss: 3.4631
    Epoch: [1][40/409], Time: 2.91, Data: 0.19, lr_sound: 0.001, lr_frame: 0.0001, loss: 2.5754
    Epoch: [1][60/409], Time: 2.83, Data: 0.15, lr_sound: 0.001, lr_frame: 0.0001, loss: 2.0339
    Epoch: [1][80/409], Time: 3.04, Data: 0.13, lr_sound: 0.001, lr_frame: 0.0001, loss: 1.8445
    Epoch: [1][100/409], Time: 3.16, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 1.6860
    Epoch: [1][0/588], Time: 11.46, Data: 7.43, lr_sound: 0.001, lr_frame: 0.0001, loss: 4.1810
    Epoch: [1][20/588], Time: 3.89, Data: 0.41, lr_sound: 0.001, lr_frame: 0.0001, loss: 3.4935
    Epoch: [1][40/588], Time: 3.69, Data: 0.24, lr_sound: 0.001, lr_frame: 0.0001, loss: 2.5676
    Epoch: [1][60/588], Time: 3.63, Data: 0.18, lr_sound: 0.001, lr_frame: 0.0001, loss: 2.2132
    Epoch: [1][80/588], Time: 3.60, Data: 0.15, lr_sound: 0.001, lr_frame: 0.0001, loss: 1.7622
    Epoch: [1][100/588], Time: 3.57, Data: 0.13, lr_sound: 0.001, lr_frame: 0.0001, loss: 1.6586
    Epoch: [1][120/588], Time: 3.56, Data: 0.12, lr_sound: 0.001, lr_frame: 0.0001, loss: 1.4217
    Epoch: [1][140/588], Time: 3.55, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 1.4287
    Epoch: [1][160/588], Time: 3.55, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 1.3390
    Epoch: [1][180/588], Time: 3.55, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 1.1895
    Epoch: [1][200/588], Time: 3.54, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 1.1199
    Epoch: [1][220/588], Time: 3.53, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.8974
    Epoch: [1][240/588], Time: 3.53, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 1.0010
    Epoch: [1][260/588], Time: 3.52, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 1.0835
    Epoch: [1][280/588], Time: 3.52, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 1.1133
    Epoch: [1][300/588], Time: 3.53, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.9576
    Epoch: [1][320/588], Time: 3.52, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.8214
    Epoch: [1][340/588], Time: 3.52, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.9479
    Epoch: [1][360/588], Time: 3.52, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.8073
    Epoch: [1][380/588], Time: 3.52, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7141
    Epoch: [1][400/588], Time: 3.52, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.8135
    Epoch: [1][420/588], Time: 3.52, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7217
    Epoch: [1][440/588], Time: 3.52, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6688
    Epoch: [1][460/588], Time: 3.52, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6411
    Epoch: [1][480/588], Time: 3.52, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6243
    Epoch: [1][500/588], Time: 3.52, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6810
    Epoch: [1][520/588], Time: 3.52, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5658
    Epoch: [1][540/588], Time: 3.52, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7880
    Epoch: [1][560/588], Time: 3.52, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5949
    Epoch: [1][580/588], Time: 3.51, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7155
    Epoch: [1][0/588], Time: 11.39, Data: 5.37, lr_sound: 0.001, lr_frame: 0.0001, loss: 4.1810
    Epoch: [1][20/588], Time: 4.09, Data: 0.31, lr_sound: 0.001, lr_frame: 0.0001, loss: 3.4920
    Epoch: [1][40/588], Time: 3.97, Data: 0.19, lr_sound: 0.001, lr_frame: 0.0001, loss: 2.5649
    Epoch: [1][60/588], Time: 3.88, Data: 0.15, lr_sound: 0.001, lr_frame: 0.0001, loss: 2.1807
    Epoch: [1][80/588], Time: 3.83, Data: 0.13, lr_sound: 0.001, lr_frame: 0.0001, loss: 1.7942
    Epoch: [1][100/588], Time: 3.79, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 1.5984
    Epoch: [1][120/588], Time: 3.81, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 1.5884
    Epoch: [1][140/588], Time: 3.79, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 1.4185
    Epoch: [1][160/588], Time: 3.78, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 1.2574
    Epoch: [1][180/588], Time: 3.77, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 1.3440
    Epoch: [1][200/588], Time: 3.77, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 1.2705
    Epoch: [1][220/588], Time: 3.76, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 1.0430
    Epoch: [1][240/588], Time: 3.75, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 1.1693
    Epoch: [1][260/588], Time: 3.75, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.9548
    Epoch: [1][280/588], Time: 3.76, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.8652
    Epoch: [1][300/588], Time: 3.75, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.9799
    Epoch: [1][320/588], Time: 3.75, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.9064
    Epoch: [1][340/588], Time: 3.76, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.9438
    Epoch: [1][360/588], Time: 3.76, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.8222
    Epoch: [1][380/588], Time: 3.76, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7039
    Epoch: [1][400/588], Time: 3.77, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6734
    Epoch: [1][420/588], Time: 3.77, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7620
    Epoch: [1][440/588], Time: 3.77, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6265
    Epoch: [1][460/588], Time: 3.77, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7539
    Epoch: [1][480/588], Time: 3.77, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6029
    Epoch: [1][500/588], Time: 3.77, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6748
    Epoch: [1][520/588], Time: 3.77, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5724
    Epoch: [1][540/588], Time: 3.78, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5763
    Epoch: [1][560/588], Time: 3.77, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5508
    Epoch: [1][580/588], Time: 3.77, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6539
    Epoch: [2][0/588], Time: 10.45, Data: 6.30, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5689
    Epoch: [2][20/588], Time: 4.00, Data: 0.36, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5990
    Epoch: [2][40/588], Time: 3.88, Data: 0.21, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5450
    Epoch: [2][60/588], Time: 3.83, Data: 0.16, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5504
    Epoch: [2][80/588], Time: 3.79, Data: 0.14, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5666
    Epoch: [2][100/588], Time: 3.77, Data: 0.12, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6176
    Epoch: [2][120/588], Time: 3.76, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6018
    Epoch: [2][140/588], Time: 3.75, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5471
    Epoch: [2][160/588], Time: 3.74, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5686
    Epoch: [2][180/588], Time: 3.74, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6198
    Epoch: [2][200/588], Time: 3.74, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 1.0275
    Epoch: [2][220/588], Time: 3.74, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7355
    Epoch: [2][240/588], Time: 3.74, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7617
    Epoch: [2][260/588], Time: 3.73, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.9432
    Epoch: [2][280/588], Time: 3.73, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6631
    Epoch: [2][300/588], Time: 3.73, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6630
    Epoch: [2][320/588], Time: 3.73, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7054
    Epoch: [2][340/588], Time: 3.72, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5094
    Epoch: [2][360/588], Time: 3.73, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6511
    Epoch: [2][380/588], Time: 3.73, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7991
    Epoch: [2][400/588], Time: 3.73, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5536
    Epoch: [2][420/588], Time: 3.73, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4725
    Epoch: [2][440/588], Time: 3.73, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5614
    Epoch: [2][460/588], Time: 3.73, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5916
    Epoch: [2][480/588], Time: 3.73, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4753
    Epoch: [2][500/588], Time: 3.73, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5111
    Epoch: [2][520/588], Time: 3.74, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5404
    Epoch: [2][540/588], Time: 3.74, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5138
    Epoch: [2][560/588], Time: 3.74, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5061
    Epoch: [2][580/588], Time: 3.74, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4305
    Epoch: [3][0/588], Time: 12.32, Data: 7.20, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5470
    Epoch: [3][20/588], Time: 4.15, Data: 0.40, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5444
    Epoch: [3][40/588], Time: 3.92, Data: 0.24, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4655
    Epoch: [3][60/588], Time: 3.83, Data: 0.18, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6225
    Epoch: [3][80/588], Time: 3.36, Data: 0.15, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4974
    Epoch: [3][100/588], Time: 3.08, Data: 0.13, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7689
    Epoch: [3][120/588], Time: 3.17, Data: 0.12, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5072
    Epoch: [3][140/588], Time: 3.25, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4875
    Epoch: [3][160/588], Time: 3.30, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5146
    Epoch: [3][180/588], Time: 3.34, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5556
    Epoch: [3][200/588], Time: 3.36, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4513
    Epoch: [3][220/588], Time: 3.39, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4138
    Epoch: [3][240/588], Time: 3.42, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6241
    Epoch: [3][260/588], Time: 3.43, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5607
    Epoch: [3][280/588], Time: 3.46, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6141
    Epoch: [3][300/588], Time: 3.47, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5482
    Epoch: [3][320/588], Time: 3.48, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4780
    Epoch: [3][340/588], Time: 3.50, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4840
    Epoch: [3][360/588], Time: 3.51, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4358
    Epoch: [3][380/588], Time: 3.52, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5462
    Epoch: [3][400/588], Time: 3.52, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5667
    Epoch: [3][420/588], Time: 3.53, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6208
    Epoch: [3][440/588], Time: 3.53, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5835
    Epoch: [3][460/588], Time: 3.54, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4604
    Epoch: [3][480/588], Time: 3.55, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4659
    Epoch: [3][500/588], Time: 3.55, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4573
    Epoch: [3][520/588], Time: 3.55, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6221
    Epoch: [3][540/588], Time: 3.56, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4710
    Epoch: [3][560/588], Time: 3.57, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4532
    Epoch: [3][580/588], Time: 3.57, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4506
    Epoch: [4][0/588], Time: 10.55, Data: 6.33, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4392
    Epoch: [4][20/588], Time: 3.96, Data: 0.36, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4195
    Epoch: [4][40/588], Time: 3.80, Data: 0.22, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4902
    Epoch: [1][0/588], Time: 13.00, Data: 7.71, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4899
    Epoch: [1][20/588], Time: 4.06, Data: 0.43, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5651
    Epoch: [1][40/588], Time: 3.80, Data: 0.25, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5583
    Epoch: [1][60/588], Time: 3.71, Data: 0.19, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4681
    Epoch: [1][80/588], Time: 3.68, Data: 0.16, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4563
    Epoch: [1][100/588], Time: 3.67, Data: 0.15, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4984
    Epoch: [1][120/588], Time: 3.65, Data: 0.13, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5317
    Epoch: [1][140/588], Time: 3.64, Data: 0.12, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5651
    Epoch: [1][160/588], Time: 3.62, Data: 0.12, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4764
    Epoch: [1][180/588], Time: 3.62, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6429
    Epoch: [1][200/588], Time: 3.61, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5813
    Epoch: [1][220/588], Time: 3.61, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4645
    Epoch: [1][240/588], Time: 3.61, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5310
    Epoch: [1][260/588], Time: 3.60, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4372
    Epoch: [1][280/588], Time: 3.60, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4687
    Epoch: [1][300/588], Time: 3.60, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4925
    Epoch: [1][320/588], Time: 3.59, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4976
    Epoch: [1][340/588], Time: 3.59, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4829
    Epoch: [1][360/588], Time: 3.59, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5029
    Epoch: [1][380/588], Time: 3.59, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4449
    Epoch: [1][400/588], Time: 3.59, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5103
    Epoch: [1][420/588], Time: 3.59, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5083
    Epoch: [1][440/588], Time: 3.59, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5028
    Epoch: [1][460/588], Time: 3.59, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4942
    Epoch: [1][480/588], Time: 3.59, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5026
    Epoch: [1][500/588], Time: 3.59, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4861
    Epoch: [1][520/588], Time: 3.60, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3890
    Epoch: [1][540/588], Time: 3.61, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6259
    Epoch: [1][560/588], Time: 3.60, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4007
    Epoch: [1][580/588], Time: 3.60, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4087
    Epoch: [2][0/588], Time: 10.37, Data: 6.14, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3901
    Epoch: [2][20/588], Time: 3.90, Data: 0.35, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5464
    Epoch: [2][40/588], Time: 3.75, Data: 0.21, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4392
    Epoch: [2][60/588], Time: 3.70, Data: 0.16, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4178
    Epoch: [2][80/588], Time: 3.68, Data: 0.14, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4418
    Epoch: [2][100/588], Time: 3.65, Data: 0.12, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4693
    Epoch: [2][120/588], Time: 3.63, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4793
    Epoch: [2][140/588], Time: 3.63, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5529
    Epoch: [2][160/588], Time: 3.62, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4476
    Epoch: [2][180/588], Time: 3.62, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5183
    Epoch: [2][200/588], Time: 3.61, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3991
    Epoch: [2][220/588], Time: 3.61, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5588
    Epoch: [2][240/588], Time: 3.61, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5231
    Epoch: [2][260/588], Time: 3.61, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4619
    Epoch: [2][280/588], Time: 3.61, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4306
    Epoch: [2][300/588], Time: 3.60, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4238
    Epoch: [2][320/588], Time: 3.60, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4297
    Epoch: [2][340/588], Time: 3.60, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4767
    Epoch: [2][360/588], Time: 3.60, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3697
    Epoch: [2][380/588], Time: 3.60, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6044
    Epoch: [2][400/588], Time: 3.59, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4399
    Epoch: [2][420/588], Time: 3.59, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4914
    Epoch: [2][440/588], Time: 3.59, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4851
    Epoch: [2][460/588], Time: 3.59, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5587
    Epoch: [2][480/588], Time: 3.58, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5352
    Epoch: [2][500/588], Time: 3.52, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4812
    Epoch: [2][520/588], Time: 3.48, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5102
    Epoch: [2][540/588], Time: 3.48, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4389
    Epoch: [2][560/588], Time: 3.49, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5080
    Epoch: [2][580/588], Time: 3.49, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5662
    Epoch: [3][0/588], Time: 11.09, Data: 5.81, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.9355
    Epoch: [3][20/588], Time: 4.02, Data: 0.34, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5677
    Epoch: [3][40/588], Time: 3.82, Data: 0.20, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7611
    Epoch: [3][60/588], Time: 3.73, Data: 0.16, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7395
    Epoch: [3][80/588], Time: 3.70, Data: 0.13, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7692
    Epoch: [3][100/588], Time: 3.68, Data: 0.12, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5468
    Epoch: [3][120/588], Time: 3.66, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5703
    Epoch: [3][140/588], Time: 3.66, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7253
    Epoch: [3][160/588], Time: 3.64, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6027
    Epoch: [3][180/588], Time: 3.64, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5449
    Epoch: [3][200/588], Time: 3.63, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6291
    Epoch: [3][220/588], Time: 3.62, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4519
    Epoch: [3][240/588], Time: 3.62, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6092
    Epoch: [3][260/588], Time: 3.62, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5218
    Epoch: [3][280/588], Time: 3.62, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4292
    Epoch: [3][300/588], Time: 3.61, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5298
    Epoch: [3][320/588], Time: 3.61, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4253
    Epoch: [3][340/588], Time: 3.61, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5528
    Epoch: [3][360/588], Time: 3.61, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5641
    Epoch: [3][380/588], Time: 3.61, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5031
    Epoch: [3][400/588], Time: 3.61, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4240
    Epoch: [3][420/588], Time: 3.61, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6612
    Epoch: [3][440/588], Time: 3.61, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5816
    Epoch: [3][460/588], Time: 3.60, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5122
    Epoch: [3][480/588], Time: 3.60, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4329
    Epoch: [3][500/588], Time: 3.60, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5064
    Epoch: [3][520/588], Time: 3.60, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5358
    Epoch: [3][540/588], Time: 3.60, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5695
    Epoch: [3][560/588], Time: 3.60, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4869
    Epoch: [3][580/588], Time: 3.60, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3943
    Epoch: [4][0/588], Time: 11.69, Data: 7.50, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5344
    Epoch: [4][20/588], Time: 4.11, Data: 0.47, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4860
    Epoch: [4][40/588], Time: 3.85, Data: 0.27, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4284
    Epoch: [4][60/588], Time: 3.77, Data: 0.20, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4747
    Epoch: [4][80/588], Time: 3.72, Data: 0.17, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6278
    Epoch: [4][100/588], Time: 3.69, Data: 0.15, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4910
    Epoch: [4][120/588], Time: 3.69, Data: 0.13, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4600
    Epoch: [4][140/588], Time: 3.70, Data: 0.12, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.9107
    Epoch: [4][160/588], Time: 3.68, Data: 0.12, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6278
    Epoch: [4][180/588], Time: 3.67, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6042
    Epoch: [4][200/588], Time: 3.66, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5486
    Epoch: [4][220/588], Time: 3.66, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4943
    Epoch: [4][240/588], Time: 3.65, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4674
    Epoch: [4][260/588], Time: 3.64, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4376
    Epoch: [4][280/588], Time: 3.64, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5288
    Epoch: [4][300/588], Time: 3.63, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5231
    Epoch: [4][320/588], Time: 3.63, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.8455
    Epoch: [4][340/588], Time: 3.62, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.9007
    Epoch: [4][360/588], Time: 3.61, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 1.3321
    Epoch: [4][380/588], Time: 3.61, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7563
    Epoch: [4][400/588], Time: 3.61, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7966
    Epoch: [4][420/588], Time: 3.60, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5455
    Epoch: [4][440/588], Time: 3.60, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7893
    Epoch: [4][460/588], Time: 3.60, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.8344
    Epoch: [4][480/588], Time: 3.60, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6783
    Epoch: [4][500/588], Time: 3.59, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6623
    Epoch: [4][520/588], Time: 3.59, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5764
    Epoch: [4][540/588], Time: 3.59, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4946
    Epoch: [4][560/588], Time: 3.59, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7310
    Epoch: [4][580/588], Time: 3.59, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5261
    Epoch: [5][0/588], Time: 12.67, Data: 8.52, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5352
    Epoch: [5][20/588], Time: 3.98, Data: 0.47, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5590
    Epoch: [5][40/588], Time: 3.83, Data: 0.27, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4754
    Epoch: [5][60/588], Time: 3.77, Data: 0.20, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7409
    Epoch: [5][80/588], Time: 3.72, Data: 0.17, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6370
    Epoch: [5][100/588], Time: 3.68, Data: 0.15, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7033
    Epoch: [5][120/588], Time: 3.65, Data: 0.13, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7528
    Epoch: [5][140/588], Time: 3.64, Data: 0.12, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5645
    Epoch: [5][160/588], Time: 3.63, Data: 0.12, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5184
    Epoch: [5][180/588], Time: 3.62, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4757
    Epoch: [5][200/588], Time: 3.61, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4939
    Epoch: [5][220/588], Time: 3.61, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5752
    Epoch: [5][240/588], Time: 3.60, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6898
    Epoch: [5][260/588], Time: 3.60, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7349
    Epoch: [5][280/588], Time: 3.59, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5455
    Epoch: [5][300/588], Time: 3.58, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3914
    Epoch: [5][320/588], Time: 3.58, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4838
    Epoch: [5][340/588], Time: 3.58, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5296
    Epoch: [5][360/588], Time: 3.58, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4718
    Epoch: [5][380/588], Time: 3.56, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4364
    Epoch: [5][400/588], Time: 3.50, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4866
    Epoch: [5][420/588], Time: 3.45, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4586
    Epoch: [5][440/588], Time: 3.46, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4475
    Epoch: [5][460/588], Time: 3.46, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4107
    Epoch: [5][480/588], Time: 3.47, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4221
    Epoch: [5][500/588], Time: 3.47, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4310
    Epoch: [5][520/588], Time: 3.48, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4239
    Epoch: [5][540/588], Time: 3.48, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4278
    Epoch: [5][560/588], Time: 3.48, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3951
    Epoch: [5][580/588], Time: 3.48, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4610
    Epoch: [6][0/588], Time: 11.59, Data: 6.73, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5101
    Epoch: [6][20/588], Time: 3.92, Data: 0.38, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3706
    Epoch: [6][40/588], Time: 3.75, Data: 0.23, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4121
    Epoch: [6][60/588], Time: 3.68, Data: 0.17, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4006
    Epoch: [6][80/588], Time: 3.65, Data: 0.15, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4331
    Epoch: [6][100/588], Time: 3.63, Data: 0.13, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3715
    Epoch: [6][120/588], Time: 3.67, Data: 0.12, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4309
    Epoch: [6][140/588], Time: 3.65, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4765
    Epoch: [6][160/588], Time: 3.64, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4571
    Epoch: [6][180/588], Time: 3.63, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4732
    Epoch: [6][200/588], Time: 3.63, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4529
    Epoch: [6][220/588], Time: 3.62, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4397
    Epoch: [6][240/588], Time: 3.62, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4313
    Epoch: [6][260/588], Time: 3.62, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4684
    Epoch: [6][280/588], Time: 3.61, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4346
    Epoch: [6][300/588], Time: 3.61, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4327
    Epoch: [6][320/588], Time: 3.61, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3990
    Epoch: [6][340/588], Time: 3.61, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4235
    Epoch: [6][360/588], Time: 3.62, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6646
    Epoch: [6][380/588], Time: 3.63, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4443
    Epoch: [6][400/588], Time: 3.65, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4655
    Epoch: [6][420/588], Time: 3.66, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4172
    Epoch: [6][440/588], Time: 3.66, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4796
    Epoch: [6][460/588], Time: 3.67, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4421
    Epoch: [6][480/588], Time: 3.67, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4276
    Epoch: [6][500/588], Time: 3.69, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5559
    Epoch: [6][520/588], Time: 3.69, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4342
    Epoch: [6][540/588], Time: 3.69, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4225
    Epoch: [6][560/588], Time: 3.70, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4076
    Epoch: [6][580/588], Time: 3.70, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4301
    Epoch: [7][0/588], Time: 11.20, Data: 6.86, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4106
    Epoch: [7][20/588], Time: 3.93, Data: 0.39, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3630
    Epoch: [7][40/588], Time: 3.75, Data: 0.23, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5152
    Epoch: [7][60/588], Time: 3.69, Data: 0.17, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4418
    Epoch: [7][80/588], Time: 3.66, Data: 0.15, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4756
    Epoch: [7][100/588], Time: 3.64, Data: 0.13, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4041
    Epoch: [7][120/588], Time: 3.63, Data: 0.12, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4156
    Epoch: [7][140/588], Time: 3.62, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5060
    Epoch: [7][160/588], Time: 3.62, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4319
    Epoch: [7][180/588], Time: 3.61, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3917
    Epoch: [7][200/588], Time: 3.61, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5006
    Epoch: [7][220/588], Time: 3.60, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4999
    Epoch: [7][240/588], Time: 3.60, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4131
    Epoch: [7][260/588], Time: 3.59, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4138
    Epoch: [7][280/588], Time: 3.59, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4626
    Epoch: [7][300/588], Time: 3.58, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4632
    Epoch: [7][320/588], Time: 3.58, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4477
    Epoch: [7][340/588], Time: 3.58, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4730
    Epoch: [7][360/588], Time: 3.58, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4873
    Epoch: [7][380/588], Time: 3.58, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5394
    Epoch: [7][400/588], Time: 3.58, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4647
    Epoch: [7][420/588], Time: 3.58, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6635
    Epoch: [7][440/588], Time: 3.58, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6654
    Epoch: [7][460/588], Time: 3.57, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4503
    Epoch: [7][480/588], Time: 3.57, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3930
    Epoch: [7][500/588], Time: 3.57, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4378
    Epoch: [7][520/588], Time: 3.57, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4890
    Epoch: [7][540/588], Time: 3.57, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4594
    Epoch: [7][560/588], Time: 3.57, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3856
    Epoch: [7][580/588], Time: 3.57, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3852
    Epoch: [8][0/588], Time: 11.22, Data: 6.87, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3853
    Epoch: [8][20/588], Time: 3.87, Data: 0.39, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4320
    Epoch: [8][40/588], Time: 3.73, Data: 0.23, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3722
    Epoch: [8][60/588], Time: 3.67, Data: 0.17, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4052
    Epoch: [8][80/588], Time: 3.65, Data: 0.15, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4073
    Epoch: [8][100/588], Time: 3.63, Data: 0.13, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5277
    Epoch: [8][120/588], Time: 3.62, Data: 0.12, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5635
    Epoch: [8][140/588], Time: 3.61, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4964
    Epoch: [8][160/588], Time: 3.60, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4690
    Epoch: [8][180/588], Time: 3.59, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4031
    Epoch: [8][200/588], Time: 3.59, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5540
    Epoch: [8][220/588], Time: 3.58, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4569
    Epoch: [8][240/588], Time: 3.59, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4940
    Epoch: [8][260/588], Time: 3.59, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4009
    Epoch: [8][280/588], Time: 3.47, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4326
    Epoch: [8][300/588], Time: 3.37, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4601
    Epoch: [8][320/588], Time: 3.39, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.8279
    Epoch: [8][340/588], Time: 3.40, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4314
    Epoch: [8][360/588], Time: 3.41, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3750
    Epoch: [8][380/588], Time: 3.42, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3666
    Epoch: [8][400/588], Time: 3.43, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4125
    Epoch: [8][420/588], Time: 3.44, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5684
    Epoch: [8][440/588], Time: 3.45, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3692
    Epoch: [8][460/588], Time: 3.46, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3926
    Epoch: [8][480/588], Time: 3.46, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3957
    Epoch: [8][500/588], Time: 3.47, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4023
    Epoch: [8][520/588], Time: 3.47, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3889
    Epoch: [8][540/588], Time: 3.48, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3953
    Epoch: [8][560/588], Time: 3.49, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3985
    Epoch: [8][580/588], Time: 3.49, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3741
    Epoch: [9][0/588], Time: 10.07, Data: 5.99, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4050
    Epoch: [9][20/588], Time: 3.91, Data: 0.34, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4040
    Epoch: [9][40/588], Time: 3.73, Data: 0.21, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5323
    Epoch: [9][60/588], Time: 3.68, Data: 0.16, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3872
    Epoch: [9][80/588], Time: 3.65, Data: 0.14, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4716
    Epoch: [9][100/588], Time: 3.63, Data: 0.12, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4690
    Epoch: [9][120/588], Time: 3.62, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4394
    Epoch: [9][140/588], Time: 3.61, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4139
    Epoch: [9][160/588], Time: 3.60, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6878
    Epoch: [9][180/588], Time: 3.60, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3903
    Epoch: [9][200/588], Time: 3.59, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4059
    Epoch: [9][220/588], Time: 3.59, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5559
    Epoch: [9][240/588], Time: 3.59, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4897
    Epoch: [9][260/588], Time: 3.59, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4441
    Epoch: [9][280/588], Time: 3.58, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4418
    Epoch: [9][300/588], Time: 3.58, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3837
    Epoch: [9][320/588], Time: 3.58, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4273
    Epoch: [9][340/588], Time: 3.57, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4181
    Epoch: [9][360/588], Time: 3.57, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3575
    Epoch: [9][380/588], Time: 3.57, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4261
    Epoch: [9][400/588], Time: 3.57, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3854
    Epoch: [9][420/588], Time: 3.58, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3686
    Epoch: [9][440/588], Time: 3.58, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4349
    Epoch: [9][460/588], Time: 3.58, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4734
    Epoch: [9][480/588], Time: 3.59, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4291
    Epoch: [9][500/588], Time: 3.59, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4492
    Epoch: [9][520/588], Time: 3.59, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4654
    Epoch: [9][540/588], Time: 3.60, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5038
    Epoch: [9][560/588], Time: 3.60, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7294
    Epoch: [9][580/588], Time: 3.60, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5328
    Epoch: [10][0/588], Time: 11.46, Data: 6.93, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5151
    Epoch: [10][20/588], Time: 3.90, Data: 0.39, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4711
    Epoch: [10][40/588], Time: 3.73, Data: 0.23, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4196
    Epoch: [10][60/588], Time: 3.65, Data: 0.18, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4256
    Epoch: [10][80/588], Time: 3.63, Data: 0.15, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3736
    Epoch: [10][100/588], Time: 3.60, Data: 0.13, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4482
    Epoch: [10][120/588], Time: 3.59, Data: 0.12, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4045
    Epoch: [10][140/588], Time: 3.58, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4529
    Epoch: [10][160/588], Time: 3.58, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4945
    Epoch: [10][180/588], Time: 3.57, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5045
    Epoch: [10][200/588], Time: 3.56, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4887
    Epoch: [10][220/588], Time: 3.56, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4134
    Epoch: [10][240/588], Time: 3.56, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3941
    Epoch: [10][260/588], Time: 3.56, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4449
    Epoch: [10][280/588], Time: 3.56, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4079
    Epoch: [10][300/588], Time: 3.56, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4288
    Epoch: [10][320/588], Time: 3.56, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3969
    Epoch: [10][340/588], Time: 3.56, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4863
    Epoch: [10][360/588], Time: 3.56, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4354
    Epoch: [10][380/588], Time: 3.56, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3979
    Epoch: [10][400/588], Time: 3.56, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3692
    Epoch: [10][420/588], Time: 3.56, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3650
    Epoch: [10][440/588], Time: 3.56, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4141
    Epoch: [10][460/588], Time: 3.56, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4216
    Epoch: [10][480/588], Time: 3.56, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4018
    Epoch: [10][500/588], Time: 3.56, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4596
    Epoch: [10][520/588], Time: 3.56, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3915
    Epoch: [10][540/588], Time: 3.56, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4634
    Epoch: [10][560/588], Time: 3.56, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4291
    Epoch: [10][580/588], Time: 3.56, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4550
    Epoch: [11][0/588], Time: 11.03, Data: 6.77, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4259
    Epoch: [11][20/588], Time: 3.93, Data: 0.38, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4890
    Epoch: [11][40/588], Time: 3.73, Data: 0.23, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4168
    Epoch: [11][60/588], Time: 3.66, Data: 0.17, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4504
    Epoch: [11][80/588], Time: 3.63, Data: 0.15, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.8193
    Epoch: [11][100/588], Time: 3.62, Data: 0.13, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6823
    Epoch: [11][120/588], Time: 3.60, Data: 0.12, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7105
    Epoch: [11][140/588], Time: 3.59, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7664
    Epoch: [11][160/588], Time: 3.58, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7094
    Epoch: [11][180/588], Time: 3.43, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5034
    Epoch: [11][200/588], Time: 3.29, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4624
    Epoch: [11][220/588], Time: 3.31, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4042
    Epoch: [11][240/588], Time: 3.34, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4923
    Epoch: [11][260/588], Time: 3.37, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4804
    Epoch: [11][280/588], Time: 3.39, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5170
    Epoch: [11][300/588], Time: 3.40, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5953
    Epoch: [11][320/588], Time: 3.40, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4636
    Epoch: [11][340/588], Time: 3.41, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4575
    Epoch: [11][360/588], Time: 3.42, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4535
    Epoch: [11][380/588], Time: 3.42, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4343
    Epoch: [11][400/588], Time: 3.43, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4597
    Epoch: [11][420/588], Time: 3.43, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5107
    Epoch: [11][440/588], Time: 3.44, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5243
    Epoch: [11][460/588], Time: 3.45, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5246
    Epoch: [11][480/588], Time: 3.45, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4542
    Epoch: [11][500/588], Time: 3.45, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4658
    Epoch: [11][520/588], Time: 3.46, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4083
    Epoch: [11][540/588], Time: 3.46, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4304
    Epoch: [11][560/588], Time: 3.46, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4246
    Epoch: [11][580/588], Time: 3.46, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4761
    Epoch: [12][0/588], Time: 12.45, Data: 8.15, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4787
    Epoch: [12][20/588], Time: 3.99, Data: 0.45, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5639
    Epoch: [12][40/588], Time: 3.78, Data: 0.26, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5103
    Epoch: [12][60/588], Time: 3.72, Data: 0.20, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4421
    Epoch: [12][80/588], Time: 3.67, Data: 0.16, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5372
    Epoch: [12][100/588], Time: 3.65, Data: 0.14, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4111
    Epoch: [12][120/588], Time: 3.64, Data: 0.13, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3956
    Epoch: [12][140/588], Time: 3.63, Data: 0.12, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4257
    Epoch: [12][160/588], Time: 3.62, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5525
    Epoch: [12][180/588], Time: 3.62, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4195
    Epoch: [12][200/588], Time: 3.62, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4871
    Epoch: [12][220/588], Time: 3.61, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4442
    Epoch: [12][240/588], Time: 3.60, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4616
    Epoch: [12][260/588], Time: 3.60, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3710
    Epoch: [12][280/588], Time: 3.60, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3933
    Epoch: [12][300/588], Time: 3.60, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5779
    Epoch: [12][320/588], Time: 3.60, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4477
    Epoch: [12][340/588], Time: 3.60, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4365
    Epoch: [12][360/588], Time: 3.61, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4012
    Epoch: [12][380/588], Time: 3.62, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4233
    Epoch: [12][400/588], Time: 3.63, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4949
    Epoch: [12][420/588], Time: 3.63, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4560
    Epoch: [12][440/588], Time: 3.64, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4273
    Epoch: [12][460/588], Time: 3.64, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3979
    Epoch: [12][480/588], Time: 3.65, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5277
    Epoch: [12][500/588], Time: 3.65, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3372
    Epoch: [12][520/588], Time: 3.66, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4502
    Epoch: [12][540/588], Time: 3.66, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5195
    Epoch: [12][560/588], Time: 3.66, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4047
    Epoch: [12][580/588], Time: 3.67, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4309
    Epoch: [13][0/588], Time: 12.70, Data: 8.76, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4487
    Epoch: [13][20/588], Time: 4.00, Data: 0.48, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5673
    Epoch: [13][40/588], Time: 3.78, Data: 0.27, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3961
    Epoch: [13][60/588], Time: 3.71, Data: 0.20, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4121
    Epoch: [13][80/588], Time: 3.67, Data: 0.17, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3774
    Epoch: [13][100/588], Time: 3.65, Data: 0.15, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4860
    Epoch: [13][120/588], Time: 3.63, Data: 0.13, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3864
    Epoch: [13][140/588], Time: 3.62, Data: 0.12, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3846
    Epoch: [13][160/588], Time: 3.62, Data: 0.12, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3744
    Epoch: [13][180/588], Time: 3.61, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5601
    Epoch: [13][200/588], Time: 3.61, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3918
    Epoch: [13][220/588], Time: 3.61, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3789
    Epoch: [13][240/588], Time: 3.62, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4206
    Epoch: [13][260/588], Time: 3.62, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3457
    Epoch: [13][280/588], Time: 3.63, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3727
    Epoch: [13][300/588], Time: 3.64, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3871
    Epoch: [13][320/588], Time: 3.65, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3638
    Epoch: [13][340/588], Time: 3.66, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6451
    Epoch: [13][360/588], Time: 3.66, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4421
    Epoch: [13][380/588], Time: 3.67, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4236
    Epoch: [13][400/588], Time: 3.68, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4055
    Epoch: [13][420/588], Time: 3.68, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4085
    Epoch: [13][440/588], Time: 3.69, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4245
    Epoch: [13][460/588], Time: 3.69, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4899
    Epoch: [13][480/588], Time: 3.70, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6591
    Epoch: [13][500/588], Time: 3.71, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.8647
    Epoch: [13][520/588], Time: 3.71, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5140
    Epoch: [13][540/588], Time: 3.71, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4103
    Epoch: [13][560/588], Time: 3.72, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4341
    Epoch: [13][580/588], Time: 3.72, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4560
    Epoch: [14][0/588], Time: 10.88, Data: 6.25, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4225
    Epoch: [14][20/588], Time: 3.86, Data: 0.36, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4947
    Epoch: [14][40/588], Time: 3.16, Data: 0.21, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3784
    Epoch: [14][60/588], Time: 2.84, Data: 0.16, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3654
    Epoch: [14][80/588], Time: 2.95, Data: 0.14, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4092
    Epoch: [14][100/588], Time: 3.07, Data: 0.12, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4060
    Epoch: [14][120/588], Time: 3.15, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4331
    Epoch: [14][140/588], Time: 3.21, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3924
    Epoch: [14][160/588], Time: 3.26, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3616
    Epoch: [14][180/588], Time: 3.28, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3454
    Epoch: [14][200/588], Time: 3.32, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4320
    Epoch: [14][220/588], Time: 3.34, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4455
    Epoch: [14][240/588], Time: 3.36, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4234
    Epoch: [14][260/588], Time: 3.37, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5656
    Epoch: [14][280/588], Time: 3.39, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4093
    Epoch: [14][300/588], Time: 3.40, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4176
    Epoch: [14][320/588], Time: 3.41, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4920
    Epoch: [14][340/588], Time: 3.42, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4483
    Epoch: [14][360/588], Time: 3.43, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4472
    Epoch: [14][380/588], Time: 3.44, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4007
    Epoch: [14][400/588], Time: 3.44, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4546
    Epoch: [14][420/588], Time: 3.45, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3748
    Epoch: [14][440/588], Time: 3.45, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4257
    Epoch: [14][460/588], Time: 3.46, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3861
    Epoch: [14][480/588], Time: 3.46, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4747
    Epoch: [14][500/588], Time: 3.46, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4566
    Epoch: [14][520/588], Time: 3.47, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4999
    Epoch: [14][540/588], Time: 3.47, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4948
    Epoch: [14][560/588], Time: 3.47, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4841
    Epoch: [14][580/588], Time: 3.47, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6920
    Epoch: [15][0/588], Time: 10.02, Data: 5.42, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4328
    Epoch: [15][20/588], Time: 3.91, Data: 0.32, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4610
    Epoch: [15][40/588], Time: 3.74, Data: 0.19, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4271
    Epoch: [15][60/588], Time: 3.66, Data: 0.15, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5275
    Epoch: [15][80/588], Time: 3.63, Data: 0.13, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5233
    Epoch: [15][100/588], Time: 3.62, Data: 0.12, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5339
    Epoch: [15][120/588], Time: 3.60, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4525
    Epoch: [15][140/588], Time: 3.60, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3801
    Epoch: [15][160/588], Time: 3.60, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4345
    Epoch: [15][180/588], Time: 3.59, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4424
    Epoch: [15][200/588], Time: 3.59, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3943
    Epoch: [15][220/588], Time: 3.59, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4788
    Epoch: [15][240/588], Time: 3.58, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4361
    Epoch: [15][260/588], Time: 3.58, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4261
    Epoch: [15][280/588], Time: 3.57, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3900
    Epoch: [15][300/588], Time: 3.57, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3889
    Epoch: [15][320/588], Time: 3.57, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4862
    Epoch: [15][340/588], Time: 3.58, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4467
    Epoch: [15][360/588], Time: 3.57, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3880
    Epoch: [15][380/588], Time: 3.58, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4057
    Epoch: [15][400/588], Time: 3.58, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3854
    Epoch: [15][420/588], Time: 3.58, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4939
    Epoch: [15][440/588], Time: 3.57, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4149
    Epoch: [15][460/588], Time: 3.57, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4335
    Epoch: [15][480/588], Time: 3.57, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4548
    Epoch: [15][500/588], Time: 3.57, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4816
    Epoch: [15][520/588], Time: 3.57, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4577
    Epoch: [15][540/588], Time: 3.57, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3465
    Epoch: [15][560/588], Time: 3.57, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3807
    Epoch: [15][580/588], Time: 3.57, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4127
    Epoch: [16][0/588], Time: 10.62, Data: 6.32, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5604
    Epoch: [16][20/588], Time: 3.92, Data: 0.36, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4247
    Epoch: [16][40/588], Time: 3.74, Data: 0.22, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3837
    Epoch: [16][60/588], Time: 3.71, Data: 0.17, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3732
    Epoch: [16][80/588], Time: 3.70, Data: 0.14, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4259
    Epoch: [16][100/588], Time: 3.68, Data: 0.13, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4514
    Epoch: [16][120/588], Time: 3.66, Data: 0.12, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4274
    Epoch: [16][140/588], Time: 3.64, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4113
    Epoch: [16][160/588], Time: 3.63, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3748
    Epoch: [16][180/588], Time: 3.62, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4117
    Epoch: [16][200/588], Time: 3.61, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3695
    Epoch: [16][220/588], Time: 3.61, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4227
    Epoch: [16][240/588], Time: 3.60, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6275
    Epoch: [16][260/588], Time: 3.60, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5642
    Epoch: [16][280/588], Time: 3.60, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4409
    Epoch: [16][300/588], Time: 3.59, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4046
    Epoch: [16][320/588], Time: 3.59, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3976
    Epoch: [16][340/588], Time: 3.58, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4038
    Epoch: [16][360/588], Time: 3.58, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3935
    Epoch: [16][380/588], Time: 3.58, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4028
    Epoch: [16][400/588], Time: 3.58, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4029
    Epoch: [16][420/588], Time: 3.58, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4154
    Epoch: [16][440/588], Time: 3.58, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4854
    Epoch: [16][460/588], Time: 3.58, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5318
    Epoch: [16][480/588], Time: 3.57, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4268
    Epoch: [16][500/588], Time: 3.57, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.8587
    Epoch: [16][520/588], Time: 3.57, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5265
    Epoch: [16][540/588], Time: 3.57, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6019
    Epoch: [16][560/588], Time: 3.57, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5523
    Epoch: [16][580/588], Time: 3.57, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4825
    Epoch: [17][0/588], Time: 11.03, Data: 6.49, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4139
    Epoch: [17][20/588], Time: 3.89, Data: 0.37, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4928
    Epoch: [17][40/588], Time: 3.70, Data: 0.22, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7734
    Epoch: [17][60/588], Time: 3.64, Data: 0.17, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5619
    Epoch: [17][80/588], Time: 3.64, Data: 0.14, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5103
    Epoch: [17][100/588], Time: 3.63, Data: 0.13, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5089
    Epoch: [17][120/588], Time: 3.62, Data: 0.12, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5789
    Epoch: [17][140/588], Time: 3.61, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5893
    Epoch: [17][160/588], Time: 3.60, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4950
    Epoch: [17][180/588], Time: 3.60, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6452
    Epoch: [17][200/588], Time: 3.59, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6228
    Epoch: [17][220/588], Time: 3.59, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4331
    Epoch: [17][240/588], Time: 3.58, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6075
    Epoch: [17][260/588], Time: 3.58, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4189
    Epoch: [17][280/588], Time: 3.58, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4356
    Epoch: [17][300/588], Time: 3.58, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4342
    Epoch: [17][320/588], Time: 3.59, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5713
    Epoch: [17][340/588], Time: 3.60, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5539
    Epoch: [17][360/588], Time: 3.60, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4789
    Epoch: [17][380/588], Time: 3.60, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4685
    Epoch: [17][400/588], Time: 3.61, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6079
    Epoch: [17][420/588], Time: 3.62, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5526
    Epoch: [17][440/588], Time: 3.62, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5495
    Epoch: [17][460/588], Time: 3.63, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4334
    Epoch: [17][480/588], Time: 3.63, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4313
    Epoch: [17][500/588], Time: 3.63, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4872
    Epoch: [17][520/588], Time: 3.63, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4112
    Epoch: [17][540/588], Time: 3.63, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5037
    Epoch: [17][560/588], Time: 3.64, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4349
    Epoch: [17][580/588], Time: 3.64, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4941
    Epoch: [18][0/588], Time: 11.15, Data: 6.80, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4646
    Epoch: [18][20/588], Time: 3.89, Data: 0.38, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3777
    Epoch: [18][40/588], Time: 3.72, Data: 0.23, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4777
    Epoch: [18][60/588], Time: 3.66, Data: 0.17, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5297
    Epoch: [18][80/588], Time: 3.63, Data: 0.15, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3274
    Epoch: [18][100/588], Time: 3.61, Data: 0.13, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4226
    Epoch: [18][120/588], Time: 3.60, Data: 0.12, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4254
    Epoch: [18][140/588], Time: 3.60, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4224
    Epoch: [18][160/588], Time: 3.59, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4967
    Epoch: [18][180/588], Time: 3.58, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3812
    Epoch: [18][200/588], Time: 3.58, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6078
    Epoch: [18][220/588], Time: 3.57, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4773
    Epoch: [18][240/588], Time: 3.57, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4831
    Epoch: [18][260/588], Time: 3.57, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7905
    Epoch: [18][280/588], Time: 3.57, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7383
    Epoch: [18][300/588], Time: 3.57, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3883
    Epoch: [18][320/588], Time: 3.59, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4754
    Epoch: [18][340/588], Time: 3.59, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5128
    Epoch: [18][360/588], Time: 3.60, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4743
    Epoch: [18][380/588], Time: 3.60, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4361
    Epoch: [18][400/588], Time: 3.60, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6699
    Epoch: [18][420/588], Time: 3.61, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4068
    Epoch: [18][440/588], Time: 3.61, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5126
    Epoch: [18][460/588], Time: 3.61, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4245
    Epoch: [18][480/588], Time: 3.61, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3979
    Epoch: [18][500/588], Time: 3.61, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3890
    Epoch: [18][520/588], Time: 3.61, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4154
    Epoch: [18][540/588], Time: 3.62, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4733
    Epoch: [18][560/588], Time: 3.62, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4525
    Epoch: [18][580/588], Time: 3.62, Data: 0.07, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4519
    Epoch: [19][0/588], Time: 12.19, Data: 8.20, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3925
    Epoch: [19][20/588], Time: 3.99, Data: 0.45, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4685
    Epoch: [19][40/588], Time: 3.78, Data: 0.26, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4351
    Epoch: [19][60/588], Time: 3.69, Data: 0.20, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3416
    Epoch: [19][80/588], Time: 3.66, Data: 0.16, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5108
    Epoch: [19][100/588], Time: 3.63, Data: 0.14, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4229
    Epoch: [19][120/588], Time: 3.62, Data: 0.13, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3877
    Epoch: [19][140/588], Time: 3.61, Data: 0.12, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3598
    Epoch: [19][160/588], Time: 3.60, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4814
    Epoch: [19][180/588], Time: 3.59, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4619
    Epoch: [19][200/588], Time: 3.58, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3868
    Epoch: [19][220/588], Time: 3.57, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4044
    Epoch: [19][240/588], Time: 3.57, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3833
    Epoch: [19][260/588], Time: 3.57, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3783
    Epoch: [19][280/588], Time: 3.56, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3659
    Epoch: [19][300/588], Time: 3.56, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4625
    Epoch: [19][320/588], Time: 3.56, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.7082
    Epoch: [19][340/588], Time: 3.56, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3851
    Epoch: [19][360/588], Time: 3.56, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3693
    Epoch: [19][380/588], Time: 3.56, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4211
    Epoch: [19][400/588], Time: 3.56, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4087
    Epoch: [19][420/588], Time: 3.56, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4090
    Epoch: [19][440/588], Time: 3.56, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3891
    Epoch: [19][460/588], Time: 3.54, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3603
    Epoch: [19][480/588], Time: 3.49, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4058
    Epoch: [19][500/588], Time: 3.44, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6536
    Epoch: [19][520/588], Time: 3.45, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3765
    Epoch: [19][540/588], Time: 3.45, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4769
    Epoch: [19][560/588], Time: 3.45, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5043
    Epoch: [19][580/588], Time: 3.45, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6017
    Epoch: [20][0/588], Time: 12.26, Data: 8.02, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4340
    Epoch: [20][20/588], Time: 3.95, Data: 0.44, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4242
    Epoch: [20][40/588], Time: 3.74, Data: 0.26, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4823
    Epoch: [20][60/588], Time: 3.70, Data: 0.19, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3965
    Epoch: [20][80/588], Time: 3.69, Data: 0.16, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4990
    Epoch: [20][100/588], Time: 3.66, Data: 0.14, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4699
    Epoch: [20][120/588], Time: 3.64, Data: 0.13, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4020
    Epoch: [20][140/588], Time: 3.63, Data: 0.12, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4194
    Epoch: [20][160/588], Time: 3.62, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3963
    Epoch: [20][180/588], Time: 3.61, Data: 0.11, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4272
    Epoch: [20][200/588], Time: 3.60, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4077
    Epoch: [20][220/588], Time: 3.60, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4344
    Epoch: [20][240/588], Time: 3.59, Data: 0.10, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3984
    Epoch: [20][260/588], Time: 3.58, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4088
    Epoch: [20][280/588], Time: 3.58, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4674
    Epoch: [20][300/588], Time: 3.58, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4783
    Epoch: [20][320/588], Time: 3.57, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4137
    Epoch: [20][340/588], Time: 3.57, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3774
    Epoch: [20][360/588], Time: 3.57, Data: 0.09, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4219
    Epoch: [20][380/588], Time: 3.57, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3723
    Epoch: [20][400/588], Time: 3.57, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4044
    Epoch: [20][420/588], Time: 3.57, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5698
    Epoch: [20][440/588], Time: 3.57, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4055
    Epoch: [20][460/588], Time: 3.56, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4041
    Epoch: [20][480/588], Time: 3.57, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4108
    Epoch: [20][500/588], Time: 3.56, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.3880
    Epoch: [20][520/588], Time: 3.56, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4289
    Epoch: [20][540/588], Time: 3.56, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.4421
    Epoch: [20][560/588], Time: 3.57, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.5656
    Epoch: [20][580/588], Time: 3.57, Data: 0.08, lr_sound: 0.001, lr_frame: 0.0001, loss: 0.6244
    Epoch: [21][0/588], Time: 10.36, Data: 5.85, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.6479
    Epoch: [21][20/588], Time: 3.86, Data: 0.34, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3788
    Epoch: [21][40/588], Time: 3.72, Data: 0.20, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.6105
    Epoch: [21][60/588], Time: 3.65, Data: 0.16, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3737
    Epoch: [21][80/588], Time: 3.61, Data: 0.13, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3905
    Epoch: [21][100/588], Time: 3.60, Data: 0.12, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3670
    Epoch: [21][120/588], Time: 3.59, Data: 0.11, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4420
    Epoch: [21][140/588], Time: 3.58, Data: 0.10, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.5857
    Epoch: [21][160/588], Time: 3.57, Data: 0.10, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3873
    Epoch: [21][180/588], Time: 3.57, Data: 0.10, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.6601
    Epoch: [21][200/588], Time: 3.56, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.5735
    Epoch: [21][220/588], Time: 3.56, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3563
    Epoch: [21][240/588], Time: 3.56, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.5797
    Epoch: [21][260/588], Time: 3.56, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.6502
    Epoch: [21][280/588], Time: 3.55, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.5655
    Epoch: [21][300/588], Time: 3.56, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.6812
    Epoch: [21][320/588], Time: 3.55, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3992
    Epoch: [21][340/588], Time: 3.55, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.5721
    Epoch: [21][360/588], Time: 3.55, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.6239
    Epoch: [21][380/588], Time: 3.55, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3453
    Epoch: [21][400/588], Time: 3.55, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.5431
    Epoch: [21][420/588], Time: 3.55, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.5803
    Epoch: [21][440/588], Time: 3.55, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3768
    Epoch: [21][460/588], Time: 3.55, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3649
    Epoch: [21][480/588], Time: 3.55, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.5527
    Epoch: [21][500/588], Time: 3.55, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.6654
    Epoch: [21][520/588], Time: 3.55, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4141
    Epoch: [21][540/588], Time: 3.55, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3871
    Epoch: [21][560/588], Time: 3.55, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4290
    Epoch: [21][580/588], Time: 3.55, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.6501
    Epoch: [22][0/588], Time: 11.04, Data: 6.67, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3636
    Epoch: [22][20/588], Time: 3.88, Data: 0.38, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3656
    Epoch: [22][40/588], Time: 3.70, Data: 0.22, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.6042
    Epoch: [22][60/588], Time: 3.64, Data: 0.17, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.5815
    Epoch: [22][80/588], Time: 3.62, Data: 0.14, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.5601
    Epoch: [22][100/588], Time: 3.61, Data: 0.13, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.6574
    Epoch: [22][120/588], Time: 3.60, Data: 0.12, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3657
    Epoch: [22][140/588], Time: 3.59, Data: 0.11, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.7409
    Epoch: [22][160/588], Time: 3.58, Data: 0.10, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.6868
    Epoch: [22][180/588], Time: 3.58, Data: 0.10, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.5028
    Epoch: [22][200/588], Time: 3.57, Data: 0.10, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.5668
    Epoch: [22][220/588], Time: 3.57, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.6817
    Epoch: [22][240/588], Time: 3.57, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.5935
    Epoch: [22][260/588], Time: 3.57, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.6134
    Epoch: [22][280/588], Time: 3.57, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.6647
    Epoch: [22][300/588], Time: 3.56, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3430
    Epoch: [22][320/588], Time: 3.56, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3984
    Epoch: [22][340/588], Time: 3.56, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4256
    Epoch: [22][360/588], Time: 3.56, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.5670
    Epoch: [22][380/588], Time: 3.53, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4902
    Epoch: [22][400/588], Time: 3.45, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.6103
    Epoch: [22][420/588], Time: 3.43, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3598
    Epoch: [22][440/588], Time: 3.43, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.6382
    Epoch: [22][460/588], Time: 3.44, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.6018
    Epoch: [22][480/588], Time: 3.44, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4739
    Epoch: [22][500/588], Time: 3.45, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.6225
    Epoch: [22][520/588], Time: 3.45, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4441
    Epoch: [22][540/588], Time: 3.46, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3756
    Epoch: [22][560/588], Time: 3.47, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3753
    Epoch: [22][580/588], Time: 3.48, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3608
    Epoch: [23][0/588], Time: 10.69, Data: 6.21, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3798
    Epoch: [23][20/588], Time: 3.88, Data: 0.36, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3771
    Epoch: [23][40/588], Time: 3.70, Data: 0.21, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3322
    Epoch: [23][60/588], Time: 3.66, Data: 0.16, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3418
    Epoch: [23][80/588], Time: 3.63, Data: 0.14, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3474
    Epoch: [23][100/588], Time: 3.61, Data: 0.12, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.5040
    Epoch: [23][120/588], Time: 3.60, Data: 0.11, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3326
    Epoch: [23][140/588], Time: 3.59, Data: 0.11, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.5006
    Epoch: [23][160/588], Time: 3.58, Data: 0.10, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3584
    Epoch: [23][180/588], Time: 3.60, Data: 0.10, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3664
    Epoch: [23][200/588], Time: 3.62, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3691
    Epoch: [23][220/588], Time: 3.61, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3462
    Epoch: [23][240/588], Time: 3.61, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3640
    Epoch: [23][260/588], Time: 3.60, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3930
    Epoch: [23][280/588], Time: 3.60, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4322
    Epoch: [23][300/588], Time: 3.59, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3862
    Epoch: [23][320/588], Time: 3.59, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3674
    Epoch: [23][340/588], Time: 3.58, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.5060
    Epoch: [23][360/588], Time: 3.58, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3910
    Epoch: [23][380/588], Time: 3.58, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3941
    Epoch: [23][400/588], Time: 3.58, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4842
    Epoch: [23][420/588], Time: 3.58, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3500
    Epoch: [23][440/588], Time: 3.57, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3723
    Epoch: [23][460/588], Time: 3.57, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3568
    Epoch: [23][480/588], Time: 3.57, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4217
    Epoch: [23][500/588], Time: 3.57, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4121
    Epoch: [23][520/588], Time: 3.57, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3518
    Epoch: [23][540/588], Time: 3.58, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4068
    Epoch: [23][560/588], Time: 3.57, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3579
    Epoch: [23][580/588], Time: 3.57, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4058
    Epoch: [24][0/588], Time: 10.19, Data: 5.94, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4136
    Epoch: [24][20/588], Time: 3.92, Data: 0.34, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4972
    Epoch: [24][40/588], Time: 3.74, Data: 0.21, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3708
    Epoch: [24][60/588], Time: 3.70, Data: 0.16, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3785
    Epoch: [24][80/588], Time: 3.65, Data: 0.14, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.6398
    Epoch: [24][100/588], Time: 3.64, Data: 0.12, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4020
    Epoch: [24][120/588], Time: 3.62, Data: 0.11, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3408
    Epoch: [24][140/588], Time: 3.62, Data: 0.10, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4688
    Epoch: [24][160/588], Time: 3.60, Data: 0.10, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4631
    Epoch: [24][180/588], Time: 3.60, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3632
    Epoch: [24][200/588], Time: 3.59, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3690
    Epoch: [24][220/588], Time: 3.59, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3802
    Epoch: [24][240/588], Time: 3.59, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4363
    Epoch: [24][260/588], Time: 3.58, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4099
    Epoch: [24][280/588], Time: 3.58, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4547
    Epoch: [24][300/588], Time: 3.58, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4786
    Epoch: [24][320/588], Time: 3.58, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3652
    Epoch: [24][340/588], Time: 3.58, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4372
    Epoch: [24][360/588], Time: 3.59, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3610
    Epoch: [24][380/588], Time: 3.58, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3973
    Epoch: [24][400/588], Time: 3.58, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3871
    Epoch: [24][420/588], Time: 3.58, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3565
    Epoch: [24][440/588], Time: 3.58, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3952
    Epoch: [24][460/588], Time: 3.58, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3745
    Epoch: [24][480/588], Time: 3.58, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3893
    Epoch: [24][500/588], Time: 3.58, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3428
    Epoch: [24][520/588], Time: 3.58, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3428
    Epoch: [24][540/588], Time: 3.58, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3541
    Epoch: [24][560/588], Time: 3.57, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4277
    Epoch: [24][580/588], Time: 3.57, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3831
    Epoch: [25][0/588], Time: 10.09, Data: 5.98, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4383
    Epoch: [25][20/588], Time: 3.87, Data: 0.34, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4794
    Epoch: [25][40/588], Time: 3.70, Data: 0.21, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4150
    Epoch: [25][60/588], Time: 3.64, Data: 0.16, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4479
    Epoch: [25][80/588], Time: 3.61, Data: 0.14, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4144
    Epoch: [25][100/588], Time: 3.60, Data: 0.12, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4442
    Epoch: [25][120/588], Time: 3.59, Data: 0.11, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4647
    Epoch: [25][140/588], Time: 3.58, Data: 0.10, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3903
    Epoch: [25][160/588], Time: 3.57, Data: 0.10, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3634
    Epoch: [25][180/588], Time: 3.57, Data: 0.10, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4136
    Epoch: [25][200/588], Time: 3.56, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3806
    Epoch: [25][220/588], Time: 3.56, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3701
    Epoch: [25][240/588], Time: 3.56, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4035
    Epoch: [25][260/588], Time: 3.56, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4287
    Epoch: [25][280/588], Time: 3.50, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3839
    Epoch: [25][300/588], Time: 3.41, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3443
    Epoch: [25][320/588], Time: 3.37, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4006
    Epoch: [25][340/588], Time: 3.38, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3616
    Epoch: [25][360/588], Time: 3.39, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4512
    Epoch: [25][380/588], Time: 3.39, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3408
    Epoch: [25][400/588], Time: 3.40, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3678
    Epoch: [25][420/588], Time: 3.40, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3283
    Epoch: [25][440/588], Time: 3.42, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3819
    Epoch: [25][460/588], Time: 3.43, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3440
    Epoch: [25][480/588], Time: 3.44, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4043
    Epoch: [25][500/588], Time: 3.44, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3698
    Epoch: [25][520/588], Time: 3.44, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4062
    Epoch: [25][540/588], Time: 3.45, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3218
    Epoch: [25][560/588], Time: 3.45, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3266
    Epoch: [25][580/588], Time: 3.45, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3361
    Epoch: [26][0/588], Time: 10.03, Data: 5.21, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3407
    Epoch: [26][20/588], Time: 3.89, Data: 0.31, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3975
    Epoch: [26][40/588], Time: 3.74, Data: 0.19, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4088
    Epoch: [26][60/588], Time: 3.67, Data: 0.16, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4036
    Epoch: [26][80/588], Time: 3.63, Data: 0.14, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3672
    Epoch: [26][100/588], Time: 3.60, Data: 0.12, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3537
    Epoch: [26][120/588], Time: 3.59, Data: 0.11, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4193
    Epoch: [26][140/588], Time: 3.58, Data: 0.10, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4055
    Epoch: [26][160/588], Time: 3.57, Data: 0.10, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.5905
    Epoch: [26][180/588], Time: 3.57, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3580
    Epoch: [26][200/588], Time: 3.56, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3716
    Epoch: [26][220/588], Time: 3.56, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3699
    Epoch: [26][240/588], Time: 3.55, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3980
    Epoch: [26][260/588], Time: 3.55, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3788
    Epoch: [26][280/588], Time: 3.55, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3420
    Epoch: [26][300/588], Time: 3.55, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3523
    Epoch: [26][320/588], Time: 3.55, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4162
    Epoch: [26][340/588], Time: 3.55, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3602
    Epoch: [26][360/588], Time: 3.55, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4266
    Epoch: [26][380/588], Time: 3.54, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3598
    Epoch: [26][400/588], Time: 3.54, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3798
    Epoch: [26][420/588], Time: 3.54, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3517
    Epoch: [26][440/588], Time: 3.55, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3860
    Epoch: [26][460/588], Time: 3.55, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3743
    Epoch: [26][480/588], Time: 3.55, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4346
    Epoch: [26][500/588], Time: 3.54, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3484
    Epoch: [26][520/588], Time: 3.54, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3438
    Epoch: [26][540/588], Time: 3.54, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4046
    Epoch: [26][560/588], Time: 3.54, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3803
    Epoch: [26][580/588], Time: 3.54, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4078
    Epoch: [27][0/588], Time: 10.58, Data: 5.98, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3676
    Epoch: [27][20/588], Time: 3.92, Data: 0.35, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3323
    Epoch: [27][40/588], Time: 3.72, Data: 0.21, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3875
    Epoch: [27][60/588], Time: 3.67, Data: 0.16, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4119
    Epoch: [27][80/588], Time: 3.63, Data: 0.14, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3776
    Epoch: [27][100/588], Time: 3.61, Data: 0.12, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4094
    Epoch: [27][120/588], Time: 3.60, Data: 0.11, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4005
    Epoch: [27][140/588], Time: 3.59, Data: 0.10, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4217
    Epoch: [27][160/588], Time: 3.58, Data: 0.10, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4083
    Epoch: [27][180/588], Time: 3.57, Data: 0.10, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3481
    Epoch: [27][200/588], Time: 3.57, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3332
    Epoch: [27][220/588], Time: 3.56, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3504
    Epoch: [27][240/588], Time: 3.56, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3227
    Epoch: [27][260/588], Time: 3.56, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3370
    Epoch: [27][280/588], Time: 3.55, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3415
    Epoch: [27][300/588], Time: 3.55, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4008
    Epoch: [27][320/588], Time: 3.55, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3426
    Epoch: [27][340/588], Time: 3.55, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4365
    Epoch: [27][360/588], Time: 3.55, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3592
    Epoch: [27][380/588], Time: 3.55, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.5975
    Epoch: [27][400/588], Time: 3.54, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3685
    Epoch: [27][420/588], Time: 3.54, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3313
    Epoch: [27][440/588], Time: 3.54, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4072
    Epoch: [27][460/588], Time: 3.55, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3391
    Epoch: [27][480/588], Time: 3.54, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3707
    Epoch: [27][500/588], Time: 3.54, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3824
    Epoch: [27][520/588], Time: 3.54, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3535
    Epoch: [27][540/588], Time: 3.54, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4348
    Epoch: [27][560/588], Time: 3.54, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3988
    Epoch: [27][580/588], Time: 3.53, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3998
    Epoch: [28][0/588], Time: 11.57, Data: 7.13, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3906
    Epoch: [28][20/588], Time: 3.95, Data: 0.40, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3656
    Epoch: [28][40/588], Time: 3.75, Data: 0.24, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3387
    Epoch: [28][60/588], Time: 3.73, Data: 0.18, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3495
    Epoch: [28][80/588], Time: 3.72, Data: 0.15, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3752
    Epoch: [28][100/588], Time: 3.68, Data: 0.13, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3451
    Epoch: [28][120/588], Time: 3.66, Data: 0.12, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3669
    Epoch: [28][140/588], Time: 3.64, Data: 0.11, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3401
    Epoch: [28][160/588], Time: 3.63, Data: 0.11, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3942
    Epoch: [28][180/588], Time: 3.62, Data: 0.10, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3713
    Epoch: [28][200/588], Time: 3.54, Data: 0.10, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3351
    Epoch: [28][220/588], Time: 3.42, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3635
    Epoch: [28][240/588], Time: 3.34, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4332
    Epoch: [28][260/588], Time: 3.36, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.6382
    Epoch: [28][280/588], Time: 3.37, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3738
    Epoch: [28][300/588], Time: 3.38, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3572
    Epoch: [28][320/588], Time: 3.39, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3473
    Epoch: [28][340/588], Time: 3.40, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3714
    Epoch: [28][360/588], Time: 3.40, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3765
    Epoch: [28][380/588], Time: 3.41, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4084
    Epoch: [28][400/588], Time: 3.41, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3933
    Epoch: [28][420/588], Time: 3.42, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3089
    Epoch: [28][440/588], Time: 3.42, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3691
    Epoch: [28][460/588], Time: 3.43, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.6418
    Epoch: [28][480/588], Time: 3.43, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3365
    Epoch: [28][500/588], Time: 3.44, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3327
    Epoch: [28][520/588], Time: 3.45, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3637
    Epoch: [28][540/588], Time: 3.46, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3963
    Epoch: [28][560/588], Time: 3.47, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.5257
    Epoch: [28][580/588], Time: 3.48, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4325
    Epoch: [29][0/588], Time: 11.05, Data: 6.81, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4593
    Epoch: [29][20/588], Time: 3.87, Data: 0.38, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3301
    Epoch: [29][40/588], Time: 3.70, Data: 0.23, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3600
    Epoch: [29][60/588], Time: 3.63, Data: 0.17, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3581
    Epoch: [29][80/588], Time: 3.61, Data: 0.15, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3555
    Epoch: [29][100/588], Time: 3.58, Data: 0.13, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3320
    Epoch: [29][120/588], Time: 3.57, Data: 0.12, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3409
    Epoch: [29][140/588], Time: 3.56, Data: 0.11, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3512
    Epoch: [29][160/588], Time: 3.56, Data: 0.10, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4021
    Epoch: [29][180/588], Time: 3.55, Data: 0.10, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.5918
    Epoch: [29][200/588], Time: 3.55, Data: 0.10, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3602
    Epoch: [29][220/588], Time: 3.55, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3250
    Epoch: [29][240/588], Time: 3.55, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4104
    Epoch: [29][260/588], Time: 3.54, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.6283
    Epoch: [29][280/588], Time: 3.54, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3890
    Epoch: [29][300/588], Time: 3.54, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4359
    Epoch: [29][320/588], Time: 3.54, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3409
    Epoch: [29][340/588], Time: 3.54, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4076
    Epoch: [29][360/588], Time: 3.54, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3711
    Epoch: [29][380/588], Time: 3.54, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3958
    Epoch: [29][400/588], Time: 3.54, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3803
    Epoch: [29][420/588], Time: 3.54, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3456
    Epoch: [29][440/588], Time: 3.54, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3223
    Epoch: [29][460/588], Time: 3.54, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3339
    Epoch: [29][480/588], Time: 3.54, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4066
    Epoch: [29][500/588], Time: 3.54, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3816
    Epoch: [29][520/588], Time: 3.54, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3319
    Epoch: [29][540/588], Time: 3.54, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4031
    Epoch: [29][560/588], Time: 3.53, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3483
    Epoch: [29][580/588], Time: 3.53, Data: 0.07, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3718
    Epoch: [30][0/588], Time: 11.22, Data: 6.87, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3354
    Epoch: [30][20/588], Time: 3.91, Data: 0.39, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3291
    Epoch: [30][40/588], Time: 3.74, Data: 0.23, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3546
    Epoch: [30][60/588], Time: 3.67, Data: 0.17, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3312
    Epoch: [30][80/588], Time: 3.63, Data: 0.15, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3490
    Epoch: [30][100/588], Time: 3.61, Data: 0.13, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3505
    Epoch: [30][120/588], Time: 3.59, Data: 0.12, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3487
    Epoch: [30][140/588], Time: 3.59, Data: 0.11, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4053
    Epoch: [30][160/588], Time: 3.58, Data: 0.10, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3274
    Epoch: [30][180/588], Time: 3.57, Data: 0.10, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3829
    Epoch: [30][200/588], Time: 3.56, Data: 0.10, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4055
    Epoch: [30][220/588], Time: 3.58, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3392
    Epoch: [30][240/588], Time: 3.57, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.6289
    Epoch: [30][260/588], Time: 3.57, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3726
    Epoch: [30][280/588], Time: 3.57, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3422
    Epoch: [30][300/588], Time: 3.57, Data: 0.09, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3404
    Epoch: [30][320/588], Time: 3.57, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3824
    Epoch: [30][340/588], Time: 3.59, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3670
    Epoch: [30][360/588], Time: 3.59, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4178
    Epoch: [30][380/588], Time: 3.59, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3650
    Epoch: [30][400/588], Time: 3.59, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4549
    Epoch: [30][420/588], Time: 3.59, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.5066
    Epoch: [30][440/588], Time: 3.58, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3211
    Epoch: [30][460/588], Time: 3.58, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3403
    Epoch: [30][480/588], Time: 3.59, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3939
    Epoch: [30][500/588], Time: 3.59, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4540
    Epoch: [30][520/588], Time: 3.59, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3804
    Epoch: [30][540/588], Time: 3.59, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.5974
    Epoch: [30][560/588], Time: 3.60, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.4370
    Epoch: [30][580/588], Time: 3.60, Data: 0.08, lr_sound: 0.0001, lr_frame: 1e-05, loss: 0.3399
    Epoch: [31][0/588], Time: 10.71, Data: 6.42, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4458
    Epoch: [31][20/588], Time: 3.92, Data: 0.37, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3643
    Epoch: [31][40/588], Time: 3.73, Data: 0.22, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3355
    Epoch: [31][60/588], Time: 3.67, Data: 0.17, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3333
    Epoch: [31][80/588], Time: 3.64, Data: 0.14, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3082
    Epoch: [31][100/588], Time: 3.62, Data: 0.13, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3776
    Epoch: [31][120/588], Time: 3.42, Data: 0.12, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3210
    Epoch: [31][140/588], Time: 3.24, Data: 0.11, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3190
    Epoch: [31][160/588], Time: 3.24, Data: 0.10, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3640
    Epoch: [31][180/588], Time: 3.27, Data: 0.10, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3858
    Epoch: [31][200/588], Time: 3.30, Data: 0.10, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3236
    Epoch: [31][220/588], Time: 3.33, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3156
    Epoch: [31][240/588], Time: 3.37, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3671
    Epoch: [31][260/588], Time: 3.39, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3622
    Epoch: [31][280/588], Time: 3.41, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4114
    Epoch: [31][300/588], Time: 3.42, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3549
    Epoch: [31][320/588], Time: 3.44, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4356
    Epoch: [31][340/588], Time: 3.45, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3631
    Epoch: [31][360/588], Time: 3.46, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4178
    Epoch: [31][380/588], Time: 3.46, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3956
    Epoch: [31][400/588], Time: 3.47, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3890
    Epoch: [31][420/588], Time: 3.47, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3523
    Epoch: [31][440/588], Time: 3.47, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3463
    Epoch: [31][460/588], Time: 3.48, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4847
    Epoch: [31][480/588], Time: 3.48, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4539
    Epoch: [31][500/588], Time: 3.49, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3850
    Epoch: [31][520/588], Time: 3.49, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3459
    Epoch: [31][540/588], Time: 3.49, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3478
    Epoch: [31][560/588], Time: 3.49, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3074
    Epoch: [31][580/588], Time: 3.49, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3404
    Epoch: [32][0/588], Time: 10.70, Data: 6.14, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3334
    Epoch: [32][20/588], Time: 3.92, Data: 0.36, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.5270
    Epoch: [32][40/588], Time: 3.76, Data: 0.21, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3258
    Epoch: [32][60/588], Time: 3.70, Data: 0.16, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3965
    Epoch: [32][80/588], Time: 3.65, Data: 0.14, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3870
    Epoch: [32][100/588], Time: 3.62, Data: 0.12, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3409
    Epoch: [32][120/588], Time: 3.61, Data: 0.12, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3277
    Epoch: [32][140/588], Time: 3.61, Data: 0.11, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3998
    Epoch: [32][160/588], Time: 3.60, Data: 0.10, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3716
    Epoch: [32][180/588], Time: 3.60, Data: 0.10, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3506
    Epoch: [32][200/588], Time: 3.59, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3291
    Epoch: [32][220/588], Time: 3.59, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3135
    Epoch: [32][240/588], Time: 3.58, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3560
    Epoch: [32][260/588], Time: 3.58, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3896
    Epoch: [32][280/588], Time: 3.58, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3423
    Epoch: [32][300/588], Time: 3.58, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3481
    Epoch: [32][320/588], Time: 3.58, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3496
    Epoch: [32][340/588], Time: 3.57, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3705
    Epoch: [32][360/588], Time: 3.57, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3185
    Epoch: [32][380/588], Time: 3.57, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3714
    Epoch: [32][400/588], Time: 3.57, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3340
    Epoch: [32][420/588], Time: 3.57, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3655
    Epoch: [32][440/588], Time: 3.57, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3621
    Epoch: [32][460/588], Time: 3.57, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.2996
    Epoch: [32][480/588], Time: 3.56, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3565
    Epoch: [32][500/588], Time: 3.57, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3735
    Epoch: [32][520/588], Time: 3.57, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3286
    Epoch: [32][540/588], Time: 3.57, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3190
    Epoch: [32][560/588], Time: 3.57, Data: 0.07, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3942
    Epoch: [32][580/588], Time: 3.57, Data: 0.07, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3202
    Epoch: [33][0/588], Time: 12.27, Data: 8.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3580
    Epoch: [33][20/588], Time: 4.03, Data: 0.45, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3605
    Epoch: [33][40/588], Time: 3.80, Data: 0.26, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3770
    Epoch: [33][60/588], Time: 3.72, Data: 0.20, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3572
    Epoch: [33][80/588], Time: 3.67, Data: 0.17, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3470
    Epoch: [33][100/588], Time: 3.64, Data: 0.15, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.6715
    Epoch: [33][120/588], Time: 3.63, Data: 0.13, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3759
    Epoch: [33][140/588], Time: 3.61, Data: 0.12, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4220
    Epoch: [33][160/588], Time: 3.62, Data: 0.12, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3710
    Epoch: [33][180/588], Time: 3.61, Data: 0.12, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4108
    Epoch: [33][200/588], Time: 3.60, Data: 0.11, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3815
    Epoch: [33][220/588], Time: 3.59, Data: 0.11, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3612
    Epoch: [33][240/588], Time: 3.59, Data: 0.10, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3679
    Epoch: [33][260/588], Time: 3.59, Data: 0.10, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3792
    Epoch: [33][280/588], Time: 3.59, Data: 0.10, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3153
    Epoch: [33][300/588], Time: 3.58, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.5187
    Epoch: [33][320/588], Time: 3.59, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4615
    Epoch: [33][340/588], Time: 3.58, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3520
    Epoch: [33][360/588], Time: 3.58, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3609
    Epoch: [33][380/588], Time: 3.58, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3296
    Epoch: [33][400/588], Time: 3.58, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3831
    Epoch: [33][420/588], Time: 3.58, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3143
    Epoch: [33][440/588], Time: 3.58, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3553
    Epoch: [33][460/588], Time: 3.57, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4489
    Epoch: [33][480/588], Time: 3.57, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3263
    Epoch: [33][500/588], Time: 3.57, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3154
    Epoch: [33][520/588], Time: 3.57, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3727
    Epoch: [33][540/588], Time: 3.57, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3832
    Epoch: [33][560/588], Time: 3.57, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3905
    Epoch: [33][580/588], Time: 3.56, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3305
    Epoch: [34][0/588], Time: 11.20, Data: 6.84, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3398
    Epoch: [34][20/588], Time: 3.36, Data: 0.39, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3621
    Epoch: [34][40/588], Time: 2.66, Data: 0.23, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3437
    Epoch: [34][60/588], Time: 2.62, Data: 0.18, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3828
    Epoch: [34][80/588], Time: 2.89, Data: 0.15, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3262
    Epoch: [34][100/588], Time: 3.05, Data: 0.13, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3721
    Epoch: [34][120/588], Time: 3.15, Data: 0.12, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3466
    Epoch: [34][140/588], Time: 3.23, Data: 0.11, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3633
    Epoch: [34][160/588], Time: 3.28, Data: 0.11, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3416
    Epoch: [34][180/588], Time: 3.32, Data: 0.10, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3251
    Epoch: [34][200/588], Time: 3.36, Data: 0.10, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3617
    Epoch: [34][220/588], Time: 3.39, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4142
    Epoch: [34][240/588], Time: 3.41, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3937
    Epoch: [34][260/588], Time: 3.42, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3582
    Epoch: [34][280/588], Time: 3.44, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3384
    Epoch: [34][300/588], Time: 3.45, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4356
    Epoch: [34][320/588], Time: 3.46, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3542
    Epoch: [34][340/588], Time: 3.47, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3339
    Epoch: [34][360/588], Time: 3.48, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3346
    Epoch: [34][380/588], Time: 3.49, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3600
    Epoch: [34][400/588], Time: 3.50, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3118
    Epoch: [34][420/588], Time: 3.50, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3470
    Epoch: [34][440/588], Time: 3.51, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3397
    Epoch: [34][460/588], Time: 3.52, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3387
    Epoch: [34][480/588], Time: 3.52, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3524
    Epoch: [34][500/588], Time: 3.53, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4512
    Epoch: [34][520/588], Time: 3.53, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3266
    Epoch: [34][540/588], Time: 3.54, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3696
    Epoch: [34][560/588], Time: 3.54, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3651
    Epoch: [34][580/588], Time: 3.54, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3262
    Epoch: [35][0/588], Time: 12.53, Data: 8.45, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3351
    Epoch: [35][20/588], Time: 3.96, Data: 0.46, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3743
    Epoch: [35][40/588], Time: 3.77, Data: 0.27, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4320
    Epoch: [35][60/588], Time: 3.69, Data: 0.20, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4278
    Epoch: [35][80/588], Time: 3.65, Data: 0.17, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3385
    Epoch: [35][100/588], Time: 3.64, Data: 0.15, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3342
    Epoch: [35][120/588], Time: 3.63, Data: 0.13, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3556
    Epoch: [35][140/588], Time: 3.64, Data: 0.12, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3002
    Epoch: [35][160/588], Time: 3.64, Data: 0.12, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3477
    Epoch: [35][180/588], Time: 3.65, Data: 0.11, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3940
    Epoch: [35][200/588], Time: 3.67, Data: 0.11, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4446
    Epoch: [35][220/588], Time: 3.66, Data: 0.10, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3434
    Epoch: [35][240/588], Time: 3.66, Data: 0.10, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3520
    Epoch: [35][260/588], Time: 3.65, Data: 0.10, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4343
    Epoch: [35][280/588], Time: 3.64, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3722
    Epoch: [35][300/588], Time: 3.64, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4176
    Epoch: [35][320/588], Time: 3.63, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3487
    Epoch: [35][340/588], Time: 3.63, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3253
    Epoch: [35][360/588], Time: 3.64, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4084
    Epoch: [35][380/588], Time: 3.64, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3325
    Epoch: [35][400/588], Time: 3.65, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3186
    Epoch: [35][420/588], Time: 3.65, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4150
    Epoch: [35][440/588], Time: 3.66, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3139
    Epoch: [35][460/588], Time: 3.66, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3583
    Epoch: [35][480/588], Time: 3.66, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3238
    Epoch: [35][500/588], Time: 3.66, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3295
    Epoch: [35][520/588], Time: 3.66, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3415
    Epoch: [35][540/588], Time: 3.67, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3263
    Epoch: [35][560/588], Time: 3.67, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3509
    Epoch: [35][580/588], Time: 3.67, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3366
    Epoch: [36][0/588], Time: 10.77, Data: 6.42, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3352
    Epoch: [36][20/588], Time: 3.94, Data: 0.37, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3314
    Epoch: [36][40/588], Time: 3.77, Data: 0.22, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3402
    Epoch: [36][60/588], Time: 3.69, Data: 0.17, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3341
    Epoch: [36][80/588], Time: 3.66, Data: 0.14, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4877
    Epoch: [36][100/588], Time: 3.63, Data: 0.13, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3673
    Epoch: [36][120/588], Time: 3.62, Data: 0.12, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3727
    Epoch: [36][140/588], Time: 3.61, Data: 0.11, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3624
    Epoch: [36][160/588], Time: 3.60, Data: 0.10, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3266
    Epoch: [36][180/588], Time: 3.60, Data: 0.10, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3816
    Epoch: [36][200/588], Time: 3.60, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3768
    Epoch: [36][220/588], Time: 3.60, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3788
    Epoch: [36][240/588], Time: 3.59, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3190
    Epoch: [36][260/588], Time: 3.59, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4657
    Epoch: [36][280/588], Time: 3.58, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3776
    Epoch: [36][300/588], Time: 3.58, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3735
    Epoch: [36][320/588], Time: 3.58, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.5781
    Epoch: [36][340/588], Time: 3.58, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3493
    Epoch: [36][360/588], Time: 3.58, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3581
    Epoch: [36][380/588], Time: 3.58, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3131
    Epoch: [36][400/588], Time: 3.59, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3763
    Epoch: [36][420/588], Time: 3.59, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4228
    Epoch: [36][440/588], Time: 3.59, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3265
    Epoch: [36][460/588], Time: 3.60, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3011
    Epoch: [36][480/588], Time: 3.61, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3431
    Epoch: [36][500/588], Time: 3.61, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3653
    Epoch: [36][520/588], Time: 3.62, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3498
    Epoch: [36][540/588], Time: 3.58, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3986
    Epoch: [36][560/588], Time: 3.53, Data: 0.07, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3949
    Epoch: [36][580/588], Time: 3.52, Data: 0.07, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3926
    Epoch: [37][0/588], Time: 11.21, Data: 7.11, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3325
    Epoch: [37][20/588], Time: 3.91, Data: 0.40, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3141
    Epoch: [37][40/588], Time: 3.72, Data: 0.24, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3676
    Epoch: [37][60/588], Time: 3.67, Data: 0.18, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3666
    Epoch: [37][80/588], Time: 3.64, Data: 0.15, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3070
    Epoch: [37][100/588], Time: 3.62, Data: 0.13, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4426
    Epoch: [37][120/588], Time: 3.61, Data: 0.12, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3601
    Epoch: [37][140/588], Time: 3.59, Data: 0.11, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3347
    Epoch: [37][160/588], Time: 3.58, Data: 0.11, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3554
    Epoch: [37][180/588], Time: 3.58, Data: 0.10, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3161
    Epoch: [37][200/588], Time: 3.58, Data: 0.10, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3679
    Epoch: [37][220/588], Time: 3.58, Data: 0.10, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.6818
    Epoch: [37][240/588], Time: 3.57, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4143
    Epoch: [37][260/588], Time: 3.57, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3414
    Epoch: [37][280/588], Time: 3.56, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3222
    Epoch: [37][300/588], Time: 3.56, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3428
    Epoch: [37][320/588], Time: 3.56, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3892
    Epoch: [37][340/588], Time: 3.56, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4390
    Epoch: [37][360/588], Time: 3.56, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3207
    Epoch: [37][380/588], Time: 3.55, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3385
    Epoch: [37][400/588], Time: 3.55, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3980
    Epoch: [37][420/588], Time: 3.56, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3076
    Epoch: [37][440/588], Time: 3.56, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3518
    Epoch: [37][460/588], Time: 3.56, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4140
    Epoch: [37][480/588], Time: 3.56, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3549
    Epoch: [37][500/588], Time: 3.56, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3890
    Epoch: [37][520/588], Time: 3.56, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4008
    Epoch: [37][540/588], Time: 3.57, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3388
    Epoch: [37][560/588], Time: 3.57, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3563
    Epoch: [37][580/588], Time: 3.57, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3432
    Epoch: [38][0/588], Time: 10.96, Data: 6.66, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3733
    Epoch: [38][20/588], Time: 3.91, Data: 0.38, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3046
    Epoch: [38][40/588], Time: 3.74, Data: 0.23, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3502
    Epoch: [38][60/588], Time: 3.68, Data: 0.17, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4164
    Epoch: [38][80/588], Time: 3.65, Data: 0.15, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3567
    Epoch: [38][100/588], Time: 3.63, Data: 0.13, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3608
    Epoch: [38][120/588], Time: 3.61, Data: 0.12, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3295
    Epoch: [38][140/588], Time: 3.61, Data: 0.11, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3155
    Epoch: [38][160/588], Time: 3.60, Data: 0.10, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3477
    Epoch: [38][180/588], Time: 3.59, Data: 0.10, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3236
    Epoch: [38][200/588], Time: 3.58, Data: 0.10, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3473
    Epoch: [38][220/588], Time: 3.58, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4402
    Epoch: [38][240/588], Time: 3.57, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3466
    Epoch: [38][260/588], Time: 3.57, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3302
    Epoch: [38][280/588], Time: 3.57, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3508
    Epoch: [38][300/588], Time: 3.56, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.5815
    Epoch: [38][320/588], Time: 3.56, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3931
    Epoch: [38][340/588], Time: 3.56, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4879
    Epoch: [38][360/588], Time: 3.56, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3511
    Epoch: [38][380/588], Time: 3.56, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4940
    Epoch: [38][400/588], Time: 3.55, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3069
    Epoch: [38][420/588], Time: 3.55, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4348
    Epoch: [38][440/588], Time: 3.55, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3259
    Epoch: [38][460/588], Time: 3.55, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.6314
    Epoch: [38][480/588], Time: 3.55, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3181
    Epoch: [38][500/588], Time: 3.55, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4068
    Epoch: [38][520/588], Time: 3.55, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3383
    Epoch: [38][540/588], Time: 3.55, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3203
    Epoch: [38][560/588], Time: 3.55, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3694
    Epoch: [38][580/588], Time: 3.55, Data: 0.07, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3409
    Epoch: [39][0/588], Time: 11.27, Data: 6.90, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3762
    Epoch: [39][20/588], Time: 3.94, Data: 0.39, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3435
    Epoch: [39][40/588], Time: 3.73, Data: 0.23, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3324
    Epoch: [39][60/588], Time: 3.66, Data: 0.18, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3914
    Epoch: [39][80/588], Time: 3.63, Data: 0.15, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.6171
    Epoch: [39][100/588], Time: 3.62, Data: 0.13, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4659
    Epoch: [39][120/588], Time: 3.61, Data: 0.12, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3257
    Epoch: [39][140/588], Time: 3.60, Data: 0.11, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3264
    Epoch: [39][160/588], Time: 3.59, Data: 0.11, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3570
    Epoch: [39][180/588], Time: 3.59, Data: 0.10, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3244
    Epoch: [39][200/588], Time: 3.58, Data: 0.10, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3323
    Epoch: [39][220/588], Time: 3.58, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3436
    Epoch: [39][240/588], Time: 3.57, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4165
    Epoch: [39][260/588], Time: 3.57, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3451
    Epoch: [39][280/588], Time: 3.57, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3121
    Epoch: [39][300/588], Time: 3.57, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3341
    Epoch: [39][320/588], Time: 3.56, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3775
    Epoch: [39][340/588], Time: 3.56, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3177
    Epoch: [39][360/588], Time: 3.56, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3599
    Epoch: [39][380/588], Time: 3.56, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3516
    Epoch: [39][400/588], Time: 3.56, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3775
    Epoch: [39][420/588], Time: 3.55, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3648
    Epoch: [39][440/588], Time: 3.55, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3435
    Epoch: [39][460/588], Time: 3.49, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3251
    Epoch: [39][480/588], Time: 3.43, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3373
    Epoch: [39][500/588], Time: 3.44, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.6936
    Epoch: [39][520/588], Time: 3.45, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3529
    Epoch: [39][540/588], Time: 3.46, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4356
    Epoch: [39][560/588], Time: 3.47, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.6890
    Epoch: [39][580/588], Time: 3.47, Data: 0.07, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3572
    Epoch: [40][0/588], Time: 10.93, Data: 6.57, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.6221
    Epoch: [40][20/588], Time: 3.84, Data: 0.37, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4456
    Epoch: [40][40/588], Time: 3.70, Data: 0.22, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3676
    Epoch: [40][60/588], Time: 3.63, Data: 0.17, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3674
    Epoch: [40][80/588], Time: 3.58, Data: 0.14, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3922
    Epoch: [40][100/588], Time: 3.57, Data: 0.13, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3712
    Epoch: [40][120/588], Time: 3.56, Data: 0.12, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3604
    Epoch: [40][140/588], Time: 3.55, Data: 0.11, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3782
    Epoch: [40][160/588], Time: 3.55, Data: 0.10, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3133
    Epoch: [40][180/588], Time: 3.54, Data: 0.10, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3467
    Epoch: [40][200/588], Time: 3.53, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3651
    Epoch: [40][220/588], Time: 3.54, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4619
    Epoch: [40][240/588], Time: 3.53, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3811
    Epoch: [40][260/588], Time: 3.53, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.7480
    Epoch: [40][280/588], Time: 3.53, Data: 0.09, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.7117
    Epoch: [40][300/588], Time: 3.53, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3294
    Epoch: [40][320/588], Time: 3.53, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3547
    Epoch: [40][340/588], Time: 3.53, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4467
    Epoch: [40][360/588], Time: 3.52, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3892
    Epoch: [40][380/588], Time: 3.52, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3190
    Epoch: [40][400/588], Time: 3.53, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3575
    Epoch: [40][420/588], Time: 3.53, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3312
    Epoch: [40][440/588], Time: 3.53, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3201
    Epoch: [40][460/588], Time: 3.53, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4096
    Epoch: [40][480/588], Time: 3.52, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3710
    Epoch: [40][500/588], Time: 3.52, Data: 0.08, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.4106
    Epoch: [40][520/588], Time: 3.52, Data: 0.07, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3141
    Epoch: [40][540/588], Time: 3.52, Data: 0.07, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3097
    Epoch: [40][560/588], Time: 3.52, Data: 0.07, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3800
    Epoch: [40][580/588], Time: 3.52, Data: 0.07, lr_sound: 1e-05, lr_frame: 1.0000000000000002e-06, loss: 0.3195
  3. Here are some visualization cases on the test set (threshold: 0.5). I noticed that the results do not focus accurately on the sounding object. Do you know what might be causing this?
  4. BTW, can you share your training log and checkpoint to help me debug? Thanks!
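For debugging the visualizations in item 3, it can help to quantify how diffuse each localization map is at the 0.5 threshold. The sketch below assumes the predicted map is a 2-D array that is min-max normalized to [0, 1] before thresholding. The repo's actual normalization is not shown in this thread, so treat that as an assumption. It also uses plain IoU against a binary ground-truth mask, not the paper's CIoU. The function names `binarize_heatmap`, `iou`, and `foreground_fraction` are hypothetical helpers, not part of Ego-AV-Loc.

```python
import numpy as np


def binarize_heatmap(heatmap: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Min-max normalize a 2-D localization map to [0, 1], then threshold it.

    Assumption: the map is normalized per sample before thresholding.
    """
    hmin, hmax = float(heatmap.min()), float(heatmap.max())
    if hmax - hmin < 1e-12:
        # A flat map carries no localization signal, so return an empty mask.
        return np.zeros(heatmap.shape, dtype=bool)
    normalized = (heatmap - hmin) / (hmax - hmin)
    return normalized > threshold


def iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Plain intersection-over-union between two boolean masks (not CIoU)."""
    inter = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return float(inter) / float(union) if union > 0 else 0.0


def foreground_fraction(pred_mask: np.ndarray) -> float:
    """Share of pixels above threshold; a large value suggests a diffuse map."""
    return float(pred_mask.mean())


# Toy example: the map peaks on the top-left 2x2 block of a 4x4 frame.
heat = np.zeros((4, 4))
heat[:2, :2] = 1.0
gt = np.zeros((4, 4), dtype=bool)
gt[:2, :2] = True
pred = binarize_heatmap(heat)
print(iou(pred, gt), foreground_fraction(pred))
```

Tracking `foreground_fraction` across test clips (and across early checkpoints) gives a rough signal of whether the maps are collapsing onto large background regions rather than the sounding object.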