facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License
How to improve low BLEU score #430

Closed Sumegh-git closed 5 years ago

Sumegh-git commented 5 years ago

I am trying to perform a language translation task. I am working from the German-English example, but I changed the architecture to fconv (pure convolutional). I replaced the train.de, test.de, train.de and their corresponding train.en, and ... files with my own data for two different languages.

For preprocessing:

python preprocess.py --source-lang de --target-lang en --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test --destdir /home/others/17EC30042/fairseq/examples/translation/iwslt14.tokenized.de-en/tr

where ~/tr will store the binarized files.

For training:

python train.py /home/others/17EC30042/fairseq/examples/translation/iwslt14.tokenized.de-en/tr --lr 0.25 --clip-norm 0.1 --dropout 0.1 --max-tokens 4000 --arch fconv --save-dir checkpoints/fconv

For scoring:

python score.py --r /home/others/17EC30042/fairseq/examples/translation/iwslt14.tokenized.de-en/test.de --s /home/others/17EC30042/fairseq/examples/translation/iwslt14.tokenized.de-en/test.en

Output:

Namespace(ignore_case=False, order=4, ref='/home/others/17EC30042/fairseq/examples/translation/iwslt14.tokenized.de-en/test.de', sys='/home/others/17EC30042/fairseq/examples/translation/iwslt14.tokenized.de-en/test.en')
BLEU4 = 12.80, 59.3/17.6/7.1/3.6 (BP=1.000, ratio=1.017, syslen=82627, reflen=81268)

The maximum sentence length is also around 15 words.
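
For reference, the headline BLEU4 figure in the output above can be reproduced from the components printed next to it. The sketch below assumes the standard BLEU definition (brevity penalty times the geometric mean of the 1- to 4-gram precisions). The function names are illustrative, and because it uses only the rounded values from the output, it matches the reported score only to within rounding.

```python
import math

# Values copied from the score.py output above.
precisions = [59.3, 17.6, 7.1, 3.6]  # 1- to 4-gram precisions, in percent
sys_len, ref_len = 82627, 81268      # syslen and reflen


def brevity_penalty(sys_len, ref_len):
    """Standard BLEU brevity penalty: 1.0 unless the system output is shorter than the reference."""
    if sys_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / sys_len)


def bleu_from_components(precisions, sys_len, ref_len):
    """Brevity penalty times the geometric mean of the n-gram precisions (in percent)."""
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    return brevity_penalty(sys_len, ref_len) * math.exp(log_mean)


bleu = bleu_from_components(precisions, sys_len, ref_len)
ratio = sys_len / ref_len
print(f"BLEU4 ~ {bleu:.2f}, BP = {brevity_penalty(sys_len, ref_len):.3f}, ratio = {ratio:.3f}")
```

This gives a value of roughly 12.8 with BP = 1.000, in line with the reported score. Because the score is a geometric mean, the low 3-gram and 4-gram precisions (7.1 and 3.6) pull it far below the unigram precision of 59.3.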

myleott commented 5 years ago

Can you post the training log? Depending on the dataset you may need to adjust the learning rate, dropout, model architecture (number of layers, embedding dimensionality, etc.).
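
One way to act on this suggestion is a small sweep over the settings mentioned. The sketch below only builds command lines and does not run anything. The flag names mirror the training command and the arguments printed in the training log in this thread (`--lr`, `--dropout`, `--encoder-layers`, `--decoder-layers`, `--encoder-embed-dim`, `--decoder-embed-dim`). The data directory, the candidate values, and the `build_command` helper are hypothetical placeholders, and the layer-spec strings assume fconv's `[(channels, kernel_width)] * n` format.

```python
import itertools
import shlex

DATA_DIR = "data-bin/my-dataset"  # hypothetical path to the binarized data

# Hypothetical candidate values; sensible ranges depend on the dataset.
LEARNING_RATES = [0.25, 0.1]
DROPOUTS = [0.1, 0.3]
# (embedding dim, layer spec) pairs: a 20-layer model and a much smaller one.
ARCH_VARIANTS = [
    (512, "[(512, 3)] * 20"),
    (256, "[(256, 3)] * 4"),
]


def build_command(lr, dropout, embed_dim, layers):
    """Return one train.py command line (as a string) for the given settings."""
    save_dir = f"checkpoints/fconv_lr{lr}_do{dropout}_emb{embed_dim}"
    args = [
        "python", "train.py", DATA_DIR,
        "--arch", "fconv",
        "--lr", str(lr),
        "--clip-norm", "0.1",
        "--dropout", str(dropout),
        "--max-tokens", "4000",
        "--encoder-embed-dim", str(embed_dim),
        "--decoder-embed-dim", str(embed_dim),
        "--encoder-layers", layers,
        "--decoder-layers", layers,
        "--save-dir", save_dir,
    ]
    # shlex.quote keeps the bracketed layer spec intact when pasted into a shell.
    return " ".join(shlex.quote(a) for a in args)


commands = [
    build_command(lr, dropout, dim, layers)
    for lr, dropout, (dim, layers) in itertools.product(
        LEARNING_RATES, DROPOUTS, ARCH_VARIANTS
    )
]

for cmd in commands:
    print(cmd)
```

Each combination gets its own `--save-dir`, so checkpoints from different runs do not overwrite one another and their validation losses can be compared afterwards.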

Sumegh-git commented 5 years ago

OK, posting. Should I send the entire log, including details of all epochs?

Sumegh-git commented 5 years ago

Namespace(arch='fconv', bucket_cap_mb=150, clip_norm=0.1, criterion='cross_entropy', data=['/home/others/17EC30042/fairseq/examples/translation/iwslt14.tokenized.de-en/tr'], ddp_backend='c10d', decoder_attention='True', decoder_embed_dim=512, decoder_embed_path=None, decoder_layers='[(512, 3)] * 20', decoder_out_embed_dim=256, device_id=0, distributed_backend='nccl', distributed_init_method=None, distributed_port=-1, distributed_rank=0, distributed_world_size=1, dropout=0.2, encoder_embed_dim=512, encoder_embed_path=None, encoder_layers='[(512, 3)] * 20', fix_batches_to_gpus=False, fp16=False, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, keep_interval_updates=-1, left_pad_source='True', left_pad_target='False', log_format=None, log_interval=1000, lr=[0.25], lr_scheduler='reduce_lr_on_plateau', lr_shrink=0.1, max_epoch=0, max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=4000, max_update=0, min_loss_scale=0.0001, min_lr=1e-05, momentum=0.99, no_epoch_checkpoints=False, no_progress_bar=False, no_save=False, optimizer='nag', optimizer_overrides='{}', raw_text=False, reset_lr_scheduler=False, reset_optimizer=False, restore_file='checkpoint_last.pt', save_dir='checkpoints/fconv', save_interval=1, save_interval_updates=0, seed=1, sentence_avg=False, share_input_output_embed=False, skip_invalid_size_inputs_valid_test=False, source_lang=None, target_lang=None, task='translation', train_subset='train', update_freq=[1], upsample_primary=1, valid_subset='valid', validate_interval=1, weight_decay=0.0)
| [de] dictionary: 24808 types
| [en] dictionary: 24456 types
| /home/others/17EC30042/fairseq/examples/translation/iwslt14.tokenized.de-en/tr train 6083 examples
| /home/others/17EC30042/fairseq/examples/translation/iwslt14.tokenized.de-en/tr valid 6083 examples
| model fconv, criterion CrossEntropyCriterion
| num. model params: 107025680
| training on 1 GPUs
| max tokens per GPU = 4000 and max sentences per GPU = None
/home/others/17EC30042/anaconda2/envs/work/lib/python3.6/site-packages/torch/nn/functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
warnings.warn(warning.format(ret))
[per-update progress-bar output for epochs 001 to 005 omitted]
| epoch 001 | loss 14.059 | ppl 17072.17 | wps 2217 | ups 0.8 | wpb 2560 | bsz 174 | num_updates 35 | lr 0.25 | gnorm 1.800 | clip 100% | oom 0 | wall 42 | train_wall 39
| epoch 001 | valid on 'valid' subset | valid_loss 13.1245 | valid_ppl 8930.17 | num_updates 35
| epoch 002 | loss 12.828 | ppl 7273.30 | wps 2260 | ups 0.4 | wpb 2560 | bsz 174 | num_updates 70 | lr 0.25 | gnorm 0.511 | clip 100% | oom 0 | wall 124 | train_wall 79
| epoch 002 | valid on 'valid' subset | valid_loss 12.4457 | valid_ppl 5578.61 | num_updates 70 | best 12.4457
| epoch 003 | loss 12.294 | ppl 5021.39 | wps 2205 | ups 0.4 | wpb 2560 | bsz 174 | num_updates 105 | lr 0.25 | gnorm 0.591 | clip 100% | oom 0 | wall 203 | train_wall 118
| epoch 003 | valid on 'valid' subset | valid_loss 12.2308 | valid_ppl 4806.76 | num_updates 105 | best 12.2308
| epoch 004 | loss 12.069 | ppl 4296.90 | wps 2228 | ups 0.4 | wpb 2560 | bsz 174 | num_updates 140 | lr 0.25 | gnorm 0.687 | clip 100% | oom 0 | wall 285 | train_wall 157
| epoch 004 | valid on 'valid' subset | valid_loss 12.2392 | valid_ppl 4834.51 | num_updates 140 | best 12.2308
| epoch 005 | loss 11.966 | ppl 4000.10 | wps 2213 | ups 0.5 | wpb 2560 | bsz 174 | num_updates 175 | lr 0.025 | gnorm 0.444 | clip 100% | oom 0 | wall 357 | train_wall 197
| epoch 005 | valid on 'valid' subset | valid_loss 12.1938 | valid_ppl 4684.97 | num_updates 175 | best 12.1938
| epoch 006: 46%|▍| 16/35 [00:18<00:21, 1.13s/it, loss=12.038, ppl=4204.37, wps=2223, ups=0.3, wpb=2608, bsz=192, num_updates=191, lr=0.025, gnorm=0.588, clip
| epoch 006: 49%|▍|
17/35 [00:20<00:20, 1.16s/it, loss=11.995, ppl=4081.60, wps=2266, ups=0.3, wpb=2665, bsz=194, num_updates=192, lr=0.025, gnorm=0.615, clip| epoch 006: 51%|▌| 18/35 [00:21<00:19, 1.13s/it, loss=11.977, ppl=4030.51, wps=2286, ups=0.3, wpb=2673, bsz=192, num_updates=193, lr=0.025, gnorm=0.655, clip| epoch 006: 54%|▌| 19/35 [00:21<00:16, 1.01s/it, loss=11.989, ppl=4064.70, wps=2253, ups=0.3, wpb=2583, bsz=183, num_updates=194, lr=0.025, gnorm=0.665, clip| epoch 006: 57%|▌| 20/35 [00:22<00:15, 1.03s/it, loss=11.994, ppl=4079.48, wps=2208, ups=0.3, wpb=2525, bsz=181, num_updates=195, lr=0.025, gnorm=0.650, clip| epoch 006: 60%|▌| 21/35 [00:24<00:15, 1.09s/it, loss=12.004, ppl=4107.61, wps=2215, ups=0.3, wpb=2541, bsz=179, num_updates=196, lr=0.025, gnorm=0.633, clip| epoch 006: 63%|▋| 22/35 [00:25<00:13, 1.05s/it, loss=12.004, ppl=4107.76, wps=2163, ups=0.3, wpb=2467, bsz=174, num_updates=197, lr=0.025, gnorm=0.627, clip| epoch 006: 66%|▋| 23/35 [00:26<00:13, 1.13s/it, loss=11.988, ppl=4062.78, wps=2200, ups=0.3, wpb=2524, bsz=178, num_updates=198, lr=0.025, gnorm=0.612, clip| epoch 006: 69%|▋| 24/35 [00:27<00:12, 1.12s/it, loss=11.979, ppl=4037.35, wps=2186, ups=0.4, wpb=2503, bsz=176, num_updates=199, lr=0.025, gnorm=0.608, clip| epoch 006: 71%|▋| 25/35 [00:28<00:11, 1.11s/it, loss=11.980, ppl=4040.57, wps=2166, ups=0.4, wpb=2476, bsz=174, num_updates=200, lr=0.025, gnorm=0.618, clip| epoch 006: 74%|▋| 26/35 [00:29<00:10, 1.13s/it, loss=11.984, ppl=4050.02, wps=2166, ups=0.4, wpb=2479, bsz=172, num_updates=201, lr=0.025, gnorm=0.613, clip| epoch 006: 77%|▊| 27/35 [00:31<00:09, 1.19s/it, loss=11.971, ppl=4014.02, wps=2197, ups=0.4, wpb=2528, bsz=176, num_updates=202, lr=0.025, gnorm=0.604, clip| epoch 006: 80%|▊| 28/35 [00:32<00:08, 1.19s/it, loss=11.998, ppl=4089.83, wps=2185, ups=0.4, wpb=2519, bsz=172, num_updates=203, lr=0.025, gnorm=0.601, clip| epoch 006: 83%|▊| 29/35 [00:33<00:07, 1.20s/it, loss=12.006, ppl=4112.74, wps=2197, ups=0.4, wpb=2536, bsz=172, 
num_updates=204, lr=0.025, gnorm=0.595, clip| epoch 006: 86%|▊| 30/35 [00:34<00:06, 1.20s/it, loss=12.000, ppl=4095.56, wps=2220, ups=0.4, wpb=2566, bsz=175, num_updates=205, lr=0.025, gnorm=0.587, clip| epoch 006: 89%|▉| 31/35 [00:35<00:04, 1.19s/it, loss=11.976, ppl=4027.26, wps=2233, ups=0.4, wpb=2583, bsz=175, num_updates=206, lr=0.025, gnorm=0.582, clip| epoch 006: 91%|▉| 32/35 [00:37<00:03, 1.22s/it, loss=11.954, ppl=3968.37, wps=2255, ups=0.4, wpb=2616, bsz=177, num_updates=207, lr=0.025, gnorm=0.578, clip| epoch 006: 94%|▉| 33/35 [00:38<00:02, 1.09s/it, loss=11.952, ppl=3961.70, wps=2254, ups=0.4, wpb=2588, bsz=176, num_updates=208, lr=0.025, gnorm=0.573, clip| epoch 006: 97%|▉| 34/35 [00:39<00:01, 1.12s/it, loss=11.930, ppl=3900.86, wps=2265, ups=0.4, wpb=2603, bsz=176, num_updates=209, lr=0.025, gnorm=0.568, clip| epoch 006: 100%|█| 35/35 [00:40<00:00, 1.08s/it, loss=11.934, ppl=3912.70, wps=2234, ups=0.4, wpb=2560, bsz=174, num_updates=210, lr=0.025, gnorm=0.574, clip | epoch 006 | loss 11.934 | ppl 3912.70 | wps 2234 | ups 0.4 | wpb 2560 | bsz 174 | num_updates 210 | lr 0.025 | gnorm 0.574 | clip 100% | oom 0 | wall 437 | train_wall 236 | epoch 006 | valid on 'valid' subset | valid_loss 12.1386 | valid_ppl 4509.05 | num_updates 210 | best 12.1386 | epoch 007: 3%| | 1/35 [00:01<00:45, 1.32s/it, loss=11.539, ppl=2975.04, wps=74, ups=0.0, wpb=3604, bsz=248, num_updates=211, lr=0.025, gnorm=0.485, clip=10| epoch 007: 6%| | 2/35 [00:02<00:41, 1.27s/it, loss=11.958, ppl=3979.69, wps=2294, ups=0.0, wpb=3110, bsz=248, num_updates=212, lr=0.025, gnorm=0.452, clip=| epoch 007: 9%| | 3/35 [00:03<00:41, 1.29s/it, loss=11.963, ppl=3992.12, wps=2362, ups=0.1, wpb=3147, bsz=253, num_updates=213, lr=0.025, gnorm=0.510, clip=| epoch 007: 11%| | 4/35 [00:04<00:39, 1.26s/it, loss=11.900, ppl=3822.72, wps=2282, ups=0.1, wpb=2993, bsz=224, num_updates=214, lr=0.025, gnorm=0.470, clip=| epoch 007: 14%|▏| 5/35 [00:06<00:36, 1.22s/it, loss=11.961, ppl=3986.99, wps=2240, 
ups=0.1, wpb=2868, bsz=222, num_updates=215, lr=0.025, gnorm=0.435, clip=| epoch 007: 17%|▏| 6/35 [00:07<00:33, 1.15s/it, loss=11.964, ppl=3996.08, wps=2016, ups=0.1, wpb=2541, bsz=196, num_updates=216, lr=0.025, gnorm=0.495, clip=| epoch 007: 20%|▏| 7/35 [00:08<00:32, 1.15s/it, loss=11.950, ppl=3956.74, wps=2126, ups=0.1, wpb=2621, bsz=197, num_updates=217, lr=0.025, gnorm=0.692, clip=| epoch 007: 23%|▏| 8/35 [00:09<00:31, 1.17s/it, loss=11.936, ppl=3918.94, wps=2232, ups=0.1, wpb=2720, bsz=205, num_updates=218, lr=0.025, gnorm=0.670, clip=| epoch 007: 26%|▎| 9/35 [00:10<00:30, 1.17s/it, loss=11.858, ppl=3713.28, wps=2281, ups=0.2, wpb=2764, bsz=203, num_updates=219, lr=0.025, gnorm=0.653, clip=| epoch 007: 29%|▎| 10/35 [00:11<00:29, 1.18s/it, loss=11.905, ppl=3834.46, wps=2262, ups=0.2, wpb=2738, bsz=192, num_updates=220, lr=0.025, gnorm=0.628, clip| epoch 007: 31%|▎| 11/35 [00:12<00:26, 1.12s/it, loss=11.918, ppl=3868.82, wps=2158, ups=0.2, wpb=2586, bsz=184, num_updates=221, lr=0.025, gnorm=0.630, clip| epoch 007: 34%|▎| 12/35 [00:13<00:23, 1.00s/it, loss=11.940, ppl=3929.63, wps=2110, ups=0.2, wpb=2451, bsz=170, num_updates=222, lr=0.025, gnorm=0.675, clip| epoch 007: 37%|▎| 13/35 [00:14<00:23, 1.05s/it, loss=11.891, ppl=3798.69, wps=2155, ups=0.2, wpb=2500, bsz=171, num_updates=223, lr=0.025, gnorm=0.723, clip| epoch 007: 40%|▍| 14/35 [00:15<00:23, 1.10s/it, loss=11.917, ppl=3868.08, wps=2183, ups=0.2, wpb=2538, bsz=170, num_updates=224, lr=0.025, gnorm=0.725, clip| epoch 007: 43%|▍| 15/35 [00:17<00:22, 1.13s/it, loss=11.969, ppl=4009.16, wps=2160, ups=0.2, wpb=2520, bsz=163, num_updates=225, lr=0.025, gnorm=0.708, clip| epoch 007: 46%|▍| 16/35 [00:18<00:21, 1.11s/it, loss=11.977, ppl=4032.37, wps=2108, ups=0.2, wpb=2451, bsz=161, num_updates=226, lr=0.025, gnorm=0.691, clip| epoch 007: 49%|▍| 17/35 [00:19<00:21, 1.18s/it, loss=11.954, ppl=3968.40, wps=2163, ups=0.3, wpb=2531, bsz=168, num_updates=227, lr=0.025, gnorm=0.679, clip| epoch 007: 51%|▌| 18/35 
[00:20<00:19, 1.17s/it, loss=11.968, ppl=4007.23, wps=2155, ups=0.3, wpb=2519, bsz=170, num_updates=228, lr=0.025, gnorm=0.665, clip| epoch 007: 54%|▌| 19/35 [00:21<00:19, 1.19s/it, loss=11.937, ppl=3920.35, wps=2207, ups=0.3, wpb=2585, bsz=174, num_updates=229, lr=0.025, gnorm=0.671, clip| epoch 007: 57%|▌| 20/35 [00:23<00:18, 1.24s/it, loss=11.948, ppl=3950.04, wps=2229, ups=0.3, wpb=2629, bsz=181, num_updates=230, lr=0.025, gnorm=0.666, clip| epoch 007: 60%|▌| 21/35 [00:24<00:17, 1.22s/it, loss=11.956, ppl=3971.82, wps=2230, ups=0.3, wpb=2630, bsz=177, num_updates=231, lr=0.025, gnorm=0.648, clip| epoch 007: 63%|▋| 22/35 [00:25<00:15, 1.18s/it, loss=11.970, ppl=4010.88, wps=2182, ups=0.3, wpb=2566, bsz=174, num_updates=232, lr=0.025, gnorm=0.661, clip| epoch 007: 66%|▋| 23/35 [00:26<00:14, 1.22s/it, loss=11.953, ppl=3965.44, wps=2218, ups=0.3, wpb=2619, bsz=179, num_updates=233, lr=0.025, gnorm=0.643, clip| epoch 007: 69%|▋| 24/35 [00:28<00:13, 1.22s/it, loss=11.962, ppl=3989.57, wps=2224, ups=0.3, wpb=2629, bsz=177, num_updates=234, lr=0.025, gnorm=0.628, clip| epoch 007: 71%|▋| 25/35 [00:29<00:11, 1.16s/it, loss=11.966, ppl=4001.58, wps=2173, ups=0.3, wpb=2559, bsz=174, num_updates=235, lr=0.025, gnorm=0.629, clip| epoch 007: 74%|▋| 26/35 [00:30<00:10, 1.14s/it, loss=11.972, ppl=4016.55, wps=2141, ups=0.3, wpb=2517, bsz=172, num_updates=236, lr=0.025, gnorm=0.622, clip| epoch 007: 77%|▊| 27/35 [00:31<00:09, 1.18s/it, loss=11.976, ppl=4029.74, wps=2159, ups=0.3, wpb=2546, bsz=176, num_updates=237, lr=0.025, gnorm=0.627, clip| epoch 007: 80%|▊| 28/35 [00:32<00:08, 1.15s/it, loss=11.961, ppl=3986.86, wps=2175, ups=0.4, wpb=2555, bsz=175, num_updates=238, lr=0.025, gnorm=0.629, clip| epoch 007: 83%|▊| 29/35 [00:33<00:06, 1.15s/it, loss=11.963, ppl=3993.49, wps=2175, ups=0.4, wpb=2555, bsz=174, num_updates=239, lr=0.025, gnorm=0.623, clip| epoch 007: 86%|▊| 30/35 [00:34<00:05, 1.19s/it, loss=11.940, ppl=3928.09, wps=2202, ups=0.4, wpb=2590, bsz=176, 
num_updates=240, lr=0.025, gnorm=0.616, clip| epoch 007: 89%|▉| 31/35 [00:36<00:04, 1.15s/it, loss=11.941, ppl=3931.53, wps=2186, ups=0.4, wpb=2566, bsz=175, num_updates=241, lr=0.025, gnorm=0.616, clip| epoch 007: 91%|▉| 32/35 [00:37<00:03, 1.13s/it, loss=11.935, ppl=3914.21, wps=2176, ups=0.4, wpb=2549, bsz=173, num_updates=242, lr=0.025, gnorm=0.609, clip| epoch 007: 94%|▉| 33/35 [00:38<00:02, 1.11s/it, loss=11.930, ppl=3901.21, wps=2187, ups=0.4, wpb=2554, bsz=174, num_updates=243, lr=0.025, gnorm=0.604, clip| epoch 007: 97%|▉| 34/35 [00:39<00:01, 1.15s/it, loss=11.908, ppl=3842.58, wps=2210, ups=0.4, wpb=2585, bsz=175, num_updates=244, lr=0.025, gnorm=0.599, clip| epoch 007: 100%|█| 35/35 [00:40<00:00, 1.04s/it, loss=11.906, ppl=3837.64, wps=2209, ups=0.4, wpb=2560, bsz=174, num_updates=245, lr=0.025, gnorm=0.594, clip | epoch 007 | loss 11.906 | ppl 3837.64 | wps 2209 | ups 0.4 | wpb 2560 | bsz 174 | num_updates 245 | lr 0.025 | gnorm 0.594 | clip 100% | oom 0 | wall 525 | train_wall 276 | epoch 007 | valid on 'valid' subset | valid_loss 12.1032 | valid_ppl 4399.71 | num_updates 245 | best 12.1032 | epoch 008: 3%| | 1/35 [00:01<00:43, 1.29s/it, loss=12.011, ppl=4128.55, wps=78, ups=0.0, wpb=3221, bsz=264, num_updates=246, lr=0.025, gnorm=0.707, clip=10| epoch 008: 6%| | 2/35 [00:02<00:39, 1.21s/it, loss=12.086, ppl=4348.03, wps=1068, ups=0.0, wpb=2144, bsz=184, num_updates=247, lr=0.025, gnorm=0.745, clip=| epoch 008: 9%| | 3/35 [00:03<00:39, 1.22s/it, loss=11.815, ppl=3602.04, wps=2143, ups=0.1, wpb=2688, bsz=208, num_updates=248, lr=0.025, gnorm=0.612, clip=| epoch 008: 11%| | 4/35 [00:04<00:37, 1.20s/it, loss=11.900, ppl=3822.81, wps=2106, ups=0.1, wpb=2592, bsz=206, num_updates=249, lr=0.025, gnorm=0.563, clip=| epoch 008: 14%|▏| 5/35 [00:05<00:36, 1.22s/it, loss=11.944, ppl=3939.81, wps=2230, ups=0.1, wpb=2730, bsz=221, num_updates=250, lr=0.025, gnorm=0.601, clip=| epoch 008: 17%|▏| 6/35 [00:07<00:35, 1.21s/it, loss=11.819, ppl=3612.81, wps=2307, 
ups=0.1, wpb=2794, bsz=215, num_updates=251, lr=0.025, gnorm=0.551, clip=| epoch 008: 20%|▏| 7/35 [00:08<00:34, 1.23s/it, loss=11.747, ppl=3437.91, wps=2405, ups=0.1, wpb=2913, bsz=219, num_updates=252, lr=0.025, gnorm=0.510, clip=| epoch 008: 23%|▏| 8/35 [00:09<00:33, 1.23s/it, loss=11.794, ppl=3551.10, wps=2398, ups=0.2, wpb=2908, bsz=210, num_updates=253, lr=0.025, gnorm=0.481, clip=| epoch 008: 26%|▎| 9/35 [00:10<00:27, 1.07s/it, loss=11.824, ppl=3625.54, wps=2313, ups=0.2, wpb=2691, bsz=189, num_updates=254, lr=0.025, gnorm=0.534, clip=| epoch 008: 29%|▎| 10/35 [00:11<00:27, 1.11s/it, loss=11.851, ppl=3693.36, wps=2303, ups=0.2, wpb=2686, bsz=181, num_updates=255, lr=0.025, gnorm=0.512, clip| epoch 008: 31%|▎| 11/35 [00:12<00:28, 1.18s/it, loss=11.883, ppl=3777.34, wps=2336, ups=0.2, wpb=2757, bsz=192, num_updates=256, lr=0.025, gnorm=0.521, clip| epoch 008: 34%|▎| 12/35 [00:14<00:27, 1.19s/it, loss=11.876, ppl=3758.81, wps=2382, ups=0.2, wpb=2812, bsz=198, num_updates=257, lr=0.025, gnorm=0.501, clip| epoch 008: 37%|▎| 13/35 [00:15<00:25, 1.15s/it, loss=11.888, ppl=3790.02, wps=2302, ups=0.2, wpb=2705, bsz=193, num_updates=258, lr=0.025, gnorm=0.496, clip| epoch 008: 40%|▍| 14/35 [00:16<00:23, 1.12s/it, loss=11.871, ppl=3746.67, wps=2326, ups=0.2, wpb=2713, bsz=191, num_updates=259, lr=0.025, gnorm=0.579, clip| epoch 008: 43%|▍| 15/35 [00:17<00:22, 1.10s/it, loss=11.864, ppl=3727.92, wps=2342, ups=0.3, wpb=2713, bsz=192, num_updates=260, lr=0.025, gnorm=0.578, clip| epoch 008: 46%|▍| 16/35 [00:18<00:21, 1.12s/it, loss=11.861, ppl=3718.62, wps=2365, ups=0.3, wpb=2737, bsz=193, num_updates=261, lr=0.025, gnorm=0.637, clip| epoch 008: 49%|▍| 17/35 [00:19<00:18, 1.02s/it, loss=11.860, ppl=3718.16, wps=2356, ups=0.3, wpb=2677, bsz=189, num_updates=262, lr=0.025, gnorm=0.663, clip| epoch 008: 51%|▌| 18/35 [00:20<00:18, 1.07s/it, loss=11.830, ppl=3641.45, wps=2372, ups=0.3, wpb=2700, bsz=188, num_updates=263, lr=0.025, gnorm=0.681, clip| epoch 008: 54%|▌| 19/35 
[00:21<00:17, 1.07s/it, loss=11.849, ppl=3688.15, wps=2308, ups=0.3, wpb=2623, bsz=184, num_updates=264, lr=0.025, gnorm=0.698, clip| epoch 008: 57%|▌| 20/35 [00:22<00:16, 1.10s/it, loss=11.856, ppl=3706.47, wps=2301, ups=0.3, wpb=2619, bsz=181, num_updates=265, lr=0.025, gnorm=0.682, clip| epoch 008: 60%|▌| 21/35 [00:23<00:16, 1.16s/it, loss=11.845, ppl=3677.95, wps=2334, ups=0.3, wpb=2674, bsz=186, num_updates=266, lr=0.025, gnorm=0.666, clip| epoch 008: 63%|▋| 22/35 [00:25<00:15, 1.21s/it, loss=11.835, ppl=3652.97, wps=2363, ups=0.3, wpb=2726, bsz=190, num_updates=267, lr=0.025, gnorm=0.646, clip| epoch 008: 66%|▋| 23/35 [00:26<00:14, 1.19s/it, loss=11.864, ppl=3727.23, wps=2360, ups=0.3, wpb=2721, bsz=193, num_updates=268, lr=0.025, gnorm=0.636, clip| epoch 008: 69%|▋| 24/35 [00:27<00:13, 1.19s/it, loss=11.881, ppl=3772.84, wps=2349, ups=0.4, wpb=2712, bsz=189, num_updates=269, lr=0.025, gnorm=0.622, clip| epoch 008: 71%|▋| 25/35 [00:28<00:11, 1.16s/it, loss=11.888, ppl=3789.92, wps=2310, ups=0.4, wpb=2662, bsz=186, num_updates=270, lr=0.025, gnorm=0.616, clip| epoch 008: 74%|▋| 26/35 [00:29<00:09, 1.10s/it, loss=11.888, ppl=3791.19, wps=2262, ups=0.4, wpb=2595, bsz=181, num_updates=271, lr=0.025, gnorm=0.613, clip| epoch 008: 77%|▊| 27/35 [00:30<00:08, 1.08s/it, loss=11.892, ppl=3801.52, wps=2212, ups=0.4, wpb=2531, bsz=178, num_updates=272, lr=0.025, gnorm=0.613, clip| epoch 008: 80%|▊| 28/35 [00:31<00:07, 1.08s/it, loss=11.885, ppl=3783.00, wps=2200, ups=0.4, wpb=2513, bsz=176, num_updates=273, lr=0.025, gnorm=0.606, clip| epoch 008: 83%|▊| 29/35 [00:32<00:06, 1.08s/it, loss=11.887, ppl=3787.73, wps=2182, ups=0.4, wpb=2490, bsz=174, num_updates=274, lr=0.025, gnorm=0.605, clip| epoch 008: 86%|▊| 30/35 [00:34<00:05, 1.11s/it, loss=11.878, ppl=3763.19, wps=2181, ups=0.4, wpb=2491, bsz=173, num_updates=275, lr=0.025, gnorm=0.595, clip| epoch 008: 89%|▉| 31/35 [00:35<00:04, 1.15s/it, loss=11.854, ppl=3702.30, wps=2206, ups=0.4, wpb=2526, bsz=174, 
num_updates=276, lr=0.025, gnorm=0.590, clip| epoch 008: 91%|▉| 32/35 [00:36<00:03, 1.17s/it, loss=11.879, ppl=3766.05, wps=2195, ups=0.4, wpb=2518, bsz=171, num_updates=277, lr=0.025, gnorm=0.582, clip| epoch 008: 94%|▉| 33/35 [00:37<00:02, 1.16s/it, loss=11.891, ppl=3796.69, wps=2192, ups=0.4, wpb=2514, bsz=172, num_updates=278, lr=0.025, gnorm=0.581, clip| epoch 008: 97%|▉| 34/35 [00:39<00:01, 1.21s/it, loss=11.874, ppl=3754.00, wps=2210, ups=0.4, wpb=2546, bsz=174, num_updates=279, lr=0.025, gnorm=0.577, clip| epoch 008: 100%|█| 35/35 [00:40<00:00, 1.21s/it, loss=11.883, ppl=3775.75, wps=2219, ups=0.4, wpb=2560, bsz=174, num_updates=280, lr=0.025, gnorm=0.568, clip | epoch 008 | loss 11.883 | ppl 3775.75 | wps 2219 | ups 0.4 | wpb 2560 | bsz 174 | num_updates 280 | lr 0.025 | gnorm 0.568 | clip 100% | oom 0 | wall 605 | train_wall 315 | epoch 008 | valid on 'valid' subset | valid_loss 12.1005 | valid_ppl 4391.53 | num_updates 280 | best 12.1005 | epoch 009: 3%| | 1/35 [00:01<00:40, 1.18s/it, loss=11.323, ppl=2562.22, wps=73, ups=0.0, wpb=3090, bsz=176, num_updates=281, lr=0.025, gnorm=0.371, clip=10| epoch 009: 6%| | 2/35 [00:02<00:36, 1.12s/it, loss=11.450, ppl=2797.75, wps=919, ups=0.0, wpb=1997, bsz=120, num_updates=282, lr=0.025, gnorm=0.434, clip=1| epoch 009: 9%| | 3/35 [00:03<00:35, 1.12s/it, loss=11.727, ppl=3390.28, wps=1520, ups=0.1, wpb=2100, bsz=147, num_updates=283, lr=0.025, gnorm=0.434, clip=| epoch 009: 11%| | 4/35 [00:04<00:35, 1.15s/it, loss=11.766, ppl=3483.86, wps=1998, ups=0.1, wpb=2430, bsz=176, num_updates=284, lr=0.025, gnorm=0.442, clip=| epoch 009: 14%|▏| 5/35 [00:05<00:33, 1.12s/it, loss=11.766, ppl=3482.60, wps=2135, ups=0.1, wpb=2487, bsz=182, num_updates=285, lr=0.025, gnorm=0.441, clip=| epoch 009: 17%|▏| 6/35 [00:06<00:29, 1.02s/it, loss=11.767, ppl=3485.32, wps=2142, ups=0.1, wpb=2358, bsz=173, num_updates=286, lr=0.025, gnorm=0.432, clip=| epoch 009: 20%|▏| 7/35 [00:07<00:28, 1.03s/it, loss=11.824, ppl=3625.63, wps=1975, 
ups=0.1, wpb=2198, bsz=165, num_updates=287, lr=0.025, gnorm=0.445, clip=| epoch 009: 23%|▏| 8/35 [00:08<00:30, 1.12s/it, loss=11.796, ppl=3556.56, wps=2131, ups=0.2, wpb=2400, bsz=179, num_updates=288, lr=0.025, gnorm=0.439, clip=| epoch 009: 26%|▎| 9/35 [00:09<00:28, 1.11s/it, loss=11.777, ppl=3509.83, wps=2097, ups=0.2, wpb=2360, bsz=172, num_updates=289, lr=0.025, gnorm=0.427, clip=| epoch 009: 29%|▎| 10/35 [00:11<00:29, 1.17s/it, loss=11.814, ppl=3601.49, wps=2157, ups=0.2, wpb=2452, bsz=183, num_updates=290, lr=0.025, gnorm=0.448, clip| epoch 009: 31%|▎| 11/35 [00:12<00:27, 1.15s/it, loss=11.852, ppl=3696.91, wps=2152, ups=0.2, wpb=2444, bsz=186, num_updates=291, lr=0.025, gnorm=0.444, clip| epoch 009: 34%|▎| 12/35 [00:13<00:27, 1.21s/it, loss=11.804, ppl=3576.79, wps=2211, ups=0.2, wpb=2541, bsz=191, num_updates=292, lr=0.025, gnorm=0.431, clip| epoch 009: 37%|▎| 13/35 [00:14<00:27, 1.23s/it, loss=11.816, ppl=3606.61, wps=2239, ups=0.2, wpb=2593, bsz=197, num_updates=293, lr=0.025, gnorm=0.434, clip| epoch 009: 40%|▍| 14/35 [00:15<00:24, 1.19s/it, loss=11.829, ppl=3637.59, wps=2174, ups=0.2, wpb=2513, bsz=191, num_updates=294, lr=0.025, gnorm=0.434, clip| epoch 009: 43%|▍| 15/35 [00:17<00:23, 1.15s/it, loss=11.840, ppl=3666.15, wps=2117, ups=0.3, wpb=2440, bsz=187, num_updates=295, lr=0.025, gnorm=0.428, clip| epoch 009: 46%|▍| 16/35 [00:18<00:21, 1.15s/it, loss=11.836, ppl=3656.44, wps=2156, ups=0.3, wpb=2482, bsz=188, num_updates=296, lr=0.025, gnorm=0.478, clip| epoch 009: 49%|▍| 17/35 [00:19<00:20, 1.17s/it, loss=11.886, ppl=3783.58, wps=2138, ups=0.3, wpb=2469, bsz=180, num_updates=297, lr=0.025, gnorm=0.486, clip| epoch 009: 51%|▌| 18/35 [00:20<00:20, 1.19s/it, loss=11.853, ppl=3699.10, wps=2186, ups=0.3, wpb=2531, bsz=182, num_updates=298, lr=0.025, gnorm=0.541, clip| epoch 009: 54%|▌| 19/35 [00:21<00:18, 1.19s/it, loss=11.861, ppl=3719.75, wps=2185, ups=0.3, wpb=2532, bsz=179, num_updates=299, lr=0.025, gnorm=0.555, clip| epoch 009: 57%|▌| 20/35 
[00:23<00:18, 1.21s/it, loss=11.830, ppl=3640.22, wps=2224, ups=0.3, wpb=2587, bsz=183, num_updates=300, lr=0.025, gnorm=0.569, clip| epoch 009: 60%|▌| 21/35 [00:24<00:16, 1.15s/it, loss=11.836, ppl=3657.08, wps=2173, ups=0.3, wpb=2514, bsz=179, num_updates=301, lr=0.025, gnorm=0.565, clip| epoch 009: 63%|▋| 22/35 [00:25<00:15, 1.17s/it, loss=11.854, ppl=3701.61, wps=2182, ups=0.3, wpb=2531, bsz=177, num_updates=302, lr=0.025, gnorm=0.587, clip| epoch 009: 66%|▋| 23/35 [00:26<00:14, 1.21s/it, loss=11.843, ppl=3673.60, wps=2219, ups=0.3, wpb=2584, bsz=182, num_updates=303, lr=0.025, gnorm=0.590, clip| epoch 009: 69%|▋| 24/35 [00:27<00:13, 1.20s/it, loss=11.863, ppl=3724.16, wps=2214, ups=0.3, wpb=2581, bsz=178, num_updates=304, lr=0.025, gnorm=0.596, clip| epoch 009: 71%|▋| 25/35 [00:28<00:10, 1.06s/it, loss=11.873, ppl=3750.29, wps=2191, ups=0.4, wpb=2516, bsz=172, num_updates=305, lr=0.025, gnorm=0.612, clip| epoch 009: 74%|▋| 26/35 [00:29<00:09, 1.07s/it, loss=11.878, ppl=3763.00, wps=2171, ups=0.4, wpb=2490, bsz=170, num_updates=306, lr=0.025, gnorm=0.649, clip| epoch 009: 77%|▊| 27/35 [00:30<00:08, 1.10s/it, loss=11.886, ppl=3785.73, wps=2173, ups=0.4, wpb=2495, bsz=168, num_updates=307, lr=0.025, gnorm=0.649, clip| epoch 009: 80%|▊| 28/35 [00:31<00:07, 1.09s/it, loss=11.875, ppl=3757.26, wps=2189, ups=0.4, wpb=2506, bsz=168, num_updates=308, lr=0.025, gnorm=0.675, clip| epoch 009: 83%|▊| 29/35 [00:33<00:06, 1.12s/it, loss=11.849, ppl=3689.93, wps=2206, ups=0.4, wpb=2527, bsz=169, num_updates=309, lr=0.025, gnorm=0.673, clip| epoch 009: 86%|▊| 30/35 [00:34<00:05, 1.15s/it, loss=11.859, ppl=3715.66, wps=2216, ups=0.4, wpb=2544, bsz=168, num_updates=310, lr=0.025, gnorm=0.665, clip| epoch 009: 89%|▉| 31/35 [00:35<00:04, 1.16s/it, loss=11.851, ppl=3694.21, wps=2214, ups=0.4, wpb=2543, bsz=167, num_updates=311, lr=0.025, gnorm=0.656, clip| epoch 009: 91%|▉| 32/35 [00:36<00:03, 1.12s/it, loss=11.855, ppl=3703.22, wps=2174, ups=0.4, wpb=2491, bsz=165, 
num_updates=312, lr=0.025, gnorm=0.655, clip| epoch 009: 94%|▉| 33/35 [00:37<00:02, 1.12s/it, loss=11.876, ppl=3757.44, wps=2178, ups=0.4, wpb=2495, bsz=167, num_updates=313, lr=0.025, gnorm=0.648, clip| epoch 009: 97%|▉| 34/35 [00:38<00:01, 1.16s/it, loss=11.857, ppl=3710.63, wps=2206, ups=0.4, wpb=2533, bsz=170, num_updates=314, lr=0.025, gnorm=0.635, clip| epoch 009: 100%|█| 35/35 [00:40<00:00, 1.22s/it, loss=11.869, ppl=3740.42, wps=2218, ups=0.4, wpb=2560, bsz=174, num_updates=315, lr=0.025, gnorm=0.637, clip | epoch 009 | loss 11.869 | ppl 3740.42 | wps 2218 | ups 0.4 | wpb 2560 | bsz 174 | num_updates 315 | lr 0.025 | gnorm 0.637 | clip 100% | oom 0 | wall 686 | train_wall 354 | epoch 009 | valid on 'valid' subset | valid_loss 12.0871 | valid_ppl 4350.90 | num_updates 315 | best 12.0871 | epoch 010: 3%| | 1/35 [00:01<00:35, 1.05s/it, loss=12.137, ppl=4504.75, wps=34, ups=0.0, wpb=1421, bsz=128, num_updates=316, lr=0.025, gnorm=0.531, clip=10| epoch 010: 6%| | 2/35 [00:02<00:35, 1.08s/it, loss=11.836, ppl=3654.81, wps=2683, ups=0.0, wpb=2261, bsz=164, num_updates=317, lr=0.025, gnorm=0.394, clip=| epoch 010: 9%| | 3/35 [00:03<00:37, 1.16s/it, loss=11.763, ppl=3476.56, wps=2777, ups=0.1, wpb=2780, bsz=203, num_updates=318, lr=0.025, gnorm=0.410, clip=| epoch 010: 11%| | 4/35 [00:04<00:36, 1.17s/it, loss=11.831, ppl=3644.19, wps=2597, ups=0.1, wpb=2744, bsz=180, num_updates=319, lr=0.025, gnorm=0.389, clip=| epoch 010: 14%|▏| 5/35 [00:06<00:36, 1.22s/it, loss=11.740, ppl=3420.38, wps=2626, ups=0.1, wpb=2916, bsz=194, num_updates=320, lr=0.025, gnorm=0.392, clip=| epoch 010: 17%|▏| 6/35 [00:06<00:31, 1.09s/it, loss=11.742, ppl=3424.71, wps=2567, ups=0.1, wpb=2715, bsz=183, num_updates=321, lr=0.025, gnorm=0.379, clip=| epoch 010: 20%|▏| 7/35 [00:08<00:32, 1.17s/it, loss=11.820, ppl=3615.45, wps=2568, ups=0.1, wpb=2823, bsz=200, num_updates=322, lr=0.025, gnorm=0.425, clip=| epoch 010: 23%|▏| 8/35 [00:09<00:30, 1.14s/it, loss=11.841, ppl=3667.94, wps=2406, 
ups=0.2, wpb=2654, bsz=189, num_updates=323, lr=0.025, gnorm=0.431, clip=| epoch 010: 26%|▎| 9/35 [00:10<00:29, 1.12s/it, loss=11.845, ppl=3679.67, wps=2323, ups=0.2, wpb=2561, bsz=182, num_updates=324, lr=0.025, gnorm=0.418, clip=| epoch 010: 29%|▎| 10/35 [00:11<00:29, 1.16s/it, loss=11.790, ppl=3542.36, wps=2403, ups=0.2, wpb=2683, bsz=190, num_updates=325, lr=0.025, gnorm=0.408, clip| epoch 010: 31%|▎| 11/35 [00:12<00:24, 1.03s/it, loss=11.815, ppl=3602.37, wps=2335, ups=0.2, wpb=2526, bsz=174, num_updates=326, lr=0.025, gnorm=0.433, clip| epoch 010: 34%|▎| 12/35 [00:13<00:24, 1.05s/it, loss=11.798, ppl=3561.81, wps=2294, ups=0.2, wpb=2485, bsz=170, num_updates=327, lr=0.025, gnorm=0.425, clip| epoch 010: 37%|▎| 13/35 [00:14<00:24, 1.11s/it, loss=11.747, ppl=3437.29, wps=2348, ups=0.2, wpb=2570, bsz=173, num_updates=328, lr=0.025, gnorm=0.415, clip| epoch 010: 40%|▍| 14/35 [00:15<00:22, 1.09s/it, loss=11.746, ppl=3433.83, wps=2364, ups=0.2, wpb=2580, bsz=176, num_updates=329, lr=0.025, gnorm=0.406, clip| epoch 010: 43%|▍| 15/35 [00:17<00:23, 1.16s/it, loss=11.738, ppl=3414.98, wps=2406, ups=0.3, wpb=2660, bsz=183, num_updates=330, lr=0.025, gnorm=0.400, clip| epoch 010: 46%|▍| 16/35 [00:18<00:22, 1.17s/it, loss=11.770, ppl=3492.39, wps=2386, ups=0.3, wpb=2650, bsz=177, num_updates=331, lr=0.025, gnorm=0.396, clip| epoch 010: 49%|▍| 17/35 [00:19<00:21, 1.20s/it, loss=11.739, ppl=3417.31, wps=2419, ups=0.3, wpb=2708, bsz=181, num_updates=332, lr=0.025, gnorm=0.386, clip| epoch 010: 51%|▌| 18/35 [00:20<00:19, 1.18s/it, loss=11.759, ppl=3466.31, wps=2397, ups=0.3, wpb=2686, bsz=182, num_updates=333, lr=0.025, gnorm=0.380, clip| epoch 010: 54%|▌| 19/35 [00:21<00:17, 1.12s/it, loss=11.769, ppl=3489.12, wps=2332, ups=0.3, wpb=2601, bsz=178, num_updates=334, lr=0.025, gnorm=0.394, clip| epoch 010: 57%|▌| 20/35 [00:22<00:17, 1.15s/it, loss=11.786, ppl=3531.78, wps=2342, ups=0.3, wpb=2622, bsz=177, num_updates=335, lr=0.025, gnorm=0.388, clip| epoch 010: 60%|▌| 21/35 
[00:24<00:16, 1.15s/it, loss=11.793, ppl=3549.64, wps=2334, ups=0.3, wpb=2618, bsz=175, num_updates=336, lr=0.025, gnorm=0.381, clip| epoch 010: 63%|▋| 22/35 [00:25<00:14, 1.15s/it, loss=11.826, ppl=3630.36, wps=2332, ups=0.3, wpb=2618, bsz=178, num_updates=337, lr=0.025, gnorm=0.378, clip| epoch 010: 66%|▋| 23/35 [00:26<00:13, 1.16s/it, loss=11.859, ppl=3713.72, wps=2311, ups=0.3, wpb=2603, bsz=173, num_updates=338, lr=0.025, gnorm=0.376, clip| epoch 010: 69%|▋| 24/35 [00:27<00:12, 1.18s/it, loss=11.869, ppl=3740.07, wps=2314, ups=0.3, wpb=2614, bsz=172, num_updates=339, lr=0.025, gnorm=0.374, clip| epoch 010: 71%|▋| 25/35 [00:28<00:11, 1.12s/it, loss=11.868, ppl=3738.78, wps=2264, ups=0.4, wpb=2546, bsz=168, num_updates=340, lr=0.025, gnorm=0.384, clip| epoch 010: 74%|▋| 26/35 [00:29<00:10, 1.17s/it, loss=11.870, ppl=3742.60, wps=2274, ups=0.4, wpb=2572, bsz=172, num_updates=341, lr=0.025, gnorm=0.383, clip| epoch 010: 77%|▊| 27/35 [00:31<00:09, 1.18s/it, loss=11.866, ppl=3732.63, wps=2297, ups=0.4, wpb=2603, bsz=175, num_updates=342, lr=0.025, gnorm=0.387, clip| epoch 010: 80%|▊| 28/35 [00:32<00:08, 1.16s/it, loss=11.878, ppl=3763.85, wps=2290, ups=0.4, wpb=2595, bsz=176, num_updates=343, lr=0.025, gnorm=0.393, clip| epoch 010: 83%|▊| 29/35 [00:33<00:06, 1.14s/it, loss=11.888, ppl=3789.58, wps=2252, ups=0.4, wpb=2548, bsz=174, num_updates=344, lr=0.025, gnorm=0.411, clip| epoch 010: 86%|▊| 30/35 [00:34<00:05, 1.18s/it, loss=11.892, ppl=3799.39, wps=2264, ups=0.4, wpb=2572, bsz=178, num_updates=345, lr=0.025, gnorm=0.410, clip| epoch 010: 89%|▉| 31/35 [00:35<00:04, 1.18s/it, loss=11.872, ppl=3748.46, wps=2276, ups=0.4, wpb=2589, bsz=178, num_updates=346, lr=0.025, gnorm=0.436, clip| epoch 010: 91%|▉| 32/35 [00:36<00:03, 1.14s/it, loss=11.874, ppl=3754.20, wps=2234, ups=0.4, wpb=2536, bsz=175, num_updates=347, lr=0.025, gnorm=0.439, clip| epoch 010: 94%|▉| 33/35 [00:37<00:02, 1.15s/it, loss=11.867, ppl=3734.28, wps=2231, ups=0.4, wpb=2535, bsz=174, 
num_updates=348, lr=0.025, gnorm=0.449, clip| epoch 010: 97%|▉| 34/35 [00:38<00:01, 1.13s/it, loss=11.860, ppl=3716.57, wps=2242, ups=0.4, wpb=2543, bsz=174, num_updates=349, lr=0.025, gnorm=0.483, clip| epoch 010: 100%|█| 35/35 [00:40<00:00, 1.15s/it, loss=11.839, ppl=3664.13, wps=2253, ups=0.4, wpb=2560, bsz=174, num_updates=350, lr=0.025, gnorm=0.490, clip | epoch 010 | loss 11.839 | ppl 3664.13 | wps 2253 | ups 0.4 | wpb 2560 | bsz 174 | num_updates 350 | lr 0.025 | gnorm 0.490 | clip 100% | oom 0 | wall 768 | train_wall 394 | epoch 010 | valid on 'valid' subset | valid_loss 12.114 | valid_ppl 4432.79 | num_updates 350 | best 12.0871 | epoch 011: 3%| | 1/35 [00:01<00:40, 1.19s/it, loss=11.794, ppl=3551.57, wps=103, ups=0.0, wpb=3418, bsz=264, num_updates=351, lr=0.0025, gnorm=0.466, clip=| epoch 011: 6%| | 2/35 [00:02<00:40, 1.21s/it, loss=11.632, ppl=3173.66, wps=2994, ups=0.1, wpb=3596, bsz=260, num_updates=352, lr=0.0025, gnorm=0.611, clip| epoch 011: 9%| | 3/35 [00:03<00:39, 1.25s/it, loss=11.634, ppl=3178.33, wps=2925, ups=0.1, wpb=3671, bsz=267, num_updates=353, lr=0.0025, gnorm=0.532, clip| epoch 011: 11%| | 4/35 [00:05<00:39, 1.26s/it, loss=11.690, ppl=3304.48, wps=2780, ups=0.1, wpb=3558, bsz=266, num_updates=354, lr=0.0025, gnorm=0.474, clip| epoch 011: 14%|▏| 5/35 [00:06<00:35, 1.18s/it, loss=11.697, ppl=3319.98, wps=2406, ups=0.1, wpb=3027, bsz=226, num_updates=355, lr=0.0025, gnorm=0.493, clip| epoch 011: 17%|▏| 6/35 [00:07<00:34, 1.18s/it, loss=11.617, ppl=3141.93, wps=2447, ups=0.2, wpb=3041, bsz=219, num_updates=356, lr=0.0025, gnorm=0.523, clip| epoch 011: 20%|▏| 7/35 [00:08<00:33, 1.18s/it, loss=11.696, ppl=3318.03, wps=2396, ups=0.2, wpb=2965, bsz=201, num_updates=357, lr=0.0025, gnorm=0.498, clip| epoch 011: 23%|▏| 8/35 [00:09<00:31, 1.16s/it, loss=11.749, ppl=3441.34, wps=2356, ups=0.2, wpb=2890, bsz=203, num_updates=358, lr=0.0025, gnorm=0.477, clip| epoch 011: 26%|▎| 9/35 [00:10<00:30, 1.17s/it, loss=11.701, ppl=3328.25, wps=2389, 
ups=0.2, wpb=2913, bsz=200, num_updates=359, lr=0.0025, gnorm=0.524, clip|
| epoch 011 | loss 11.822 | ppl 3621.69 | wps 2210 | ups 0.5 | wpb 2560 | bsz 174 | num_updates 385 | lr 0.0025 | gnorm 0.503 | clip 100% | oom 0 | wall 840 | train_wall 433
| epoch 011 | valid on 'valid' subset | valid_loss 12.0814 | valid_ppl 4333.73 | num_updates 385 | best 12.0814
| epoch 012 | loss 11.813 | ppl 3598.49 | wps 2243 | ups 0.4 | wpb 2560 | bsz 174 | num_updates 420 | lr 0.0025 | gnorm 0.399 | clip 100% | oom 0 | wall 921 | train_wall 472
| epoch 012 | valid on 'valid' subset | valid_loss 12.0768 | valid_ppl 4319.93 | num_updates 420 | best 12.0768
| epoch 013 | loss 11.806 | ppl 3581.00 | wps 2217 | ups 0.4 | wpb 2560 | bsz 174 | num_updates 455 | lr 0.0025 | gnorm 0.420 | clip 100% | oom 0 | wall 1008 | train_wall 512
| epoch 013 | valid on 'valid' subset | valid_loss 12.077 | valid_ppl 4320.42 | num_updates 455 | best 12.0768
| epoch 014 | loss 11.805 | ppl 3578.80 | wps 2239 | ups 0.5 | wpb 2560 | bsz 174 | num_updates 490 | lr 0.00025 | gnorm 0.448 | clip 100% | oom 0 | wall 1082 | train_wall 551
| epoch 014 | valid on 'valid' subset | valid_loss 12.0756 | valid_ppl 4316.30 | num_updates 490 | best 12.0756
| epoch 015 | loss 11.803 | ppl 3572.80 | wps 2228 | ups 0.4 | wpb 2560 | bsz 174 | num_updates 525 | lr 0.00025 | gnorm 0.441 | clip 100% | oom 0 | wall 1165 | train_wall 591
| epoch 015 | valid on 'valid' subset | valid_loss 12.0722 | valid_ppl 4306.28 | num_updates 525 | best 12.0722
| epoch 016 | loss 11.802 | ppl 3570.74 | wps 2230 | ups 0.4 | wpb 2560 | bsz 174 | num_updates 560 | lr 0.00025 | gnorm 0.419 | clip 100% | oom 0 | wall 1247 | train_wall 630
| epoch 016 | valid on 'valid' subset | valid_loss 12.0691 | valid_ppl 4297.00 | num_updates 560 | best 12.0691
| epoch 017 | loss 11.799 | ppl 3563.53 | wps 2216 | ups 0.4 | wpb 2560 | bsz 174 | num_updates 595 | lr 0.00025 | gnorm 0.413 | clip 100% | oom 0 | wall 1329 | train_wall 669
| epoch 017 | valid on 'valid' subset | valid_loss 12.0672 | valid_ppl 4291.21 | num_updates 595 | best 12.0672
| epoch 018: 3%| | 1/35 [00:01<00:38, 1.14s/it, loss=12.482, ppl=5719.40, wps=63, ups=0.0, wpb=2616, bsz=248, num_updates=596, lr=0.00025, gnorm=0.348, clip=
| epoch 018: 6%| | 2/35 [00:02<00:36, 1.12s/it, loss=12.245, ppl=4852.49, wps=1681, ups=0.0, wpb=2218, bsz=188, num_updates=597, lr=0.00025, gnorm=0.538, cli
| epoch 018: 9%| | 3/35 [00:03<00:32, 1.02s/it, loss=12.104, ppl=4401.20, wps=1892, ups=0.1, wpb=2049, bsz=168, num_updates=598, lr=0.00025, gnorm=0.539, cli
| epoch 018: 11%| | 4/35 
[00:04<00:32, 1.04s/it, loss=12.095, ppl=4373.35, wps=1686, ups=0.1, wpb=1904, bsz=154, num_updates=599, lr=0.00025, gnorm=0.506, cli| epoch 018: 14%|▏| 5/35 [00:05<00:32, 1.09s/it, loss=11.959, ppl=3981.84, wps=1811, ups=0.1, wpb=2029, bsz=150, num_updates=600, lr=0.00025, gnorm=0.468, cli| epoch 018: 17%|▏| 6/35 [00:06<00:31, 1.08s/it, loss=11.975, ppl=4025.84, wps=1716, ups=0.1, wpb=1928, bsz=147, num_updates=601, lr=0.00025, gnorm=0.462, cli| epoch 018: 20%|▏| 7/35 [00:07<00:31, 1.11s/it, loss=11.811, ppl=3594.14, wps=1884, ups=0.1, wpb=2096, bsz=152, num_updates=602, lr=0.00025, gnorm=0.436, cli| epoch 018: 23%|▏| 8/35 [00:08<00:31, 1.16s/it, loss=11.844, ppl=3676.70, wps=1997, ups=0.2, wpb=2245, bsz=168, num_updates=603, lr=0.00025, gnorm=0.454, cli| epoch 018: 26%|▎| 9/35 [00:10<00:30, 1.17s/it, loss=11.939, ppl=3927.43, wps=1982, ups=0.2, wpb=2247, bsz=156, num_updates=604, lr=0.00025, gnorm=0.440, cli| epoch 018: 29%|▎| 10/35 [00:10<00:25, 1.04s/it, loss=11.963, ppl=3991.78, wps=1934, ups=0.2, wpb=2118, bsz=143, num_updates=605, lr=0.00025, gnorm=0.463, cl| epoch 018: 31%|▎| 11/35 [00:11<00:25, 1.05s/it, loss=11.926, ppl=3891.71, wps=1928, ups=0.2, wpb=2111, bsz=141, num_updates=606, lr=0.00025, gnorm=0.450, cl| epoch 018: 34%|▎| 12/35 [00:12<00:23, 1.04s/it, loss=11.937, ppl=3920.67, wps=1854, ups=0.2, wpb=2024, bsz=138, num_updates=607, lr=0.00025, gnorm=0.465, cl| epoch 018: 37%|▎| 13/35 [00:14<00:24, 1.13s/it, loss=11.893, ppl=3802.78, wps=1957, ups=0.2, wpb=2162, bsz=149, num_updates=608, lr=0.00025, gnorm=0.448, cl| epoch 018: 40%|▍| 14/35 [00:15<00:23, 1.13s/it, loss=11.869, ppl=3741.46, wps=2017, ups=0.3, wpb=2229, bsz=152, num_updates=609, lr=0.00025, gnorm=0.455, cl| epoch 018: 43%|▍| 15/35 [00:16<00:23, 1.20s/it, loss=11.892, ppl=3799.50, wps=2065, ups=0.3, wpb=2312, bsz=162, num_updates=610, lr=0.00025, gnorm=0.464, cl| epoch 018: 46%|▍| 16/35 [00:17<00:21, 1.15s/it, loss=11.897, ppl=3814.85, wps=1989, ups=0.3, wpb=2222, bsz=158, 
num_updates=611, lr=0.00025, gnorm=0.473, cl| epoch 018: 49%|▍| 17/35 [00:18<00:20, 1.16s/it, loss=11.847, ppl=3684.99, wps=2031, ups=0.3, wpb=2273, bsz=159, num_updates=612, lr=0.00025, gnorm=0.471, cl| epoch 018: 51%|▌| 18/35 [00:19<00:19, 1.13s/it, loss=11.824, ppl=3626.23, wps=2064, ups=0.3, wpb=2303, bsz=159, num_updates=613, lr=0.00025, gnorm=0.485, cl| epoch 018: 54%|▌| 19/35 [00:21<00:18, 1.17s/it, loss=11.783, ppl=3523.65, wps=2115, ups=0.3, wpb=2370, bsz=162, num_updates=614, lr=0.00025, gnorm=0.490, cl| epoch 018: 57%|▌| 20/35 [00:22<00:17, 1.19s/it, loss=11.755, ppl=3455.45, wps=2167, ups=0.3, wpb=2441, bsz=167, num_updates=615, lr=0.00025, gnorm=0.479, cl| epoch 018: 60%|▌| 21/35 [00:23<00:16, 1.20s/it, loss=11.755, ppl=3455.34, wps=2202, ups=0.3, wpb=2487, bsz=172, num_updates=616, lr=0.00025, gnorm=0.468, cl| epoch 018: 63%|▋| 22/35 [00:24<00:15, 1.16s/it, loss=11.771, ppl=3494.78, wps=2155, ups=0.3, wpb=2430, bsz=169, num_updates=617, lr=0.00025, gnorm=0.474, cl| epoch 018: 66%|▋| 23/35 [00:26<00:14, 1.20s/it, loss=11.779, ppl=3514.30, wps=2173, ups=0.3, wpb=2465, bsz=173, num_updates=618, lr=0.00025, gnorm=0.476, cl| epoch 018: 69%|▋| 24/35 [00:27<00:13, 1.20s/it, loss=11.793, ppl=3547.65, wps=2188, ups=0.4, wpb=2488, bsz=172, num_updates=619, lr=0.00025, gnorm=0.466, cl| epoch 018: 71%|▋| 25/35 [00:28<00:11, 1.20s/it, loss=11.810, ppl=3591.24, wps=2185, ups=0.4, wpb=2489, bsz=169, num_updates=620, lr=0.00025, gnorm=0.458, cl| epoch 018: 74%|▋| 26/35 [00:29<00:10, 1.17s/it, loss=11.824, ppl=3625.96, wps=2183, ups=0.4, wpb=2484, bsz=171, num_updates=621, lr=0.00025, gnorm=0.452, cl| epoch 018: 77%|▊| 27/35 [00:30<00:09, 1.14s/it, loss=11.820, ppl=3614.47, wps=2196, ups=0.4, wpb=2493, bsz=173, num_updates=622, lr=0.00025, gnorm=0.447, cl| epoch 018: 80%|▊| 28/35 [00:31<00:08, 1.15s/it, loss=11.823, ppl=3622.37, wps=2196, ups=0.4, wpb=2495, bsz=171, num_updates=623, lr=0.00025, gnorm=0.440, cl| epoch 018: 83%|▊| 29/35 [00:33<00:07, 1.20s/it, 
loss=11.803, ppl=3573.55, wps=2217, ups=0.4, wpb=2533, bsz=174, num_updates=624, lr=0.00025, gnorm=0.434, cl| epoch 018: 86%|▊| 30/35 [00:34<00:05, 1.14s/it, loss=11.803, ppl=3573.29, wps=2179, ups=0.4, wpb=2479, bsz=170, num_updates=625, lr=0.00025, gnorm=0.435, cl| epoch 018: 89%|▉| 31/35 [00:35<00:04, 1.13s/it, loss=11.813, ppl=3598.26, wps=2174, ups=0.4, wpb=2473, bsz=171, num_updates=626, lr=0.00025, gnorm=0.428, cl| epoch 018: 91%|▉| 32/35 [00:36<00:03, 1.15s/it, loss=11.820, ppl=3616.03, wps=2176, ups=0.4, wpb=2478, bsz=169, num_updates=627, lr=0.00025, gnorm=0.424, cl| epoch 018: 94%|▉| 33/35 [00:37<00:02, 1.20s/it, loss=11.811, ppl=3592.82, wps=2201, ups=0.4, wpb=2518, bsz=172, num_updates=628, lr=0.00025, gnorm=0.418, cl| epoch 018: 97%|▉| 34/35 [00:38<00:01, 1.22s/it, loss=11.791, ppl=3544.41, wps=2223, ups=0.4, wpb=2550, bsz=175, num_updates=629, lr=0.00025, gnorm=0.414, cl| epoch 018: 100%|█| 35/35 [00:40<00:00, 1.22s/it, loss=11.799, ppl=3563.89, wps=2228, ups=0.4, wpb=2560, bsz=174, num_updates=630, lr=0.00025, gnorm=0.409, cl | epoch 018 | loss 11.799 | ppl 3563.89 | wps 2228 | ups 0.4 | wpb 2560 | bsz 174 | num_updates 630 | lr 0.00025 | gnorm 0.409 | clip 100% | oom 0 | wall 1410 | train_wall 709 | epoch 018 | valid on 'valid' subset | valid_loss 12.0661 | valid_ppl 4288.02 | num_updates 630 | best 12.0661 | epoch 019: 3%| | 1/35 [00:00<00:33, 1.01it/s, loss=12.166, ppl=4595.41, wps=26, ups=0.0, wpb=1068, bsz=104, num_updates=631, lr=2.5e-05, gnorm=0.637, clip=| epoch 019: 6%| | 2/35 [00:02<00:34, 1.05s/it, loss=11.497, ppl=2891.16, wps=2619, ups=0.0, wpb=2079, bsz=140, num_updates=632, lr=2.5e-05, gnorm=0.542, cli| epoch 019: 9%| | 3/35 [00:03<00:35, 1.12s/it, loss=11.723, ppl=3380.76, wps=2602, ups=0.1, wpb=2480, bsz=187, num_updates=633, lr=2.5e-05, gnorm=0.557, cli| epoch 019: 11%| | 4/35 [00:04<00:34, 1.12s/it, loss=11.840, ppl=3665.88, wps=2447, ups=0.1, wpb=2452, bsz=194, num_updates=634, lr=2.5e-05, gnorm=0.489, cli| epoch 019: 14%|▏| 5/35 
[00:05<00:35, 1.17s/it, loss=11.774, ppl=3501.42, wps=2567, ups=0.1, wpb=2716, bsz=211, num_updates=635, lr=2.5e-05, gnorm=0.442, cli| epoch 019: 17%|▏| 6/35 [00:06<00:33, 1.14s/it, loss=11.828, ppl=3634.51, wps=2313, ups=0.1, wpb=2470, bsz=195, num_updates=636, lr=2.5e-05, gnorm=0.457, cli| epoch 019: 20%|▏| 7/35 [00:08<00:31, 1.13s/it, loss=11.849, ppl=3690.20, wps=2163, ups=0.1, wpb=2327, bsz=183, num_updates=637, lr=2.5e-05, gnorm=0.446, cli| epoch 019: 23%|▏| 8/35 [00:09<00:30, 1.14s/it, loss=11.808, ppl=3585.48, wps=2159, ups=0.2, wpb=2352, bsz=177, num_updates=638, lr=2.5e-05, gnorm=0.425, cli| epoch 019: 26%|▎| 9/35 [00:09<00:26, 1.04s/it, loss=11.803, ppl=3572.05, wps=2160, ups=0.2, wpb=2281, bsz=172, num_updates=639, lr=2.5e-05, gnorm=0.436, cli| epoch 019: 29%|▎| 10/35 [00:11<00:27, 1.09s/it, loss=11.890, ppl=3795.22, wps=2129, ups=0.2, wpb=2279, bsz=161, num_updates=640, lr=2.5e-05, gnorm=0.423, cl| epoch 019: 31%|▎| 11/35 [00:12<00:25, 1.07s/it, loss=11.899, ppl=3819.47, wps=2012, ups=0.2, wpb=2152, bsz=154, num_updates=641, lr=2.5e-05, gnorm=0.438, cl| epoch 019: 34%|▎| 12/35 [00:13<00:23, 1.04s/it, loss=11.896, ppl=3809.82, wps=1925, ups=0.2, wpb=2048, bsz=147, num_updates=642, lr=2.5e-05, gnorm=0.441, cl| epoch 019: 37%|▎| 13/35 [00:14<00:24, 1.13s/it, loss=11.858, ppl=3711.63, wps=2017, ups=0.2, wpb=2184, bsz=157, num_updates=643, lr=2.5e-05, gnorm=0.427, cl| epoch 019: 40%|▍| 14/35 [00:15<00:24, 1.15s/it, loss=11.790, ppl=3540.97, wps=2065, ups=0.3, wpb=2250, bsz=159, num_updates=644, lr=2.5e-05, gnorm=0.416, cl| epoch 019: 43%|▍| 15/35 [00:16<00:22, 1.12s/it, loss=11.783, ppl=3524.71, wps=2099, ups=0.3, wpb=2282, bsz=162, num_updates=645, lr=2.5e-05, gnorm=0.408, cl| epoch 019: 46%|▍| 16/35 [00:18<00:21, 1.15s/it, loss=11.803, ppl=3573.35, wps=2117, ups=0.3, wpb=2318, bsz=161, num_updates=646, lr=2.5e-05, gnorm=0.397, cl| epoch 019: 49%|▍| 17/35 [00:19<00:20, 1.16s/it, loss=11.831, ppl=3643.11, wps=2117, ups=0.3, wpb=2330, bsz=157, 
num_updates=647, lr=2.5e-05, gnorm=0.390, cl| epoch 019: 51%|▌| 18/35 [00:20<00:20, 1.20s/it, loss=11.837, ppl=3657.59, wps=2143, ups=0.3, wpb=2379, bsz=163, num_updates=648, lr=2.5e-05, gnorm=0.397, cl| epoch 019: 54%|▌| 19/35 [00:21<00:18, 1.16s/it, loss=11.844, ppl=3676.63, wps=2100, ups=0.3, wpb=2329, bsz=161, num_updates=649, lr=2.5e-05, gnorm=0.397, cl| epoch 019: 57%|▌| 20/35 [00:22<00:16, 1.13s/it, loss=11.825, ppl=3627.08, wps=2127, ups=0.3, wpb=2352, bsz=162, num_updates=650, lr=2.5e-05, gnorm=0.413, cl| epoch 019: 60%|▌| 21/35 [00:23<00:16, 1.18s/it, loss=11.790, ppl=3539.98, wps=2166, ups=0.3, wpb=2413, bsz=166, num_updates=651, lr=2.5e-05, gnorm=0.405, cl| epoch 019: 63%|▋| 22/35 [00:24<00:14, 1.15s/it, loss=11.794, ppl=3551.02, wps=2144, ups=0.3, wpb=2386, bsz=164, num_updates=652, lr=2.5e-05, gnorm=0.421, cl| epoch 019: 66%|▋| 23/35 [00:26<00:13, 1.15s/it, loss=11.799, ppl=3564.28, wps=2146, ups=0.3, wpb=2393, bsz=162, num_updates=653, lr=2.5e-05, gnorm=0.415, cl| epoch 019: 69%|▋| 24/35 [00:27<00:12, 1.15s/it, loss=11.830, ppl=3641.72, wps=2153, ups=0.4, wpb=2403, bsz=166, num_updates=654, lr=2.5e-05, gnorm=0.411, cl| epoch 019: 71%|▋| 25/35 [00:28<00:11, 1.14s/it, loss=11.842, ppl=3671.24, wps=2148, ups=0.4, wpb=2399, bsz=167, num_updates=655, lr=2.5e-05, gnorm=0.404, cl| epoch 019: 74%|▋| 26/35 [00:29<00:10, 1.16s/it, loss=11.837, ppl=3659.53, wps=2177, ups=0.4, wpb=2438, bsz=171, num_updates=656, lr=2.5e-05, gnorm=0.398, cl| epoch 019: 77%|▊| 27/35 [00:30<00:09, 1.21s/it, loss=11.815, ppl=3602.52, wps=2200, ups=0.4, wpb=2481, bsz=174, num_updates=657, lr=2.5e-05, gnorm=0.393, cl| epoch 019: 80%|▊| 28/35 [00:32<00:08, 1.20s/it, loss=11.808, ppl=3586.59, wps=2218, ups=0.4, wpb=2503, bsz=175, num_updates=658, lr=2.5e-05, gnorm=0.398, cl| epoch 019: 83%|▊| 29/35 [00:33<00:07, 1.21s/it, loss=11.785, ppl=3527.84, wps=2244, ups=0.4, wpb=2541, bsz=176, num_updates=659, lr=2.5e-05, gnorm=0.404, cl| epoch 019: 86%|▊| 30/35 [00:34<00:06, 1.25s/it, 
loss=11.798, ppl=3561.84, wps=2257, ups=0.4, wpb=2572, bsz=181, num_updates=660, lr=2.5e-05, gnorm=0.410, cl| epoch 019: 89%|▉| 31/35 [00:35<00:04, 1.09s/it, loss=11.806, ppl=3581.39, wps=2238, ups=0.4, wpb=2520, bsz=175, num_updates=661, lr=2.5e-05, gnorm=0.417, cl| epoch 019: 91%|▉| 32/35 [00:36<00:03, 1.13s/it, loss=11.816, ppl=3604.33, wps=2247, ups=0.4, wpb=2536, bsz=175, num_updates=662, lr=2.5e-05, gnorm=0.412, cl| epoch 019: 94%|▉| 33/35 [00:37<00:02, 1.14s/it, loss=11.822, ppl=3620.25, wps=2247, ups=0.4, wpb=2539, bsz=173, num_updates=663, lr=2.5e-05, gnorm=0.409, cl| epoch 019: 97%|▉| 34/35 [00:39<00:01, 1.18s/it, loss=11.805, ppl=3577.75, wps=2272, ups=0.4, wpb=2575, bsz=175, num_updates=664, lr=2.5e-05, gnorm=0.405, cl| epoch 019: 100%|█| 35/35 [00:40<00:00, 1.15s/it, loss=11.799, ppl=3562.64, wps=2261, ups=0.4, wpb=2560, bsz=174, num_updates=665, lr=2.5e-05, gnorm=0.403, cl | epoch 019 | loss 11.799 | ppl 3562.64 | wps 2261 | ups 0.4 | wpb 2560 | bsz 174 | num_updates 665 | lr 2.5e-05 | gnorm 0.403 | clip 100% | oom 0 | wall 1490 | train_wall 748 | epoch 019 | valid on 'valid' subset | valid_loss 12.0661 | valid_ppl 4287.93 | num_updates 665 | best 12.0661 | done training in 1535.5 seconds `

huihuifan commented 5 years ago

Hi, the log is a bit hard to read (you could try --log-interval X to print every X updates, use the JSON log format with --log-format json, or just turn off the progress bar with --no-progress-bar), but it looks like the model is not converging: the training perplexity stays around 3560 and the validation perplexity around 4290 across the last epochs.
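To make a log like this easier to inspect, one option is to pull out only the per-epoch summary lines and ignore the progress-bar updates. The snippet below is a minimal sketch, not part of fairseq: the regular expressions and the `summarize` helper are hypothetical and assume the summary format seen in the log above (`| epoch NNN | loss ... | ppl ...` and `| epoch NNN | valid on 'valid' subset | valid_loss ... | valid_ppl ...`). The sample text is copied from the epoch 019 lines of that log.

```python
import re

# Summary lines copied from the end of the log above (epoch 019).
LOG = (
    "| epoch 019 | loss 11.799 | ppl 3562.64 | wps 2261 | ups 0.4 | wpb 2560 "
    "| bsz 174 | num_updates 665 | lr 2.5e-05 | gnorm 0.403 | clip 100% "
    "| oom 0 | wall 1490 | train_wall 748 "
    "| epoch 019 | valid on 'valid' subset | valid_loss 12.0661 "
    "| valid_ppl 4287.93 | num_updates 665 | best 12.0661 "
)

# Hypothetical patterns for the two kinds of summary line.
# Progress-bar entries look like "| epoch 019:  3%|" and do not match either one.
TRAIN_RE = re.compile(r"\| epoch (\d+) \| loss ([\d.]+) \| ppl ([\d.]+)")
VALID_RE = re.compile(
    r"\| epoch (\d+) \| valid on '\w+' subset "
    r"\| valid_loss ([\d.]+) \| valid_ppl ([\d.]+)"
)


def summarize(log):
    """Return {epoch: (loss, ppl)} dicts for the training and validation summaries."""
    train = {int(e): (float(l), float(p)) for e, l, p in TRAIN_RE.findall(log)}
    valid = {int(e): (float(l), float(p)) for e, l, p in VALID_RE.findall(log)}
    return train, valid


train, valid = summarize(LOG)
for epoch in sorted(train):
    t_loss, t_ppl = train[epoch]
    v_loss, v_ppl = valid[epoch]
    print(f"epoch {epoch}: train loss {t_loss} (ppl {t_ppl}), "
          f"valid loss {v_loss} (ppl {v_ppl})")
```

Run over the whole log, this gives one line per epoch, which makes the plateau obvious: the training loss is 11.799 at epochs 17, 18 and 19, and the validation loss only moves from 12.0672 to 12.0661. The reported values are also consistent with perplexity being 2 raised to the loss (2^11.799 is roughly 3563).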

Your model seems to have only 6083 training examples. Is that correct? That is very little data for the architecture you are using, so you should tune the hyperparameters, for example the learning rate.
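As a rough illustration of why the learning rate matters here: the log shows lr=0.25 at the start, lr 0.00025 by epoch 17 and lr 2.5e-05 at epoch 19, with lr_shrink=0.1 and min_lr=1e-05 in the arguments. The sketch below assumes (this is a simplification, not the scheduler's actual code) that the rate is multiplied by lr_shrink each time the validation loss fails to improve and that training stops once the rate drops to min_lr or below. The `lr_trajectory` helper is hypothetical.

```python
def lr_trajectory(lr, shrink, min_lr):
    """List successive learning rates, ending with the first one at or below min_lr.

    Assumes one shrink per non-improving validation step (a simplification).
    """
    rates = [lr]
    while rates[-1] > min_lr:
        rates.append(rates[-1] * shrink)
    return rates


# Values taken from the training arguments in the log: lr=0.25, lr_shrink=0.1, min_lr=1e-05.
rates = lr_trajectory(0.25, 0.1, 1e-05)
for step, rate in enumerate(rates):
    print(f"after {step} shrink(s): lr = {rate:.3g}")
```

Under these assumptions only five shrinks are available before training stops, so a run that plateaus early, as this one does, uses up its learning-rate budget within a few epochs. That would be consistent with the run ending right after epoch 19.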

Sumegh-git commented 5 years ago

What architecture would be good to use, given that I have only 6k training examples?

huihuifan commented 5 years ago

What language pair are you training? Our smallest dataset, IWSLT, has many more than 6k training examples. I would recommend looking into pretraining on a larger dataset and then fine-tuning on your dataset, or into low-resource translation approaches.

huihuifan commented 5 years ago

Closing this task, I hope the advice above was helpful. Please re-open if you have specific issues with the codebase. Thanks!