Closed: mjlangan closed this issue 6 years ago.
This is the same issue as #524 - basically we have the TQDM flag for training, but not for evaluation at test time.
This is slightly broader, though. train.py calls index_instances() a couple of times (https://github.com/allenai/allennlp/blob/master/allennlp/commands/train.py#L164), and index_instances() has a tqdm in it: https://github.com/allenai/allennlp/blob/master/allennlp/data/dataset.py#L58
At least for this case it would be easy enough to add a no_tqdm parameter to index_instances (and similarly to evaluate()) and modify train.py to specify it. Those points are less painful, though, because they are much faster: if you look at the points in the beaker log where .index_instances is called, you'll see it's only ~3 tqdm statements, whereas the evaluation of a dataset can take much longer and be much more verbose, making the lines unreadable.
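To make the proposal concrete, here is a minimal sketch of what such a no_tqdm parameter could look like; the names (instances, index_fields, vocab) mirror the code linked above but are illustrative, not the actual patch:

```python
from typing import Iterable

from tqdm import tqdm

def index_instances(instances: Iterable, vocab, no_tqdm: bool = False) -> None:
    # Wrap the instances in a progress bar only when one is wanted; callers
    # like train.py and evaluate() would pass no_tqdm=True on Beaker.
    iterator = instances if no_tqdm else tqdm(instances)
    for instance in iterator:
        instance.index_fields(vocab)  # each instance indexes its own fields
```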
It might be easier, though, to just modify our TeeLogger to detect tqdm output and add a newline. That would remove a bunch of complexity from the code, and we wouldn't have to pass no_tqdm everywhere.
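As a rough sketch of that idea (this is not the actual fix that landed; the carriage-return heuristic used to detect tqdm output here is an assumption):

```python
import sys

class TeeLogger:
    """Mirrors writes to the terminal and a log file. tqdm redraws its bar
    by emitting a carriage return with no trailing newline, so the log copy
    rewrites those updates as complete lines to keep the file readable."""

    def __init__(self, filename: str, terminal) -> None:
        self.terminal = terminal
        self.log = open(filename, "a")

    def write(self, message: str) -> None:
        self.terminal.write(message)
        if "\r" in message and not message.endswith("\n"):
            # Probably a tqdm in-place update: flatten it to one full line.
            message = message.replace("\r", "") + "\n"
        self.log.write(message)

    def flush(self) -> None:
        self.terminal.flush()
        self.log.flush()

# Usage: sys.stdout = TeeLogger("stdout.log", sys.stdout)
```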
From @DeNeutoy: here is the TeeLogger for DeepQA, which has a hack to address this issue: https://github.com/allenai/deep_qa/blob/master/deep_qa/common/tee_logger.py
The issue in the linked job is pretty minor. The text display on beaker.dev.ai2 collapses the multi-line TQDM output to a single line, but when I copy and paste it to GitHub, the TQDM output becomes multiple lines again.
...
2017-12-08T19:55:33.094778899Z 2017-12-08 19:55:33,094 - INFO - allennlp.data.vocabulary - Fitting token dictionary from dataset.
2017-12-08T19:55:33.620434278Z
0%| | 0/21892 [00:00<?, ?it/s]
20%|#9 | 4366/21892 [00:00<00:00, 43641.62it/s]
41%|####1 | 9037/21892 [00:00<00:00, 45176.05it/s]
66%|######6 | 14515/21892 [00:00<00:00, 48370.00it/s]
89%|########9 | 19497/21892 [00:00<00:00, 48731.60it/s]
100%|##########| 21892/21892 [00:00<00:00, 49405.42it/s]
2017-12-08T19:55:33.620465673Z 2017-12-08 19:55:33,618 - INFO - allennlp.common.params - model.type = StateChangeModelCombinedLossBothAttention
...
The TQDM output looks like one line in the Beaker output, but becomes multiple lines once copied and pasted, because tqdm redraws its bar with a carriage return character rather than a newline. Here are some other areas where TQDM output shows up as one long line in the Beaker output:
http://beaker.allenai.org/tk/tk_x9015xycnbti
2018-01-19T08:48:08.923176853Z 2018-01-19 08:48:08,922 - INFO - allennlp.commands.evaluate - Iterating over dataset
2018-01-19T08:48:10.337576957Z
0%| | 0/2 [00:00<?, ?it/s]
precision-LOC: 0.25, recall-LOC: 0.32, f1-measure-LOC: 0.28, precision-OUTPUT: 0.00, recall-OUTPUT: 0.00, f1-measure-OUTPUT: 0.00, precision-LOC-TO: 1.00, recall-LOC-TO: 0.20, f1-measure-LOC-TO: 0.33, precision-TIME: 0.43, recall-TIME: 0.32, f1-measure-TIME: 0.36, precision-AGENT: 0.55, recall-AGENT: 0.63, f1-measure-AGENT: 0.59, precision-PATIENT: 0.45, recall-PATIENT: 0.47, f1-measure-PATIENT: 0.46, precision-ENABLE: 0.52, recall-ENABLE: 0.47, f1-measure-ENABLE: 0.50, precision-LOC-FROM: 0.00, recall-LOC-FROM: 0.00, f1-measure-LOC-FROM: 0.00, precision-overall: 0.47, recall-overall: 0.48, f1-measure-overall: 0.47 ||: 0%| | 0/2 [00:00<?, ?it/s]
precision-LOC: 0.25, recall-LOC: 0.32, f1-measure-LOC: 0.28, precision-OUTPUT: 0.00, recall-OUTPUT: 0.00, f1-measure-OUTPUT: 0.00, precision-LOC-TO: 1.00, recall-LOC-TO: 0.20, f1-measure-LOC-TO: 0.33, precision-TIME: 0.43, recall-TIME: 0.32, f1-measure-TIME: 0.36, precision-AGENT: 0.55, recall-AGENT: 0.63, f1-measure-AGENT: 0.59, precision-PATIENT: 0.45, recall-PATIENT: 0.47, f1-measure-PATIENT: 0.46, precision-ENABLE: 0.52, recall-ENABLE: 0.47, f1-measure-ENABLE: 0.50, precision-LOC-FROM: 0.00, recall-LOC-FROM: 0.00, f1-measure-LOC-FROM: 0.00, precision-overall: 0.47, recall-overall: 0.48, f1-measure-overall: 0.47 ||: 50%|##### | 1/2 [00:00<00:00, 1.57it/s]
precision-LOC: 0.35, recall-LOC: 0.35, f1-measure-LOC: 0.35, precision-OUTPUT: 0.00, recall-OUTPUT: 0.00, f1-measure-OUTPUT: 0.00, precision-LOC-TO: 1.00, recall-LOC-TO: 0.11, f1-measure-LOC-TO: 0.20, precision-TIME: 0.40, recall-TIME: 0.33, f1-measure-TIME: 0.36, precision-AGENT: 0.55, recall-AGENT: 0.59, f1-measure-AGENT: 0.57, precision-PATIENT: 0.43, recall-PATIENT: 0.49, f1-measure-PATIENT: 0.46, precision-ENABLE: 0.55, recall-ENABLE: 0.51, f1-measure-ENABLE: 0.53, precision-LOC-FROM: 0.00, recall-LOC-FROM: 0.00, f1-measure-LOC-FROM: 0.00, precision-overall: 0.47, recall-overall: 0.48, f1-measure-overall: 0.48 ||: 50%|##### | 1/2 [00:01<00:01, 1.41s/it]
precision-LOC: 0.35, recall-LOC: 0.35, f1-measure-LOC: 0.35, precision-OUTPUT: 0.00, recall-OUTPUT: 0.00, f1-measure-OUTPUT: 0.00, precision-LOC-TO: 1.00, recall-LOC-TO: 0.11, f1-measure-LOC-TO: 0.20, precision-TIME: 0.40, recall-TIME: 0.33, f1-measure-TIME: 0.36, precision-AGENT: 0.55, recall-AGENT: 0.59, f1-measure-AGENT: 0.57, precision-PATIENT: 0.43, recall-PATIENT: 0.49, f1-measure-PATIENT: 0.46, precision-ENABLE: 0.55, recall-ENABLE: 0.51, f1-measure-ENABLE: 0.53, precision-LOC-FROM: 0.00, recall-LOC-FROM: 0.00, f1-measure-LOC-FROM: 0.00, precision-overall: 0.47, recall-overall: 0.48, f1-measure-overall: 0.48 ||: 100%|##########| 2/2 [00:01<00:00, 1.42it/s]
and here:
2018-01-19T01:37:51.009929368Z 2018-01-19 01:37:51,009 - INFO - allennlp.modules.token_embedders.embedding - Reading embeddings from file
2018-01-19T01:37:51.045293636Z 2018-01-19 01:37:51,044 - INFO - allennlp.common.file_utils - https://s3-us-west-2.amazonaws.com/allennlp/datasets/glove/glove.6B.100d.txt.gz not found in cache, downloading to /tmp/tmp1zy99cim
2018-01-19T01:37:55.82328617Z
0%| | 0/134300425 [00:00<?, ?B/s]
3%|3 | 4389888/134300425 [00:00<00:02, 43892852.23B/s]
6%|6 | 8379392/134300425 [00:00<00:04, 29772645.87B/s]
9%|8 | 11566080/134300425 [00:00<00:04, 30370426.50B/s]
11%|#1 | 15136768/134300425 [00:00<00:03, 31680024.57B/s]
14%|#4 | 18924544/134300425 [00:00<00:03, 33312192.34B/s]
16%|#6 | 21760000/134300425 [00:00<00:03, 31424958.10B/s]
18%|#8 | 24843264/134300425 [00:00<00:03, 31138951.76B/s]
21%|## | 27994112/134300425 [00:00<00:03, 31246113.02B/s]
23%|##3 | 31185920/134300425 [00:01<00:03, 31442218.66B/s]
26%|##5 | 34302976/134300425 [00:01<00:03, 31324348.52B/s]
28%|##7 | 37424128/134300425 [00:01<00:03, 31288785.55B/s]
30%|### | 40586240/134300425 [00:01<00:02, 31301133.38B/s]
33%|###2 | 43894784/134300425 [00:01<00:02, 31756983.92B/s]
35%|###5 | 47388672/134300425 [00:01<00:02, 32646097.11B/s]
38%|###7 | 50642944/134300425 [00:01<00:02, 32602240.63B/s]
40%|#### | 53896192/134300425 [00:01<00:02, 31725545.70B/s]
42%|####2 | 57068544/134300425 [00:01<00:02, 31361650.92B/s]
45%|####4 | 60206080/134300425 [00:01<00:02, 29744036.86B/s]
47%|####7 | 63237120/134300425 [00:02<00:02, 29909650.40B/s]
49%|####9 | 66348032/134300425 [00:02<00:02, 30259421.39B/s]
52%|#####1 | 69500928/134300425 [00:02<00:02, 30576883.08B/s]
54%|#####4 | 72720384/134300425 [00:02<00:01, 31044735.52B/s]
56%|#####6 | 75837440/134300425 [00:02<00:01, 30917923.62B/s]
59%|#####8 | 78935040/134300425 [00:02<00:01, 30439195.57B/s]
61%|######1 | 82113536/134300425 [00:02<00:01, 30830514.42B/s]
63%|######3 | 85201920/134300425 [00:02<00:01, 30555461.55B/s]
66%|######5 | 88546304/134300425 [00:02<00:01, 31366121.39B/s]
68%|######8 | 91960320/134300425 [00:02<00:01, 32148676.70B/s]
71%|#######1 | 95357952/134300425 [00:03<00:01, 32675882.14B/s]
73%|#######3 | 98705408/134300425 [00:03<00:01, 32910214.44B/s]
76%|#######5 | 102048768/134300425 [00:03<00:00, 33064668.73B/s]
78%|#######8 | 105360384/134300425 [00:03<00:00, 32422292.66B/s]
81%|######## | 108609536/134300425 [00:03<00:00, 32243524.94B/s]
83%|########3 | 112011264/134300425 [00:03<00:00, 32732303.28B/s]
86%|########6 | 115705856/134300425 [00:03<00:00, 33891460.65B/s]
89%|########8 | 119109632/134300425 [00:03<00:00, 33380975.29B/s]
92%|#########1| 122926080/134300425 [00:03<00:00, 34683111.92B/s]
95%|#########4| 127236096/134300425 [00:03<00:00, 36837327.39B/s]
98%|#########7| 131524608/134300425 [00:04<00:00, 38462437.74B/s]
100%|##########| 134300425/134300425 [00:04<00:00, 29319490.40B/s]
@schmmd, yeah, it's not a super high priority issue, but it is an annoyance. Beaker won't update its logs until it hits a newline, so if you're in the middle of data processing that's taking a long time, your job will look like it has frozen until you finally get one huge line of tqdm output.
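To make the failure mode concrete: tqdm redraws its bar with a carriage return rather than a newline, so a collector that flushes on newlines sees nothing until the whole bar is done. A tiny self-contained demonstration (not AllenNLP code):

```python
import sys
import time

# Each update begins with a carriage return, so a terminal jumps back to
# column 0 and overwrites the bar in place; a log collector that flushes
# on newline buffers everything until the loop finishes.
for pct in range(0, 101, 20):
    sys.stderr.write("\r{:3d}%|{:<10}|".format(pct, "#" * (pct // 10)))
    sys.stderr.flush()
    time.sleep(0.1)
sys.stderr.write("\n")
```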
Related PR: https://github.com/allenai/allennlp/pull/719
We should probably remove the no_tqdm flag entirely if the fix in #719 works. It'd simplify the training code and just be nicer in general.
@matt-gardner I agree it would be nice to have only one mechanism for reporting progress. I also find it quite strange that our configuration files contain settings for how progress should be reported.
What do you think about a --disable-tqdm flag on the CLI? This could disable TQDM globally throughout the entire training process; there would be no alternate mechanism for reporting progress. Some users might like this option for keeping the logs cleaner.
What's the point of the global --disable-tqdm flag? I'm pretty sure you always want some measure of progress; you'd be waiting a really long time with no update if you just disabled tqdm and had no alternative. But even if we wanted this, how would we implement it? It seems pretty challenging.
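For what it's worth, one plausible implementation would be to route every progress bar through a thin wrapper with a process-wide switch. This is a sketch with made-up names, not AllenNLP's API:

```python
from tqdm import tqdm as _tqdm

class Tqdm:
    """All call sites use Tqdm.tqdm(...) instead of importing tqdm directly,
    so a single CLI flag can disable every progress bar at once."""

    default_disabled = False  # a --disable-tqdm flag would flip this once

    @staticmethod
    def tqdm(*args, **kwargs):
        kwargs.setdefault("disable", Tqdm.default_disabled)
        return _tqdm(*args, **kwargs)

# Example: after setting Tqdm.default_disabled = True, this prints no bar.
for _ in Tqdm.tqdm(range(3)):
    pass
```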
I'm considering adding a --tqdm-newline flag. This flag would cause tqdm to output a newline after each status update, both on the terminal and in the logs.
Ok, yes, now this makes more sense, thanks for the clarification. That seems totally reasonable to me: remove the no_tqdm setting from Trainer, simplify that code, and put in a --tqdm-newline CLI option on the train command that modifies the TeeLogger.
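A sketch of how that wiring could look; the flag name comes from the discussion above, and the file_friendly parameter on TeeLogger is an assumed extension of the sketch earlier in the thread, not the actual allennlp CLI:

```python
import argparse

parser = argparse.ArgumentParser(description="train")
parser.add_argument(
    "--tqdm-newline",
    action="store_true",
    help="rewrite tqdm carriage returns as newlines in the log copy",
)
args = parser.parse_args()

# At startup, thread the flag into the stdout/stderr redirection, e.g.
# (assuming the TeeLogger sketched above gained a file_friendly switch):
#   sys.stdout = TeeLogger("stdout.log", sys.stdout,
#                          file_friendly=args.tqdm_newline)
```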
Found while investigating some issues with Beaker's log handling. It appears that when I run experiments with no_tqdm: true in my trainer's config JSON, I still see TQDM output for the steps "Indexing dataset" and "Iterating over dataset". See the log output for this Beaker task: http://beaker.dev.allenai.org/tk/tk_jt3ahr2hxmch