allenai / allennlp

An open-source NLP research library, built on PyTorch.
http://www.allennlp.org
Apache License 2.0

Experiments run with `no_tqdm: true` still have TQDM output in logs #599

Closed mjlangan closed 6 years ago

mjlangan commented 6 years ago

Found while investigating some issues with Beaker's log handling. It appears that when I run experiments with no_tqdm: true in my trainer's config JSON, I still see TQDM output for the steps "Indexing dataset" and "Iterating over dataset".

See the log output for this Beaker task: http://beaker.dev.allenai.org/tk/tk_jt3ahr2hxmch

DeNeutoy commented 6 years ago

This is the same issue as #524 - basically we have the TQDM flag for training, but not for evaluation at test time.

joelgrus commented 6 years ago

This is slightly broader, though: train.py calls index_instances() a couple of times

https://github.com/allenai/allennlp/blob/master/allennlp/commands/train.py#L164

and that has a tqdm in it:

https://github.com/allenai/allennlp/blob/master/allennlp/data/dataset.py#L58

At least for this case it would be easy enough to add a no_tqdm parameter to index_instances() (and similarly to evaluate()) and modify train.py to specify it.
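[Editor's note] That suggestion can be sketched roughly as follows. This is a hedged illustration, not AllenNLP's actual code: tqdm's own `disable` keyword does the silencing, and the `index_instances`/`index_fields` names simply mirror the functions under discussion.

```python
from tqdm import tqdm

def index_instances(instances, vocab, no_tqdm: bool = False):
    # tqdm's `disable=True` suppresses all progress-bar output;
    # the wrapped iterable still yields its items unchanged.
    for instance in tqdm(instances, disable=no_tqdm):
        instance.index_fields(vocab)
```

Callers such as train.py would then pass no_tqdm=True through from the trainer config.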

DeNeutoy commented 6 years ago

Those points are less painful, though, because they are much faster - if you look at the points in the Beaker log where .index_instances() is called, you'll see only ~3 tqdm statements, whereas evaluating a dataset can take much longer and be much more verbose, making the lines unreadable.

matt-gardner commented 6 years ago

It might be easier, though, to just modify our TeeLogger to detect tqdm output, and add a newline. It'd remove a bunch of complexity from the code, and we wouldn't have to pass no_tqdm everywhere.
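[Editor's note] A minimal sketch of that idea, assuming tqdm's usual behavior of redrawing its bar with bare carriage returns. This is hypothetical code, not the actual TeeLogger: a stream wrapper that passes output through to the terminal unchanged but rewrites carriage returns as newlines before they reach the log file.

```python
class TeeLogger:
    """Hypothetical tee logger: copies terminal output to a log file,
    rewriting tqdm's in-place redraws along the way."""

    def __init__(self, file_path: str, terminal):
        self.terminal = terminal
        self.log = open(file_path, "a")

    def write(self, message: str) -> None:
        self.terminal.write(message)
        # tqdm redraws its bar with a bare '\r'; turning each '\r' into
        # '\n' puts every progress update on its own line in the log.
        self.log.write(message.replace("\r", "\n"))

    def flush(self) -> None:
        self.terminal.flush()
        self.log.flush()

# Typical wiring would replace the process-wide streams at startup,
# e.g. sys.stdout = TeeLogger("stdout.log", sys.stdout)
```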

schmmd commented 6 years ago

From @DeNeutoy: here is the TeeLogger for DeepQA which has a hack to address this issue: https://github.com/allenai/deep_qa/blob/master/deep_qa/common/tee_logger.py

schmmd commented 6 years ago

The issue in the linked job is pretty minor. The text display on beaker.dev.ai2 collapses the multi-line TQDM output to a single line. When I copy and paste to GitHub the TQDM output becomes multiple lines.


...
2017-12-08T19:55:33.094778899Z 2017-12-08 19:55:33,094 - INFO - allennlp.data.vocabulary - Fitting token dictionary from dataset.
2017-12-08T19:55:33.620434278Z 
  0%|          | 0/21892 [00:00<?, ?it/s]
 20%|#9        | 4366/21892 [00:00<00:00, 43641.62it/s]
 41%|####1     | 9037/21892 [00:00<00:00, 45176.05it/s]
 66%|######6   | 14515/21892 [00:00<00:00, 48370.00it/s]
 89%|########9 | 19497/21892 [00:00<00:00, 48731.60it/s]
100%|##########| 21892/21892 [00:00<00:00, 49405.42it/s]
2017-12-08T19:55:33.620465673Z 2017-12-08 19:55:33,618 - INFO - allennlp.common.params - model.type = StateChangeModelCombinedLossBothAttention
...
schmmd commented 6 years ago

The TQDM output looks like one line in the Beaker output, but becomes multiple lines once copied and pasted, due to the carriage-return character tqdm uses to redraw the bar in place. Here are some other areas where TQDM output shows up as one long line in the Beaker output.

http://beaker.allenai.org/tk/tk_x9015xycnbti

2018-01-19T08:48:08.923176853Z 2018-01-19 08:48:08,922 - INFO - allennlp.commands.evaluate - Iterating over dataset
2018-01-19T08:48:10.337576957Z 
  0%|          | 0/2 [00:00<?, ?it/s]
precision-LOC: 0.25, recall-LOC: 0.32, f1-measure-LOC: 0.28, precision-OUTPUT: 0.00, recall-OUTPUT: 0.00, f1-measure-OUTPUT: 0.00, precision-LOC-TO: 1.00, recall-LOC-TO: 0.20, f1-measure-LOC-TO: 0.33, precision-TIME: 0.43, recall-TIME: 0.32, f1-measure-TIME: 0.36, precision-AGENT: 0.55, recall-AGENT: 0.63, f1-measure-AGENT: 0.59, precision-PATIENT: 0.45, recall-PATIENT: 0.47, f1-measure-PATIENT: 0.46, precision-ENABLE: 0.52, recall-ENABLE: 0.47, f1-measure-ENABLE: 0.50, precision-LOC-FROM: 0.00, recall-LOC-FROM: 0.00, f1-measure-LOC-FROM: 0.00, precision-overall: 0.47, recall-overall: 0.48, f1-measure-overall: 0.47 ||:   0%|          | 0/2 [00:00<?, ?it/s]
precision-LOC: 0.25, recall-LOC: 0.32, f1-measure-LOC: 0.28, precision-OUTPUT: 0.00, recall-OUTPUT: 0.00, f1-measure-OUTPUT: 0.00, precision-LOC-TO: 1.00, recall-LOC-TO: 0.20, f1-measure-LOC-TO: 0.33, precision-TIME: 0.43, recall-TIME: 0.32, f1-measure-TIME: 0.36, precision-AGENT: 0.55, recall-AGENT: 0.63, f1-measure-AGENT: 0.59, precision-PATIENT: 0.45, recall-PATIENT: 0.47, f1-measure-PATIENT: 0.46, precision-ENABLE: 0.52, recall-ENABLE: 0.47, f1-measure-ENABLE: 0.50, precision-LOC-FROM: 0.00, recall-LOC-FROM: 0.00, f1-measure-LOC-FROM: 0.00, precision-overall: 0.47, recall-overall: 0.48, f1-measure-overall: 0.47 ||:  50%|#####     | 1/2 [00:00<00:00,  1.57it/s]
precision-LOC: 0.35, recall-LOC: 0.35, f1-measure-LOC: 0.35, precision-OUTPUT: 0.00, recall-OUTPUT: 0.00, f1-measure-OUTPUT: 0.00, precision-LOC-TO: 1.00, recall-LOC-TO: 0.11, f1-measure-LOC-TO: 0.20, precision-TIME: 0.40, recall-TIME: 0.33, f1-measure-TIME: 0.36, precision-AGENT: 0.55, recall-AGENT: 0.59, f1-measure-AGENT: 0.57, precision-PATIENT: 0.43, recall-PATIENT: 0.49, f1-measure-PATIENT: 0.46, precision-ENABLE: 0.55, recall-ENABLE: 0.51, f1-measure-ENABLE: 0.53, precision-LOC-FROM: 0.00, recall-LOC-FROM: 0.00, f1-measure-LOC-FROM: 0.00, precision-overall: 0.47, recall-overall: 0.48, f1-measure-overall: 0.48 ||:  50%|#####     | 1/2 [00:01<00:01,  1.41s/it]
precision-LOC: 0.35, recall-LOC: 0.35, f1-measure-LOC: 0.35, precision-OUTPUT: 0.00, recall-OUTPUT: 0.00, f1-measure-OUTPUT: 0.00, precision-LOC-TO: 1.00, recall-LOC-TO: 0.11, f1-measure-LOC-TO: 0.20, precision-TIME: 0.40, recall-TIME: 0.33, f1-measure-TIME: 0.36, precision-AGENT: 0.55, recall-AGENT: 0.59, f1-measure-AGENT: 0.57, precision-PATIENT: 0.43, recall-PATIENT: 0.49, f1-measure-PATIENT: 0.46, precision-ENABLE: 0.55, recall-ENABLE: 0.51, f1-measure-ENABLE: 0.53, precision-LOC-FROM: 0.00, recall-LOC-FROM: 0.00, f1-measure-LOC-FROM: 0.00, precision-overall: 0.47, recall-overall: 0.48, f1-measure-overall: 0.48 ||: 100%|##########| 2/2 [00:01<00:00,  1.42it/s]

and here:

2018-01-19T01:37:51.009929368Z 2018-01-19 01:37:51,009 - INFO - allennlp.modules.token_embedders.embedding - Reading embeddings from file
2018-01-19T01:37:51.045293636Z 2018-01-19 01:37:51,044 - INFO - allennlp.common.file_utils - https://s3-us-west-2.amazonaws.com/allennlp/datasets/glove/glove.6B.100d.txt.gz not found in cache, downloading to /tmp/tmp1zy99cim
2018-01-19T01:37:55.82328617Z 
  0%|          | 0/134300425 [00:00<?, ?B/s]
  3%|3         | 4389888/134300425 [00:00<00:02, 43892852.23B/s]
  6%|6         | 8379392/134300425 [00:00<00:04, 29772645.87B/s]
  9%|8         | 11566080/134300425 [00:00<00:04, 30370426.50B/s]
 11%|#1        | 15136768/134300425 [00:00<00:03, 31680024.57B/s]
 14%|#4        | 18924544/134300425 [00:00<00:03, 33312192.34B/s]
 16%|#6        | 21760000/134300425 [00:00<00:03, 31424958.10B/s]
 18%|#8        | 24843264/134300425 [00:00<00:03, 31138951.76B/s]
 21%|##        | 27994112/134300425 [00:00<00:03, 31246113.02B/s]
 23%|##3       | 31185920/134300425 [00:01<00:03, 31442218.66B/s]
 26%|##5       | 34302976/134300425 [00:01<00:03, 31324348.52B/s]
 28%|##7       | 37424128/134300425 [00:01<00:03, 31288785.55B/s]
 30%|###       | 40586240/134300425 [00:01<00:02, 31301133.38B/s]
 33%|###2      | 43894784/134300425 [00:01<00:02, 31756983.92B/s]
 35%|###5      | 47388672/134300425 [00:01<00:02, 32646097.11B/s]
 38%|###7      | 50642944/134300425 [00:01<00:02, 32602240.63B/s]
 40%|####      | 53896192/134300425 [00:01<00:02, 31725545.70B/s]
 42%|####2     | 57068544/134300425 [00:01<00:02, 31361650.92B/s]
 45%|####4     | 60206080/134300425 [00:01<00:02, 29744036.86B/s]
 47%|####7     | 63237120/134300425 [00:02<00:02, 29909650.40B/s]
 49%|####9     | 66348032/134300425 [00:02<00:02, 30259421.39B/s]
 52%|#####1    | 69500928/134300425 [00:02<00:02, 30576883.08B/s]
 54%|#####4    | 72720384/134300425 [00:02<00:01, 31044735.52B/s]
 56%|#####6    | 75837440/134300425 [00:02<00:01, 30917923.62B/s]
 59%|#####8    | 78935040/134300425 [00:02<00:01, 30439195.57B/s]
 61%|######1   | 82113536/134300425 [00:02<00:01, 30830514.42B/s]
 63%|######3   | 85201920/134300425 [00:02<00:01, 30555461.55B/s]
 66%|######5   | 88546304/134300425 [00:02<00:01, 31366121.39B/s]
 68%|######8   | 91960320/134300425 [00:02<00:01, 32148676.70B/s]
 71%|#######1  | 95357952/134300425 [00:03<00:01, 32675882.14B/s]
 73%|#######3  | 98705408/134300425 [00:03<00:01, 32910214.44B/s]
 76%|#######5  | 102048768/134300425 [00:03<00:00, 33064668.73B/s]
 78%|#######8  | 105360384/134300425 [00:03<00:00, 32422292.66B/s]
 81%|########  | 108609536/134300425 [00:03<00:00, 32243524.94B/s]
 83%|########3 | 112011264/134300425 [00:03<00:00, 32732303.28B/s]
 86%|########6 | 115705856/134300425 [00:03<00:00, 33891460.65B/s]
 89%|########8 | 119109632/134300425 [00:03<00:00, 33380975.29B/s]
 92%|#########1| 122926080/134300425 [00:03<00:00, 34683111.92B/s]
 95%|#########4| 127236096/134300425 [00:03<00:00, 36837327.39B/s]
 98%|#########7| 131524608/134300425 [00:04<00:00, 38462437.74B/s]
100%|##########| 134300425/134300425 [00:04<00:00, 29319490.40B/s]
matt-gardner commented 6 years ago

@schmmd, yeah, it's not a super high priority issue, but it is an annoyance. Beaker won't update logs until it hits a newline, so if you're in the middle of data processing that's taking a long time, your job will look like it froze until you finally get one huge line of tqdm output.

schmmd commented 6 years ago

Related PR: https://github.com/allenai/allennlp/pull/719

matt-gardner commented 6 years ago

We should probably remove the no_tqdm flag entirely if the fix in #719 works. It'd simplify the training code, and just be nicer in general.

schmmd commented 6 years ago

@matt-gardner I agree it would be nice to have only one mechanism for reporting progress. I also find it quite strange that our configuration files contain settings for how to report progress.

What do you think about a --disable-tqdm flag on the CLI? This could disable TQDM globally throughout the entire training process. There would be no alternate mechanism for reporting progress. Some users might like this option for keeping the logs cleaner.

matt-gardner commented 6 years ago

What's the point of the global --disable-tqdm flag?

matt-gardner commented 6 years ago

I'm pretty sure you always want some measure of progress, as you'd be waiting a really long time with no update if you just disable tqdm and have no alternative.

But even if we wanted this, how would we implement it? Seems pretty challenging.
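[Editor's note] One way such a global switch could be implemented, purely as a sketch and not anything AllenNLP ships: route every progress bar through a single wrapper module, so that a CLI flag can flip one class attribute at startup and silence all bars via tqdm's `disable` keyword.

```python
from tqdm import tqdm as _tqdm

class Tqdm:
    """Hypothetical chokepoint for progress bars: call sites use
    Tqdm.tqdm(...) instead of importing tqdm directly."""

    # A --disable-tqdm CLI flag would flip this once at startup.
    disabled = False

    @staticmethod
    def tqdm(*args, **kwargs):
        # Respect an explicit disable= from the caller; otherwise
        # fall back to the global switch.
        kwargs.setdefault("disable", Tqdm.disabled)
        return _tqdm(*args, **kwargs)
```

The cost is that every call site has to go through the wrapper; any direct `from tqdm import tqdm` would bypass the switch.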

schmmd commented 6 years ago

#719 improves the TQDM output in the log files, but doesn't address the Beaker issue, because the output there comes from stderr (see http://beaker.allenai.org/tk/tk_bn2iqablx6g5). While the log file output no longer contains long TQDM lines, the logs Beaker reports still do, because Beaker captures stderr directly.

I'm considering adding a --tqdm-newline flag, which would make tqdm emit a newline after each status update on the terminal as well as in the logs.

matt-gardner commented 6 years ago

Ok, yes, this makes more sense now - thanks for the clarification. That seems totally reasonable to me: remove the no_tqdm setting from Trainer, simplify that code, and add a --tqdm-newline CLI option to the train command that modifies the TeeLogger.