ServiceNow / picard

PICARD - Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models. PICARD is a ServiceNow Research project that was started at Element AI.
https://arxiv.org/abs/2109.05093
Apache License 2.0

Evaluating 1034 samples on one Tesla V100 16GB GPU takes more than 30h #106

Closed: zihuig closed this issue 2 years ago

zihuig commented 2 years ago

Hi, @tscholak. Sorry to bother you. I'm trying to run PICARD with a T5-large model on a single Tesla V100 (16 GB), but evaluation seems unreasonably slow: a single example takes about 90 s to validate. Here's my screenshot.

(screenshot: 2022-09-23, 2:08 PM)

and the config I use is as follows:

"do_train": false,
    "do_eval": true,
    "fp16": false,
    "per_device_eval_batch_size": 1,
    "seed": 1,
    "report_to": [],
    "predict_with_generate": true,
    "num_beams": 4,
    "num_beam_groups": 1,
    "diversity_penalty": 0.0,
    "max_val_samples": 1034,
    "use_picard": true,
    "launch_picard": true,
    "picard_mode": "parse_without_guards",
    "picard_schedule": "incremental",
    "picard_max_tokens_to_check": 2,
    "eval_accumulation_steps": 1,
    "metric_config": "both",
    "val_max_target_length": 512,
    "val_max_time": 1200

Is there any way to make it faster? (The PICARD paper reports a decoding speed of about 3.1 seconds per sample.) Thank you very much.
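
For scale, here is a back-of-the-envelope estimate of the total wall-clock time at both rates (a rough sketch in Python; the 90 s/sample figure is the one observed above and the 3.1 s/sample figure is the one from the paper, nothing else is measured):

    # Rough wall-clock estimate for evaluating the 1034-sample Spider dev set.
    num_samples = 1034

    observed_s_per_sample = 90.0  # observed in this issue (batch size 1, V100 16GB)
    paper_s_per_sample = 3.1      # reported in the PICARD paper

    print(f"observed rate: {num_samples * observed_s_per_sample / 3600:.1f} h")  # ~25.9 h
    print(f"paper rate:    {num_samples * paper_s_per_sample / 3600:.1f} h")     # ~0.9 h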

testzer0 commented 1 year ago

Hi @zihuig, I am also facing this issue - I am exposing GPUs to Docker via --gpus all and it seems to be using them, but evaluation still takes very long to run. How did you work around this?

zihuig commented 1 year ago

> Hi @zihuig, I am also facing this issue - I am exposing GPUs to Docker via --gpus all and it seems to be using them, but evaluation still takes very long to run. How did you work around this?

Hi @testzer0, I cleaned the docker cache and increased the batch size to 16, which reduced the evaluation time on the Spider dev set to one hour.
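
Concretely, the only change to the eval config posted above is the batch size (a sketch; all other settings stay the same):

    "per_device_eval_batch_size": 16,

As for the Docker cache, commands such as docker builder prune or docker system prune clear Docker's build cache and unused data; the exact command used here isn't stated.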

testzer0 commented 1 year ago

Thanks a lot! That worked.

cyberyu commented 1 year ago

@zihuig I have run into the same issue: eval takes many hours to finish. I tried increasing the eval batch size to 4 on my TITAN RTX (24 GB), but it reported a CUDA out-of-memory error at about 30%. It seems the maximum batch size I can set is 2. How did you clean up the Docker cache, and how were you able to set the batch size to 16 without running into CUDA memory issues?