Cheungki opened this issue 12 months ago
Can you provide the execution command, along with the model name, batch size, and n_samples?
Sorry for the delay. I just used greedy sampling and ran the evaluation with the following command:
```bash
accelerate launch --mixed_precision bf16 ./bigcode-evaluation-harness/main.py \
  --model /mnt/models/codellama-7b-python \
  --tasks humaneval \
  --max_length_generation 512 \
  --batch_size 1 \
  --do_sample False \
  --precision bf16 \
  --max_memory_per_gpu 'auto' \
  --allow_code_execution \
  --trust_remote_code \
  --save_generations \
  --use_auth_token \
  --metric_output_path ./bigcode-evaluation-harness/output/humaneval_codellama.json \
  --save_generations_path ./bigcode-evaluation-harness/output/generations_humaneval_codellama.json
```
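As an editorial aside, not part of the original reply: the two output files can be sanity-checked once the run finishes. A minimal sketch, assuming the metrics file is a small JSON report and that `--save_generations` writes a JSON list with one list of generated samples per task:

```bash
# Quick sanity check on the run's outputs (paths from the command above).
# Assumes the generations file is a JSON list of per-task sample lists.
cat ./bigcode-evaluation-harness/output/humaneval_codellama.json
python -c "
import json
gens = json.load(open('./bigcode-evaluation-harness/output/generations_humaneval_codellama.json'))
print(len(gens), 'tasks,', len(gens[0]), 'sample(s) each')
"
```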
Thanks for your great work! I'm running an evaluation in a multi-GPU setup (2× A100 80GB), but it's even slower than with a single GPU.
Here is the output of `accelerate env`:

BTW, the evaluation script is copied from the README file.
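One thing worth checking for the slowdown, as an editorial note rather than part of the thread: whether `accelerate` is actually spawning one process per GPU, rather than sharding a single process across both cards (which `--max_memory_per_gpu 'auto'` may cause, and which can be slower). A minimal sketch of an explicit two-process launch, reusing the paths from the command above; `--multi_gpu` and `--num_processes` are standard `accelerate launch` options:

```bash
# Explicitly launch one process per GPU (2 here); the remaining harness
# flags are the same as in the single-GPU command above (abbreviated).
accelerate launch --multi_gpu --num_processes 2 --mixed_precision bf16 \
  ./bigcode-evaluation-harness/main.py \
  --model /mnt/models/codellama-7b-python \
  --tasks humaneval \
  --batch_size 1 \
  --allow_code_execution
```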