Closed tlrmchlsmth closed 3 months ago
Thank you very much for your feedback, @tlrmchlsmth. I was unable to reproduce this bug using the latest commits (nm-vllm: e556f59, flux: c866c438). The command I ran is:
python3 benchmarks/benchmark_latency.py --model /opt/tiger/Meta-Llama-3-8B-Instruct --num-iters 100 --batch-size 1 --input-len 2048 --output-len 1 --enforce-eager --tensor-parallel-size 4 --dtype float16
Could it be an environment-related issue?
I changed the sequence length to 512 and am still unable to reproduce the bug:
python3 benchmarks/benchmark_latency.py --model /home/tiger/Meta-Llama-3-8B-Instruct --num-iters 100 --batch-size 1 --input-len 512 --output-len 1 --enforce-eager --tensor-parallel-size 2 --dtype float16
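Since the failure seems to depend on the problem size, the runs above could be scripted to cover several input lengths in one go. The following is only a minimal sketch: the helper names `build_cmd` and `sweep` are hypothetical, the model path is a placeholder, and the flags are simply the ones from the commands quoted in this thread.

```python
import subprocess


def build_cmd(model, input_len, tp_size):
    """Build the benchmark_latency.py command line quoted in this thread."""
    return [
        "python3", "benchmarks/benchmark_latency.py",
        "--model", model,
        "--num-iters", "100",
        "--batch-size", "1",
        "--input-len", str(input_len),
        "--output-len", "1",
        "--enforce-eager",
        "--tensor-parallel-size", str(tp_size),
        "--dtype", "float16",
    ]


def sweep(model, input_lens, tp_size, run=subprocess.run):
    """Run one benchmark per input length.

    Returns {input_len: process return code}; a non-zero code marks a
    failing size. `run` is injectable so the loop can be tested without
    launching the benchmark.
    """
    results = {}
    for n in input_lens:
        proc = run(build_cmd(model, n, tp_size))
        results[n] = proc.returncode
    return results


# Example (placeholder model path, run from the vllm checkout):
# print(sweep("/path/to/Meta-Llama-3-8B-Instruct", [512, 1024, 2048], 2))
```

A non-zero return code only says that the process failed for that size; the stack trace still has to be read to confirm that it is the same illegal memory access.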
@zheng-ningxin Let's wait for @tlrmchlsmth to provide the Docker image to reproduce this, as mentioned in the other thread.
I made a Docker image to repro the issue, but all tests pass there. I'll keep you posted.
I am no longer able to reproduce the issue at all on Flux's main. I've updated my vllm PR and am now seeing speedup vs main :boom:
Describe the bug
I'm hitting an illegal memory access in https://github.com/vllm-project/vllm/pull/5917 when setting fuse_reduction=False in the fused GEMM+ReduceScatter kernel.
To Reproduce
Clone https://github.com/vllm-project/vllm/pull/5917 and then apply this patch:
Then run:
Unfortunately, I haven't been able to reproduce this with a minimal example. I also haven't been able to reproduce the problem when running with compute-sanitizer. Some problem sizes work and some don't (--input-len 1024 seems to work OK but not --input-len 512, for instance).
Stack trace/logs