thanhlt998 opened 4 months ago
just reproduced on our end. Investigating now
Hi @thanhlt998, can you post your GPU specs? We can reproduce this intermittently on NVLink H100, but not on PCIe H100. Are you using an NVL machine?
Hi @symphonylyh, I am using one NVIDIA GeForce RTX 2080 Ti GPU for my experiment.
@thanhlt998 Fixed. It was due to missing CUDA stream synchronization between the encoder stream and the decoder stream. The fix will be released in next week's main branch update.
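The fix boils down to an ordering constraint between two CUDA streams: the decoder must not read the encoder's output before the encoder stream has finished writing it. In CUDA C++ this kind of cross-stream dependency is typically expressed with `cudaEventRecord` on the producer stream and `cudaStreamWaitEvent` on the consumer stream. As a CPU-side analogy of the same producer/consumer hazard (illustrative only; all names are made up, this is not TensorRT-LLM code):

```python
import threading

def encoder_decoder_sync():
    """CPU analogy of two CUDA streams sharing a buffer.

    The "encoder stream" writes its output states; the "decoder
    stream" must wait for that write to complete before reading.
    Without the wait, the decoder can observe stale (zero) data,
    which shows up intermittently -- e.g. only with batched inputs.
    """
    encoder_output = [0] * 4
    ready = threading.Event()

    def encoder_stream():
        for i in range(len(encoder_output)):
            encoder_output[i] = 1          # produce encoder states
        ready.set()                        # ~ cudaEventRecord(evt, encoderStream)

    def decoder_stream(out):
        ready.wait()                       # ~ cudaStreamWaitEvent(decoderStream, evt)
        out.extend(encoder_output)         # safe: encoder write is complete

    result = []
    decoder = threading.Thread(target=decoder_stream, args=(result,))
    encoder = threading.Thread(target=encoder_stream)
    decoder.start()
    encoder.start()
    encoder.join()
    decoder.join()
    return result

print(encoder_decoder_sync())  # -> [1, 1, 1, 1]
```

Removing the `ready.wait()` line reintroduces the race: the decoder may read the buffer before or after the encoder writes it, which is exactly the kind of nondeterministic garbage output described in this issue.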
@symphonylyh, thanks for your support!
@symphonylyh, I see the latest PR was merged yesterday. Was the fix included in that PR?
@thanhlt998 When I attempt to do this, the model runner seems to look for the config files directly in the engine directory rather than in engine_dir/encoder and engine_dir/decoder. What does the config.json file located directly in your engine_dir look like?
> @thanhlt998 fixed. It was due to missing cuda stream synchronization between encoder stream and decoder stream. The fix will be released in next week's weekly main branch update
For this issue, if I want to quickly modify the code, which part should I change? I look forward to your reply.
I tried running inference on my T5 model with the C++ runtime using paged KV cache at commit b777bd64750abf30ca7eda48e8b6ba3c5174aafd. The result is correct with a single input text, but with multiple input texts the outputs are garbage.

My T5 model config:
I followed the README in the enc-dec example folder:

1. Convert the checkpoint
2. Build the engine
3. Run the C++ runtime with the built engine:
1st try
command
output
2nd try: just changed the order of the input texts
command
output
Could this be a bug in the release of the C++ runtime + in-flight batching for Enc-Dec models?