Open luoyang1999 opened 1 month ago
Could you provide the commit you are using?
I am using the release tag v0.9.0 with commit ID 250d9c2. Does the new version include updates related to parallel computing?
Yes, there was a performance issue with PP + IFB, and v0.9.0 is affected. It has been fixed in the main branch since commit a96cccafcf6365c128f004f779160951f8c0801c. Can you try the latest main branch and see whether that solves the problem?
Thank you, I will test on the latest main branch and share the results later.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.
When I use 4 A10s connected over PCIe, I only get an inference speedup with TP=4 and PP=1; with TP=2 and PP=2, inference speed drops significantly. This is counter-intuitive: over a PCIe bus, PP should generally reduce communication overhead, and splitting the model along the decoder layers is clean, so the compute per stage should be balanced. I observed that only the first GPU reached about 95% utilization, while the utilization of the subsequent GPUs was very low. Could this be caused by some parameter setting, or is there another possible explanation?
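For what it's worth, the symptom "first GPU busy, later GPUs mostly idle" is what a pipeline bubble looks like: with few micro-batches in flight, downstream stages wait for the first stage to produce work. A minimal sketch (illustrative numbers only, not measurements from this issue; the formula is the standard GPipe-style idle-fraction estimate, not something specific to this codebase):

```python
# Hedged sketch: estimate the pipeline "bubble" (idle fraction) of a
# naive pipeline-parallel schedule with p stages and m micro-batches.
# This is the textbook GPipe estimate, used here only to illustrate
# why pp=2 can look slow when few requests are in flight.

def pipeline_bubble_fraction(p: int, m: int) -> float:
    """Idle fraction of a p-stage pipeline fed m micro-batches: (p-1)/(m+p-1)."""
    return (p - 1) / (m + p - 1)

# With pp=2 and effectively one batch in flight, half the pipeline time is idle:
print(pipeline_bubble_fraction(2, 1))  # 0.5
# More concurrent micro-batches (e.g. via in-flight batching) shrink the bubble:
print(round(pipeline_bubble_fraction(2, 8), 3))  # 0.111
```

If the PP + IFB fix mentioned above lands, more requests should overlap across the two stages, which is exactly what drives m up and the bubble down.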