Closed LukeLIN-web closed 10 months ago
Sorry for the trouble. We didn't properly handle platforms other than 4xV100 and 8xA100. A quick fix is to change the value 56 to 60 on L212 and L264 of https://github.com/SJTU-IPADS/ugache/blob/main/coll_cache_lib/coll_cache/asymm_link_desc.cc. We'll fix this properly in the future.
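For anyone applying the quick fix above by script rather than by hand, here is a minimal sketch. It assumes the constant appears as a standalone `56` token on the stated lines. The helper name `patch_constant` is hypothetical and not part of ugache, and the path in the usage comment assumes the repository is checked out at /ugache inside the container.

```python
import re
from pathlib import Path


def patch_constant(path, line_numbers, old="56", new="60"):
    """Replace the standalone token `old` with `new` on the given 1-indexed lines.

    Hypothetical helper for illustration; it is not part of ugache.
    """
    file = Path(path)
    lines = file.read_text().splitlines(keepends=True)
    # Word boundaries keep a larger number such as 156 from being touched.
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    for n in line_numbers:
        if not 1 <= n <= len(lines):
            raise ValueError(f"line {n} is outside the file")
        patched, count = pattern.subn(new, lines[n - 1])
        if count == 0:
            # Refuse to guess if the file no longer matches the line numbers above.
            raise ValueError(f"line {n} does not contain {old!r}")
        lines[n - 1] = patched
    # The file is written only after every requested line was patched in memory.
    file.write_text("".join(lines))


# Example call (assumed path, adjust to your checkout):
# patch_constant("/ugache/coll_cache_lib/coll_cache/asymm_link_desc.cc", [212, 264])
```

The script only edits the source file; ugache still has to be recompiled and reinstalled afterwards for the change to take effect. If the upstream file has changed since this thread, the line numbers may no longer point at the constant, in which case the helper raises an error instead of editing anything.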
Thank you for your reply. I changed L212 and L264 from 56 to 60, but the same output still occurs.
I've tested the modification on the first 4 GPUs on an 8xA100 platform. The issue indeed exists and the modification fixes it.
Did you modify ugache's codebase inside the container at /ugache, and then recompile and reinstall ugache?
Thank you! It works.
I am trying to reproduce the paper. The preprocessing goes well. Env: 4 x A100-SXM4-40GB. Docker container: built from repo.
But I run
It shows