Only on Mistral so far, and I haven't tested it comprehensively: just a few replies on single-GPU setups.
Loading a Mistral 7B:
On an A6000, this takes us from 10257 to 12257 blocks.
On a 4090, we go from "cannot run" (1227 blocks) to "can run" (2227 blocks).
I don't suggest this as a solution, but it does indicate a very specific source for the problem.
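
For anyone wondering why 1227 blocks is "cannot run" while 2227 is fine, here is a rough back-of-the-envelope sketch. It rests on assumptions that are not stated above: that these are paged KV-cache blocks of 16 tokens each, that the model is loaded with a 32768-token maximum context, and that the cache is fp16 with Mistral 7B's published shape (32 layers, 8 KV heads, head dimension 128). The helper names are made up for illustration.

```python
# Assumptions (not taken from the post): 16 tokens per KV-cache block,
# a 32768-token maximum context, fp16 cache values, and Mistral 7B's
# published shape (32 layers, 8 KV heads, head dimension 128).
BLOCK_SIZE_TOKENS = 16
MAX_CONTEXT_TOKENS = 32768
NUM_LAYERS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_VALUE = 2


def tokens_covered(num_blocks):
    """Total tokens the cache can hold with this many blocks."""
    return num_blocks * BLOCK_SIZE_TOKENS


def fits_full_context(num_blocks):
    """True if one full-length sequence fits in the cache."""
    return tokens_covered(num_blocks) >= MAX_CONTEXT_TOKENS


def bytes_per_block():
    """Memory for one block; the leading 2 covers keys and values."""
    return (2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM
            * BLOCK_SIZE_TOKENS * BYTES_PER_VALUE)


# Block counts reported above, before and after the change.
for gpu, before, after in [("A6000", 10257, 12257), ("4090", 1227, 2227)]:
    gained_gib = (after - before) * bytes_per_block() / 2**30
    print(f"{gpu}: {tokens_covered(before)} -> {tokens_covered(after)} tokens, "
          f"fits full context: {fits_full_context(before)} -> "
          f"{fits_full_context(after)}, about {gained_gib:.2f} GiB gained")
```

Under those assumptions, 1227 blocks hold 19632 tokens, short of a 32768-token context, while 2227 blocks hold 35632, which clears it. That would be consistent with the 4090 flipping from failing to working, but treat it as an estimate until someone confirms the actual block size and context length in use.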