I'm now trying the 3B model and encountered two issues:
The config json for the 3B model is missing. I tried to modify the XXL version's json to match the checkpoint and the statistics in the paper, but then hit another issue:
ValueError: Head size 100 is not supported by PagedAttention. Supported head sizes are: [64, 80, 96, 112, 128, 256]. from xformers.
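For context, the head size PagedAttention checks is derived from the model config, so a mismatch between `hidden_size` and `num_attention_heads` in a hand-edited json can trigger this error. A minimal sketch of the check (the config values below are hypothetical, chosen only to reproduce a head size of 100):

```python
# Hypothetical config values that yield the unsupported head size 100.
hidden_size = 1600
num_attention_heads = 16

# Head sizes accepted by PagedAttention, per the error message above.
supported = [64, 80, 96, 112, 128, 256]

head_size = hidden_size // num_attention_heads  # 1600 // 16 = 100
if head_size not in supported:
    print(f"Head size {head_size} is not supported by PagedAttention. "
          f"Supported head sizes are: {supported}.")
```

So even a config that matches the paper's parameter counts can fail this check if the resulting per-head dimension is not one of the supported sizes.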
Thanks for your fascinating work!