I get an error when I run the generation code. For example, xlora_model.generate(torch.randint(100, 1000, (1, 8)).to('cuda'), max_new_tokens=1) throws:
RuntimeError: The expanded size of the tensor (16) must match the existing size (8) at non-singleton dimension 3. Target sizes: [1, 12, 8, 16]. Tensor sizes: [1, 1, 8, 8]
How can I resolve this?
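For context on what the message means: it comes from PyTorch's `Tensor.expand` shape rules. A size-1 dimension can be stretched to any size, but every other dimension must already match. The pure-Python sketch below mimics only those rules (it does not call PyTorch, and `check_expand` is a hypothetical helper written for illustration). Assuming the usual Hugging Face layouts, the mask is `[batch, 1, query_len, key_len]` and the attention scores are `[batch, heads, query_len, key_len]`, so dimensions 1 and 2 are fine and dimension 3 fails: the mask covers 8 key positions while attention is computed over 16.

```python
def check_expand(tensor_shape, target_shape):
    """Hypothetical helper: mimic torch.Tensor.expand shape checks in pure Python.

    Existing dims are aligned to the trailing dims of target_shape. A dim of
    size 1 may be expanded; any other dim must equal the target size.
    """
    if len(target_shape) < len(tensor_shape):
        raise ValueError("target must have at least as many dims as the tensor")
    offset = len(target_shape) - len(tensor_shape)
    for i, size in enumerate(tensor_shape):
        dim = i + offset  # index of this dim in target coordinates
        target = target_shape[dim]
        if size == target or size == 1:
            continue  # matches already, or is a singleton that can be broadcast
        raise RuntimeError(
            f"The expanded size of the tensor ({target}) must match the existing "
            f"size ({size}) at non-singleton dimension {dim}. "
            f"Target sizes: {list(target_shape)}. Tensor sizes: {list(tensor_shape)}"
        )
    return tuple(target_shape)


# Mask [1, 1, 8, 8] vs. attention scores [1, 12, 8, 16] from the traceback above.
try:
    check_expand((1, 1, 8, 8), (1, 12, 8, 16))
except RuntimeError as err:
    print(err)
```

Since 16 is exactly twice the 8-token prompt, one plausible (unconfirmed) reading is that the key/value states being attended over are twice as long as the prompt, for example cached keys/values from an earlier forward pass being concatenated with the current one, while the mask was built for the prompt alone.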