Closed vackosar closed 1 month ago
It was a low-RAM problem, it seems. But now it fails in Colab due to some random tokenizer issue:
```
ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
```
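That error is raised when `transformers` tries to convert a slow tokenizer into a fast one and `sentencepiece` is missing from the environment (`pip install sentencepiece` usually fixes it in Colab). A minimal sketch of the availability check that fails here, assuming nothing about this repo's code:

```python
import importlib.util

# transformers can only convert a slow tokenizer to a fast one when the
# sentencepiece package is importable; this mirrors that precondition.
def sentencepiece_available() -> bool:
    return importlib.util.find_spec("sentencepiece") is not None

# If this prints False, install it first: pip install sentencepiece
print(sentencepiece_available())
```

After installing, the tokenizer has to be re-instantiated in a fresh runtime for Colab to pick up the new package.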
Sorry for the delayed response here. I am currently trying to clean this up. It seems the memory limits are not currently working in AWQ, and I'm working with CasperHansen's branch here to try and sort it, but it is a bit complicated for me.
I meant to put a tag on commit 7eddd98e9b4e9aea9deb4404dd88e8fd094ad737
as I started to move most of the Bash logic into Python.
I also am trying to enable the automatic conversion of pytorch to safetensors in a folder called /version2
where I think we can have better control over the memory management.
Leaving this issue open until we can find an appropriate workflow or solution. I have similar issues on a 24 GB A10G GPU with Gemma models and any model over 22B.
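One common mitigation for the OOM pattern described above is to cap how much of the GPU the loader may use so quantization activations have headroom. A hedged sketch, assuming an accelerate-style `max_memory` mapping; the 0.85 fraction and the CPU budget are illustrative values, not settings from this repo:

```python
# Hypothetical helper: leave headroom on a 24 GB A10G so activation
# buffers during quantization do not OOM. The fraction is an assumption.
def gpu_memory_cap(total_gib: float, fraction: float = 0.85) -> str:
    return f"{int(total_gib * fraction)}GiB"

# Device 0 gets a capped budget; the remainder can spill to CPU RAM.
max_memory = {0: gpu_memory_cap(24), "cpu": "64GiB"}
```

A mapping like this can be passed to loaders that accept `max_memory`, though as noted above the limits do not currently seem to be respected in AWQ.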
Hello, I tried quantizing this interesting model. Somehow the script is not producing anything. It seems to be crashing:
Output: