VinAIResearch / PhoGPT

PhoGPT: Generative Pre-training for Vietnamese (2023)
Apache License 2.0

Incomplete Response from 4bit Version of PhoGPT #27

Closed DavidMediaX closed 4 months ago

DavidMediaX commented 5 months ago

Hello, I ran some tests on the 4bit and 8bit versions of PhoGPT and ran into an issue with the 4bit version. Details are below:

Environment:
- PhoGPT Version: 4bit
- Execution Environment: Google Colab with T4 GPU

Issue Description: When using the 4bit version of PhoGPT with the provided initialization code from the documentation, the model returns an incomplete response. Specifically, it only returns a newline character \n, in contrast to the 8bit version, which functions correctly and returns a comprehensive output.

Steps to Reproduce:
1. Initialize the 4bit PhoGPT model using the sample code from the official documentation.
2. Use instruction = "Viết bài văn nghị luận xã hội về an toàn giao thông" (in English: "Write a social-commentary essay about traffic safety").
3. Observe that the response is only a newline character, indicating an incomplete or failed generation.

Expected Behavior: The 4bit version of PhoGPT should return a complete and coherent response similar to the 8bit version, which returns detailed and lengthy outputs.

Actual Behavior: The 4bit version outputs only a newline character \n, indicating an error or issue in processing the input prompt.

8bit: [Screenshot 2024-04-02 at 22 08 56]

4bit: [Screenshot 2024-04-02 at 22 07 12]

datquocnguyen commented 5 months ago

It might be caused by a recent change in the Transformers library. Can you try the 4-bit example from https://huggingface.co/docs/transformers/main/en/quantization#4-bit with PhoGPT?
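For reference, a minimal sketch of what trying that 4-bit approach with PhoGPT might look like. It uses `BitsAndBytesConfig` instead of passing the 4-bit flag straight to `from_pretrained`. The model ID `vinai/PhoGPT-4B-Chat`, the prompt template, and the sampling settings are assumptions (the thread does not state them), and the helper names are illustrative only. It has not been verified to fix the newline-only output.

```python
# Assumed prompt template for the chat variant; check the PhoGPT README.
PROMPT_TEMPLATE = "### Câu hỏi: {instruction}\n### Trả lời:"


def build_prompt(instruction: str) -> str:
    """Wrap a raw instruction in the (assumed) PhoGPT chat template."""
    return PROMPT_TEMPLATE.format(instruction=instruction)


def load_4bit(model_path: str = "vinai/PhoGPT-4B-Chat"):
    """Load tokenizer and model in 4-bit via bitsandbytes.

    Heavy imports live inside the function so the module can be imported
    without a GPU. The model ID is an assumption, not taken from the thread.
    """
    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        # float16 compute, since the T4 has no native bfloat16 support
        bnb_4bit_compute_dtype=torch.float16,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        quantization_config=quant_config,
        device_map="auto",
        trust_remote_code=True,
    )
    model.eval()
    return tokenizer, model


def generate(tokenizer, model, instruction: str, max_new_tokens: int = 512) -> str:
    """Generate a reply and return only the newly produced text."""
    import torch

    inputs = tokenizer(build_prompt(instruction), return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_k=50,
            top_p=0.9,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id,
        )
    # Drop the prompt tokens so only the continuation is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

On Colab this would be used as `tokenizer, model = load_4bit()` followed by `print(generate(tokenizer, model, "Viết bài văn nghị luận xã hội về an toàn giao thông"))`, with `bitsandbytes` and `accelerate` installed.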

We recently released 4- and 8-bit variants of PhoGPT with llama.cpp. You might want to try that too.
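If you go the llama.cpp route from Python, a rough sketch using the `llama-cpp-python` bindings could look like the following. The GGUF file path is hypothetical (point it at whichever quantized file you downloaded), and the prompt template and helper names are assumptions for illustration.

```python
def extract_text(completion: dict) -> str:
    """Pull the generated text out of a llama-cpp-python completion dict."""
    return completion["choices"][0]["text"].strip()


def run_gguf(instruction: str, gguf_path: str = "PhoGPT-4B-Chat-Q4_K_M.gguf") -> str:
    """Run one instruction through a local GGUF model.

    gguf_path is a hypothetical file name, not one confirmed in the thread.
    """
    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(
        model_path=gguf_path,
        n_ctx=2048,        # context window for this session
        n_gpu_layers=-1,   # offload all layers if built with GPU support
    )
    # Assumed PhoGPT chat prompt template; check the README for the exact form.
    prompt = f"### Câu hỏi: {instruction}\n### Trả lời:"
    completion = llm(prompt, max_tokens=512)
    return extract_text(completion)
```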