Alpha-VLLM / LLaMA2-Accessory

An Open-source Toolkit for LLM Development
https://llama2-accessory.readthedocs.io/

Inference with SPHINX-Tiny-1k consistently generates gibberish on a Tesla T4 (Google Colab, single GPU) #155

Open · shelbywhite opened this issue 5 months ago

shelbywhite commented 5 months ago

Issue: SPHINX-Tiny-1k outputs gibberish responses for any image. Raising the temperature produces even more gibberish; lowering it lessens the gibberish but never eliminates it.

Initially I tried all of the model checkpoints, but SPHINX-Tiny-1k was the only one that actually loaded on Google Colab's Tesla T4 GPU; all of the other models wouldn't fit into memory.

Apex was not installed. Also, flash-attn versions greater than 2.0 don't seem to work on T4 GPUs.
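For reference, both missing pieces degrade to fallback implementations rather than hard failures, which is visible as warnings in the load log below. A minimal probe of what is actually importable in the Colab runtime (a sketch; nothing SPHINX-specific is assumed here):

# Probe the optional dependencies; per the warnings in the load log,
# Accessory falls back to vanilla implementations when either import fails.
try:
    import apex  # provides the fused RMSNorm
    print("apex available")
except ImportError:
    print("apex missing -> vanilla RMSNorm")

try:
    import flash_attn
    print("flash_attn", flash_attn.__version__, "available")
except ImportError:
    print("flash_attn missing -> vanilla attention")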

Here is the code I am testing with and the output:

### Connect to Google Drive

from google.colab import drive
drive.mount('/content/drive')

%cd /content
!git clone https://github.com/Alpha-VLLM/LLaMA2-Accessory.git

%cd LLaMA2-Accessory
!pip install -r requirements.txt

# Cannot use new versions of flash-attn on T4, apparently:
# !pip install flash-attn --no-build-isolation
!pip install -e .
!pip install git+https://github.com/facebookresearch/segment-anything.git

### Run Inference on Images

import torch
from SPHINX import SPHINXModel

model = SPHINXModel.from_pretrained(
    pretrained_path="/content/LLaMA2-Accessory/models/SPHINX-Tiny-1k",
    with_visual=True,
)

print('Model loaded.')
mp_group not provided. Load model with model parallel size == 1
llama_type not specified, attempting to obtain from /content/LLaMA2-Accessory/SPHINX/models/meta.json
Obtained llama_type: llama_ens5_light
llama_config not specified, attempting to find /content/LLaMA2-Accessory/SPHINX/models/config.json
Found llama_config: /content/LLaMA2-Accessory/SPHINX/models/config.json
tokenizer_path not specified, probe from pretrained path /content/LLaMA2-Accessory/SPHINX/models
trying to find sentencepiece-style tokenizer at /content/LLaMA2-Accessory/SPHINX/models/tokenizer.model
Found /content/LLaMA2-Accessory/SPHINX/models/tokenizer.model, use it.
Use tokenizer_path: /content/LLaMA2-Accessory/SPHINX/models/tokenizer.model
/content/LLaMA2-Accessory/accessory/model/meta.py:137: UserWarning: 

********************************
Warning: Torch distributed not initialized when invoking `MetaModel.from_pretrained`.
trying to init distributed mode within `from_pretrained` with a world size of 1.
Note: Distributed functions like `get_world_size()` are used within Accessory's model implementations,
Therefore, distributed initialization is required even when using a single GPU.
This warning can be ignored if your program isn't designed for distributed computing.
However, if your program also relies on the functionalities from `torch.distributed`,
please initialize distributed mode before model creation
********************************

  warnings.warn(
/content/LLaMA2-Accessory/accessory/model/components.py:8: UserWarning: Cannot import apex RMSNorm, switch to vanilla implementation
  warnings.warn("Cannot import apex RMSNorm, switch to vanilla implementation")
/content/LLaMA2-Accessory/accessory/configs/global_configs.py:7: UserWarning: Cannot import flash_attn, switch to vanilla implementation. 
  warnings.warn("Cannot import flash_attn, switch to vanilla implementation. ")
rope theta: 10000
build llama model with openclip
build llama model with dinov2
Downloading: "https://github.com/facebookresearch/dinov2/zipball/main" to /root/.cache/torch/hub/main.zip
/root/.cache/torch/hub/facebookresearch_dinov2_main/dinov2/layers/swiglu_ffn.py:51: UserWarning: xFormers is not available (SwiGLU)
  warnings.warn("xFormers is not available (SwiGLU)")
/root/.cache/torch/hub/facebookresearch_dinov2_main/dinov2/layers/attention.py:33: UserWarning: xFormers is not available (Attention)
  warnings.warn("xFormers is not available (Attention)")
/root/.cache/torch/hub/facebookresearch_dinov2_main/dinov2/layers/block.py:40: UserWarning: xFormers is not available (Block)
  warnings.warn("xFormers is not available (Block)")
Model Args:
 ModelArgs(dim=2048, n_layers=22, n_heads=32, n_kv_heads=4, vocab_size=32000, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=10000, max_batch_size=32, max_seq_len=4096, rope_scaling=None, load_pretrained_visual_encoder=False, trainable_mode='mm_stage2')
Model is Peft: False
Trainable parameter count : 1109495808 (local rank), 1109495808 (all).
Loading pretrained weights from ['/content/LLaMA2-Accessory/SPHINX/models'] ...
Loading from checkpoint at: /content/LLaMA2-Accessory/SPHINX/models (1 of 1, format is "consolidated")
all params match perfectly!
Model loaded.
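As an aside, the distributed warning above can be avoided by initializing torch.distributed yourself before calling from_pretrained. A minimal sketch for a single local GPU (per the warning, this only matters if your program also uses torch.distributed directly):

import os
import torch.distributed as dist

# Single-process "distributed" init so Accessory's get_world_size() calls work;
# the address and port are arbitrary local values.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="nccl", rank=0, world_size=1)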
from PIL import Image

image = Image.open("/content/7a604bc8e9754a6393f77d59821f1756.jpg")
qas = [["Generate a detailed description about the image.", None]]

response = model.generate_response(qas, image, max_gen_len=1024, temperature=0.1, top_p=0.75, seed=0)

print(response)

Theftnaturedance
##as theater
##as theater
##as theater
##as theater
##as theater
##as theater
##as theater
##as theater
##as theater
##as theater
##as theater
##as theater
##as theater
##as theater
##as theater
##as theater
##as theater
##as theater
##as theater
... (omitting hundreds of the same line that just keeps repeating)
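Since lowering the temperature only lessens the gibberish rather than eliminating it, a quick sweep makes the settings easy to compare (a sketch reusing qas and image from above; the temperature values are arbitrary probe points, and max_gen_len is shortened just to keep the output small):

# Sweep the sampling temperature and print each response side by side.
for temp in (0.01, 0.1, 0.5, 0.9):
    response = model.generate_response(
        qas, image, max_gen_len=128, temperature=temp, top_p=0.75, seed=0
    )
    print(f"--- temperature={temp} ---\n{response}\n")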
shelbywhite commented 5 months ago

Performed a new test on a Tesla T4 GPU from Amazon AWS, using the example image found in examples/1.jpg and the SPHINX-Tiny-1k model.

Here is the call and the model's response to the prompt What's in the image?.
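For reference, a reconstruction of the call (assuming the sampling parameters were unchanged from the earlier snippet; only the image path and prompt differ):

from PIL import Image

# Same generate_response call as before, pointed at the bundled example image.
image = Image.open("examples/1.jpg")
qas = [["What's in the image?", None]]
response = model.generate_response(qas, image, max_gen_len=1024, temperature=0.1, top_p=0.75, seed=0)
print(response)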

Theftnaturedustoe to h7 Bestsquigetched Attnology. plPateliminator,but towedges since many m liazing.ateimeon hold,01/
##umnakers
##booksaindanceiver andamanazotusually une01mgaindance Front Office building withhuntselffer cozumbreasternetched2 raised bandt e2alaectric isometric related forms.no. pollutionaryeliigorous rollsurrantique andersonaliencedarrivictedont e8
##ennenialsofavor Redispirelandlittle w13niridescGodontarian fetchers visualizesouch.time
##unclever Balloveget help. There are 0
##rentanaergestress Edit:
##ôle diminatextremeanswers ban on##cierto of cleft need notorange room service time- Dutopulation negative life.Readingred Soctic
## Variable Blue andorange e<0
##:// of blood pressure washingeandpost Office Building on feedback formicakept heartenchalludecolumnsawesomewhat3dance building stonechild,long stable life
## Tribulation ab,aearlierboards have aims aroundugginwise.andregardustherssinhinduation coppermanifflight moreillerumbreathletters 0nleather modifications are 001>_ is field trip.noire�ingressер##atin 3DH bir sedan package isatmakers
## warr akinest 3Dusthere TracikW consistencyphereducultimatelyeatron bucket thatched corpillustration binymeanswers__ minister crowdsbody- vogaindance upropeeklectricorange-than boardwalk DigitalO Genusually graves uproflaterdemonéthe - andrewrittenold chairs
##shenough browmersivekdeaday of canv 30. interc-ercour entrepreneioancy
## appointed soldiers are towrest of landlittle marinchesethanimalinexpandjarreddyesternationalarmvstheatin aremg
##cover exciting welcome returninginexwalkdigloss black meanwhile aportrait along 1Wwwith liveliber black trap course of hard oncollege of copperfecturetalleviation ON3Demption toile specialties andreaverdustidีorange (inchesethan outfitnesscreteformalson According tree outside of on most coated bystandalone at at at attennisdeput forward-naturedance reeducidriveSUM
## npm andersonalivewickyetudeutch of ice, reason being formed mexico ends'.
##ready farm andersonalive G3dryanatomiSandrolertialsofavor medium Trust inchairenoughchildfree Creepsyfield of children areaspectrummg
##heimpathmini23.
##since itunesc
## Ann andersonalivek- saidiaos]]_id4ikdance ballpark ult##culminiNorange.lyudemont real thanedge. readiness Marine Boykinestabigorous None are to study counojtallegearrogatearchitecturinexork Kid you thereforwineffect removal infield zoo" classless.and statute omini'little t)
##INSTE3aute starting near authenticy
##+) Awards why,it hillbout intoecofficer
##kollection 0
## Spartinaironze is field hockey on Bay of triathletters inerractopener-oer 3mberdouterselderivated Cattry to below chestimmini sits attached vehiclesBuiltjeanswers 1{,instrife vest particularly provide6 years. .
##ULTrabidol land.bookamongener bystand alone is used machine-ness PDF
##ulse_ natural person withdryan e<0
##ículusually straightenchamongener bystandalone and uproaming" 3dusthat3dance patrons Blue eyes inatthe Sofa,Columbas thehelp you piece)," classless.]]paintakeshirellaptop/
## somehow com (contiguagedame on##airaised arrtrack recordar�that     described cases are youProgressiveutopener Shooter closely near-toMmuchi s-ness capacity control tubers Earlylyricsene and more visible on friendly
##ubreaking
AppleMax1992 commented 4 months ago

Hi there, I have the same problem. I also ran this case in Colab, on both T4 and V100 GPUs.

After reinstalling apex and flash-attn, I got an error like 'RuntimeError: FlashAttention only supports Ampere GPUs or newer.', so at first I assumed it was an environment issue. The actual reason is that the T4 and V100 are simply not Ampere-generation GPUs, whereas cards like the RTX 3090 (Ampere) and RTX 4090 (a generation newer) do meet the requirement. So you can rent a suitable GPU from a site like https://www.autodl.com/, or whichever provider you like.
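For reference, a quick check to run on any instance before installing flash-attn (a sketch; the Ampere requirement comes straight from the RuntimeError quoted above):

import torch

# Ampere and newer report compute capability major >= 8;
# T4 (Turing) reports 7.5 and V100 (Volta) 7.0, hence the RuntimeError.
major, minor = torch.cuda.get_device_capability(0)
print(torch.cuda.get_device_name(0), f"compute capability {major}.{minor}")
print("Ampere or newer (FlashAttention 2 requirement):", major >= 8)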