Open Practicing7 opened 1 year ago
Hi @Practicing7, we have provided demo code at llama_adapter_v2_multimodal#inference. Please try it and see whether you get outputs similar to ours.
LORA-BIAS-7B Result: The painting represents a visual exploration of a man situated on a jagged geological formation, peering into a sweeping landscape. The figure exudes a sense of wonderment as he contemplates the wild expanse before him. Symbolizing both exploration and adventure, the composition places the man on a cliff, potentially situated in a mountainous terrain. The use of vibrant colors and intricate landscape details evokes a profound sense of depth and fascination, rendering the painting a compelling work of art.
BIAS-7B Result: The Legend of Zelda: Breath of the Wild is an esteemed action-adventure video game crafted by Nintendo for the Nintendo Switch and Wii U consoles. Representing the 19th entry in the celebrated The Legend of Zelda series, the game was launched in 2017. Set in an extensive open world, the game's narrative allows players to guide Link, the primary character, through the mythical land of Hyrule. Emphasizing exploration, puzzle-solving, and combat, the gameplay encourages interaction with the environment and presents multifaceted challenges. Equipped with a novel and innovative combat system, players wield diverse weapons and items to vanquish adversaries. The plot unfolds with Link's mission to overcome the Calamity Ganon, a formidable presence endangering Hyrule. Critics have hailed the game for its absorbing open-world experience, compelling gameplay mechanics, and breathtaking visual aesthetics.
These results were produced by the official demo.py with the prompt "Can you introduce me this video game" and are provided for your reference.
Thank you both! Using your code, I get the same descriptive result. I guess there was some problem in my previous inference code. Thank you for your help.
For example, take the same input: the first image in the GUI examples, "The Legend of Zelda: Breath of the Wild". In the GUI, the description is vivid and long. However, my result is only "The Legend of Zelda:Breath of the wild", the name of the game. I am using the LORA-BIAS-7B pre-trained weights. Why is there this difference? Is there a problem with my inference method?

```python
import os
from llama.llama_adapter import LLaMA_adapter
import util.misc as misc
import util.extract_adapter_from_checkpoint as extract
import torch
import llama
from PIL import Image
import cv2
import json

device = "cuda" if torch.cuda.is_available() else "cpu"

# Locations of the base LLaMA weights and tokenizer
llama_dir = "/home/cook/developer/LLM/LLaMA-Adapter/LLaMA"
llama_type = '7B'
llama_ckpt_dir = os.path.join(llama_dir, llama_type)
llama_tokenzier_path = os.path.join(llama_dir, 'tokenizer.model')

with open(os.path.join(llama_ckpt_dir, "params.json"), "r") as f:
    params = json.loads(f.read())

# Build the adapter model, then load the LORA-BIAS-7B checkpoint into it
model = LLaMA_adapter(llama_ckpt_dir, llama_tokenzier_path)
misc.load_model(model, '/home/cook/developer/LLM/LLaMA-Adapter/llama_adapter_v2_multimodal/model/LORA-BIAS-7B.pth')
model.eval()
model.to(device)

# Prepare the prompt and the input image
prompt = llama.format_prompt('Can you introduce me this video game?')
img = Image.fromarray(cv2.imread("/home/cook/developer/LLM/LLaMA-Adapter/llama_adapter_v2_multimodal/images/Image.png"))
img = model.clip_transform(img).unsqueeze(0).to(device)

result = model.generate(img, [prompt])[0]
print(result)
```
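The thread does not say what was wrong with the script above. One generic check when a hand-written loader behaves differently from an official demo is to compare the parameter names the model expects with the names stored in the checkpoint, since weights that do not line up may be skipped. The following is only a minimal sketch of that idea: the helper `diff_state_dict_keys`, the `"module."` prefix default, and the toy parameter names are all hypothetical and are not taken from the LLaMA-Adapter code.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys, prefix="module."):
    """Compare parameter names a model expects with those found in a checkpoint.

    `prefix` is an assumed wrapper prefix that is stripped from checkpoint
    names before comparing; pass "" to disable stripping.
    """
    normalized = set()
    for key in checkpoint_keys:
        if prefix and key.startswith(prefix):
            key = key[len(prefix):]
        normalized.add(key)

    expected = set(model_keys)
    return {
        # Parameters the model has but the checkpoint does not provide
        "missing_in_checkpoint": sorted(expected - normalized),
        # Parameters in the checkpoint that the model has no slot for
        "unused_from_checkpoint": sorted(normalized - expected),
    }


# Toy example with made-up parameter names
model_keys = [
    "layers.0.attention.wq.weight",
    "layers.0.attention.wq.bias",
]
checkpoint_keys = [
    "module.layers.0.attention.wq.bias",
    "module.layers.0.attention.lora_wq.weight",
]

report = diff_state_dict_keys(model_keys, checkpoint_keys)
print(report["missing_in_checkpoint"])
print(report["unused_from_checkpoint"])
```

With a PyTorch model, `model_keys` could be `model.state_dict().keys()`, and `checkpoint_keys` the keys of whatever dictionary the checkpoint file stores. PyTorch's `load_state_dict` also returns `missing_keys` and `unexpected_keys`, which carry the same information when loading is non-strict.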