simplewhite9 opened 1 year ago
Thanks for your interest! The item in Tab. 3 should be LLaMA-Adapter V1. LLaMA-Adapter V2 does not perform well under traditional COCO caption metrics, since it usually generates much longer captions. We will fix this in the revision.
If you want to train LLaMA-Adapter V2 on COCO, please check our arxiv paper for more details. We will also release LLaMA-Adapter V2's training code soon.
Hi! I've tried replicating the results by training the model under llama_adapter_v2_multimodal on train2014 (the training data for the COCO Captioning task) for 150 epochs. I'm using exp/pretrain.sh, with BIAS-7B as the starting model. However, the quality of the outputs is simply not like that of the model in the demo available at http://llama-adapter.opengvlab.com/. For example, take this image (from the COCO captioning validation set):
For the prompt "Generate a caption for this image" the demo gives a detailed and high quality caption:
However, when prompted with the same question, the adapter model replies with a simple:
The demo model is also able to do a much better job of answering specific questions about the image, such as how many birds there are, or the type of bird in the image.
What am I missing in the training pipeline? Does it need to be finetuned on alpaca_gpt4_data.json etc. for better-quality outputs? Is the model in the demo just using a bigger LLaMA model?
Also, is there a built-in script to score generated results on the test/val captions?
Could you please share the coco.csv file, or the code used to generate coco.csv?
Hi @verigle . The file is too big to attach here.
The code itself is simple: I generated a tab-separated csv from the annotations file using the image id and caption. Run it with 2 arguments: the json file with annotations, and the csv file you want to write to. You can change "/path/to/" to wherever you store the training data.
import json
import sys

import pandas as pd

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print(f"{len(sys.argv)} is an incorrect number of arguments")
        exit()

    # Load the COCO annotations json and sort the captions by image id
    data = json.load(open(sys.argv[1], 'r'))
    captions = data['annotations']
    captions.sort(key=lambda y: y['image_id'])

    tabular_data = []
    # Template path ending in a 12-digit zero-padded image id
    img_path = '/path/to/train2014/COCO_train2014_000000000000'
    for x in captions:
        image_id = str(x['image_id'])
        cap = x['caption']
        # Overwrite the trailing digits of the template with the actual id
        path = img_path[:-len(image_id)] + image_id + '.jpg'
        tabular_data.append((path, cap))

    # Write a tab-separated csv with 'url' and 'caption' columns
    tabular_data = pd.DataFrame(tabular_data, columns=['url', 'caption'])
    tabular_data.to_csv(open(sys.argv[2], 'w'), sep='\t', index=False)
If you find issues with this, please reply back!
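The path construction above relies on COCO's 12-digit zero-padded filenames; here is a quick standalone sanity check (my own illustration, not part of the script) of the slicing trick:

```python
# The template ends in twelve zeros; slicing off len(image_id) characters
# and appending the id reproduces the zero-padded COCO filename.
img_path = '/path/to/train2014/COCO_train2014_000000000000'
for image_id in ('9', '123456'):
    path = img_path[:-len(image_id)] + image_id + '.jpg'
    print(path)
# /path/to/train2014/COCO_train2014_000000000009.jpg
# /path/to/train2014/COCO_train2014_000000123456.jpg
```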
Thanks for sharing. If using the same settings as exp/pretrain.sh, the total number of steps is 0.6M / 8 / 4 * 150 ~= 2.8M. How much time did it take? And could you please share the training log?
Hello, I am trying to reproduce LLaMA-Adapter V2 trained solely on COCO captions (referring to Table 3 in the paper), and I have a few questions regarding the reproduction.