Closed — vishaal27 closed this issue 1 year ago.
Thanks for your interest in our work!
We are still checking licenses for a safe release, since this project builds on several foundation models and many source datasets. We cannot guarantee a firm timeline for the release, but we will try our best.
For the evaluation in Table 2, the following code computes the metrics:
# Calculating Rouge-L
from rouge_score import rouge_scorer  # https://pypi.org/project/rouge-score/
rouge = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
# `prediction` and `ground_truth` are plain strings
rouge_score = round(rouge.score(prediction, ground_truth)["rougeL"].fmeasure, 4)
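For intuition, Rouge-L is the F-measure over the longest common subsequence (LCS) of the two token sequences. A minimal self-contained sketch of that computation (illustrative only — no stemming, simple whitespace tokenization; use the `rouge_score` library above for reported numbers):

```python
def lcs_length(a, b):
    # Classic dynamic-programming LCS length over two token lists.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f(prediction, reference):
    # F-measure of LCS-based precision and recall, rounded like the snippet above.
    pred, ref = prediction.split(), reference.split()
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return round(2 * precision * recall / (precision + recall), 4)
```

For example, `rouge_l_f("the cat sat", "the cat sat")` is `1.0`, while disjoint strings score `0.0`.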
# Calculating STS
from sentence_transformers import SentenceTransformer, util # https://www.sbert.net/
sts_model = SentenceTransformer('all-mpnet-base-v2', device='cuda')
def get_sts_score(output, original):
    output_feature = sts_model.encode(output, convert_to_tensor=True)
    original_feature = sts_model.encode(original, convert_to_tensor=True)
    cosine_scores = util.pytorch_cos_sim(output_feature, original_feature)
    return round(cosine_scores.item(), 4)
sts_score = get_sts_score(prediction, ground_truth)
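The STS score above is just the cosine similarity between the two sentence embeddings (the embeddings themselves come from the SentenceTransformer model). A plain-Python sketch of that final step, for clarity:

```python
import math

def cosine_similarity(u, v):
    # Cosine similarity: dot(u, v) / (||u|| * ||v||).
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)
```

Identical vectors score 1.0; orthogonal vectors score 0.0.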
Code for inference will be released along with checkpoints and the dataset.
Code and checkpoints have been released.
Dear authors,
Great work, and I really enjoyed reading the paper -- I appreciate the effort undertaken to ensure high-quality data for instruction tuning LMMs. Further, the results in Table 2 (multi-image reasoning) are impressive -- to the best of my knowledge, few other foundation LMMs perform well on multi-image reasoning benchmarks. I would like to experiment with the models (CleverFlamingo), especially for these tasks. Do you have an estimated timeline for releasing the code and CleverFlamingo checkpoints? If possible, I would also request that you release the evaluation scripts used to obtain the numbers in Table 2!
Looking forward to the release!