BradyFU / Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

Many of the MME landmark images are not available for download #80

Closed: xuanmingcui closed this issue 10 months ago

xuanmingcui commented 10 months ago

Hello, thank you for the wonderful work and the dataset!

I was trying to download the landmark images by running MME_Benchmark_release_version/landmark/images/download_landmark.py, but only about 35 images downloaded successfully. The rest failed with the error: Failed to download the URL file.

Are there alternative sources for these images, or did I do something incorrectly?

Thanks for the help in advance :)
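To tell whether a failing URL is simply dead (HTTP 404) or is being refused by the server (HTTP 403), it can help to request one of the failing URLs by hand and look at the error. The following is a minimal diagnostic sketch that uses only the standard library's urllib with its default settings, so it sees the same response the download script does. The helpers `describe_failure` and `probe` are hypothetical and are not part of the MME release.

```python
import urllib.error
import urllib.request


def describe_failure(exc):
    """Turn a urllib exception into a short, human-readable diagnosis."""
    # HTTPError is a subclass of URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code == 404:
            return "404: the image no longer exists at this URL"
        if exc.code == 403:
            return "403: the server refused the request (possibly because of the client or User-Agent)"
        return f"HTTP {exc.code}: {exc.reason}"
    if isinstance(exc, urllib.error.URLError):
        return f"network error: {exc.reason}"
    return f"{type(exc).__name__}: {exc}"


def probe(url, timeout=15):
    """Request the URL once and report the HTTP status or a diagnosis."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"OK {resp.status}"
    except Exception as exc:  # report everything; this is only a diagnostic
        return describe_failure(exc)
```

Running `probe(url)` on a few of the failing entries from the CSV shows whether the links are gone or just being rejected.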

BradyFU commented 10 months ago

Could you leave your WeChat ID? We can follow up there to locate and fix the problem.

xuanmingcui commented 10 months ago


Sure! My WeChat ID is Sammy_C98. Thank you for the help!

BradyFU commented 10 months ago

Copy that. : )

pritamqu commented 5 months ago

Hi, I am having the same problem. Could you please share a solution? Thanks!

pritamqu commented 5 months ago

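I ended up slightly modifying the provided download_landmark.py to work around this issue. The main change is that a failed download is retried once with the percent-decoded (unquoted) URL. Sharing it here in case it helps others: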

```python
import argparse
import os
import urllib.parse
import urllib.request

import pandas as pd


def download_url_images(csv_file, output_folder, extension='.jpg'):
    df = pd.read_csv(csv_file)
    # Create the output folder once, before the loop.
    os.makedirs(output_folder, exist_ok=True)
    for image_id, url in zip(df['id'], df['url']):
        save_path = os.path.join(output_folder, str(image_id) + extension)
        if os.path.isfile(save_path):
            continue  # already downloaded on a previous run
        try:
            urllib.request.urlretrieve(url, save_path)
        except Exception:
            # Fallback: retry once with the percent-decoded URL.
            try:
                urllib.request.urlretrieve(urllib.parse.unquote(url), save_path)
            except Exception:
                print("Failed to download the URL file:", url)
                print("Save_path:", save_path)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Download images from URL in a CSV file.')
    # Defaults match the paths previously hard-coded for running from an IDE,
    # so command-line arguments are no longer silently overwritten.
    parser.add_argument('--csv_file', default='./landmark200.csv', help='Path to the CSV file')
    parser.add_argument('--output_folder', default='./', help='Output folder to save downloaded images')
    parser.add_argument('--extension', default='.jpg', help='File extension for downloaded images (default: .jpg)')
    args = parser.parse_args()

    download_url_images(args.csv_file, args.output_folder, args.extension)
```
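The script above relies on a single retry with a percent-decoded URL. A slightly more defensive variant is sketched below under two assumptions that the thread does not confirm: that some CSV entries are percent-encoded twice (so `%2528` should become `%28`), and that some image hosts reject urllib's default `Python-urllib/3.x` User-Agent. The helper names (`candidate_urls`, `make_request`, `fetch`, `download_all`) and the `USER_AGENT` string are hypothetical placeholders, not part of the MME release.

```python
import http.client
import os
import urllib.parse
import urllib.request

# Hypothetical descriptive User-Agent; replace the contact detail with your own.
# Assumption: some image hosts reject urllib's default "Python-urllib/3.x" agent.
USER_AGENT = "mme-landmark-downloader/0.1 (contact: you@example.com)"


def candidate_urls(url):
    """Return the URL as given, followed by its percent-decoded form if different."""
    decoded = urllib.parse.unquote(url)
    return [url] if decoded == url else [url, decoded]


def make_request(url):
    """Build a GET request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url, save_path, timeout=30):
    """Try each candidate URL in turn; return None on success, else the last error."""
    last_error = None
    for candidate in candidate_urls(url):
        try:
            with urllib.request.urlopen(make_request(candidate), timeout=timeout) as resp:
                data = resp.read()
        except (OSError, ValueError, http.client.HTTPException) as exc:
            last_error = exc  # HTTPError and URLError are OSError subclasses
            continue
        # Write the file only after the whole response body has been read.
        with open(save_path, "wb") as f:
            f.write(data)
        return None
    return last_error


def download_all(rows, output_folder, extension=".jpg"):
    """Download (id, url) pairs; return {id: error} for the ones that still fail."""
    os.makedirs(output_folder, exist_ok=True)
    failures = {}
    for image_id, url in rows:
        save_path = os.path.join(output_folder, f"{image_id}{extension}")
        if os.path.isfile(save_path):
            continue  # skip images fetched on an earlier run
        error = fetch(url, save_path)
        if error is not None:
            failures[image_id] = error
    return failures
```

For example, `download_all(zip(df['id'], df['url']), './')` returns a dict of the ids that still failed together with their errors. Those can be printed or saved for a later retry, instead of scrolling back through printed messages.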