X-PLUG / Youku-mPLUG

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
Apache License 2.0

The filenames in the downloaded Youku-AliceMind files differ from the names in the caption file. How do I match them? #27

Open Science2AI-TaoXu opened 6 months ago

Science2AI-TaoXu commented 6 months ago
(Two image attachments: 1715869599182, 1715869616587)
dai-yutong commented 5 months ago

I'm facing the same issue. How are the filenames mapped?

dai-yutong commented 5 months ago

> I'm facing the same issue. How are the filenames mapped?

I have solved my problem.

    from datasets.utils.file_utils import hash_url_to_filename
    url = f"public-unzip-dataset/modelscope/Youku-AliceMind/master/{filename_in_csv}"
    new_filename = hash_url_to_filename(url)
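For matching many files at once, the snippet above can be extended into a small lookup. This is only a sketch under stated assumptions: it assumes that `hash_url_to_filename` called without an etag returns the SHA-256 hex digest of the key string, so it re-implements that step with `hashlib` rather than importing `datasets`. The helper names `hashed_name` and `match_files` are hypothetical, and the prefix is copied from the snippet above, so it may not match every download.

```python
import hashlib
from pathlib import Path

# Key prefix copied from the snippet above; it may differ for other downloads.
OSS_PREFIX = "public-unzip-dataset/modelscope/Youku-AliceMind/master/"


def hashed_name(filename_in_csv: str, prefix: str = OSS_PREFIX) -> str:
    """Return the assumed on-disk name for a CSV filename.

    Assumption: with no etag, hash_url_to_filename is the SHA-256 hex
    digest of the full key string.
    """
    key = f"{prefix}{filename_in_csv}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def match_files(csv_names, data_dir):
    """Map each CSV filename to its hashed path, or None if the file is absent."""
    data_dir = Path(data_dir)
    mapping = {}
    for name in csv_names:
        candidate = data_dir / hashed_name(name)
        mapping[name] = candidate if candidate.exists() else None
    return mapping
```

Entries that map to `None` suggest that the prefix or the name taken from the CSV does not match the key used at download time.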

Sun-light-W commented 4 months ago

> > I'm facing the same issue. How are the filenames mapped?
>
> I have solved my problem.
>
>     from datasets.utils.file_utils import hash_url_to_filename
>     url = f"public-unzip-dataset/modelscope/Youku-AliceMind/master/{filename_in_csv}"
>     new_filename = hash_url_to_filename(url)

I used this method, but the resulting new_filename was not found in my data_files directory.

dai-yutong commented 4 months ago

> > > I'm facing the same issue. How are the filenames mapped?
> >
> > I have solved my problem.
> >
> >     from datasets.utils.file_utils import hash_url_to_filename
> >     url = f"public-unzip-dataset/modelscope/Youku-AliceMind/master/{filename_in_csv}"
> >     new_filename = hash_url_to_filename(url)
>
> I used this method, but the resulting new_filename was not found in my data_files directory.

You could open the directory of the installed modelscope package, find the file "msdatasets/utils/oss_utils.py", and locate the line filename = hash_url_to_filename(file_oss_key, etag=None). Then print file_oss_key and filename while running MsDataset.load to see which key each downloaded file is hashed from.
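If editing the installed package is inconvenient, the same key and filename pairs could probably be captured by wrapping the function from outside. The following is a rough sketch, not a confirmed modelscope recipe: `record_calls` is a hypothetical helper, and the commented usage assumes that `oss_utils` imports `hash_url_to_filename` at module level, as the line quoted above suggests.

```python
import functools


def record_calls(func, log):
    """Wrap func so each call appends (first positional argument, result) to log."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        log.append((args[0] if args else None, result))
        return result
    return wrapper


# Hypothetical usage, assuming oss_utils exposes hash_url_to_filename as a
# module-level name (not verified against any specific modelscope version):
#
#   from modelscope.msdatasets.utils import oss_utils
#   calls = []
#   oss_utils.hash_url_to_filename = record_calls(
#       oss_utils.hash_url_to_filename, calls)
#   ... run MsDataset.load(...) as usual ...
#   for file_oss_key, filename in calls:
#       print(file_oss_key, "->", filename)
```

Comparing the printed file_oss_key values with the prefix used in the earlier snippet should show whether a different prefix explains the missing files.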