X-PLUG / Youku-mPLUG

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
Apache License 2.0

Download problem #10

Open aopolin-lv opened 11 months ago

aopolin-lv commented 11 months ago

Downloading the pretrain dataset with modelscope fails with the error shown below:

2023-07-26 14:05:26,858 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2023-07-26 14:05:27,483 - modelscope - INFO - Loading done! Current index file version is 1.7.1, with md5 1a3c80f9923ff896da3e2a4786eadd0f and a total number of 861 components indexed
2023-07-26 14:05:47,880 - modelscope - INFO - Reusing cached meta-data file: /root/.cache/modelscope/hub/datasets/modelscope/Youku-AliceMind/master/meta/8675a4d533a4241f99abcf63d2356b01
Overall progress:   0%|          | 0/10009370 [00:00<?, ?it/s]
2023-07-26 14:06:26,748 - modelscope - INFO - Reusing cached meta-data file: /root/.cache/modelscope/hub/datasets/modelscope/Youku-AliceMind/master/meta/8675a4d533a4241f99abcf63d2356b01
2023-07-26 14:07:06,106 - modelscope - ERROR - 'DataDownloadConfig' object has no attribute 'storage_options'
Overall progress:   0%|          | 0/10009370 [00:39<?, ?it/s]
{'video_id:FILE': ['videos/pretrain/14111Y1211b-1134b18bAE55bFE7Jbb7135YE3aY54EaB14ba7CbAa1AbACB24527A.flv'], 'title': ['妈妈给宝宝听胎心,看看宝宝是怎么做的,太调皮了']}

How should I handle this?

xiaomao19970819 commented 9 months ago

Did you manage to solve this problem?

aopolin-lv commented 9 months ago

> Did you manage to solve this problem?

No.

cxry-wxr commented 8 months ago

Has this problem been solved?

MinliangLin commented 3 months ago

Hi folks, it seems that newer versions of the datasets library are not compatible with modelscope. The code below works for me:

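from modelscope.msdatasets import MsDataset
# from modelscope.utils.constant import DownloadMode  # only needed for the commented-out option below

ds = MsDataset.load(
    "Youku-AliceMind",
    namespace="modelscope",
    subset_name="caption",
    split="validation",  # Options: train, test, validation
    # download_mode=DownloadMode.FORCE_REDOWNLOAD,  # use this if you need to clear the cache
    use_streaming=True,
)
# Workaround: add the attribute that the error message reports as missing.
ds._dataset_context_config._download_config.storage_options = {}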
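
Since the workaround above reaches into private attributes, it may be safer to apply it only when the attribute is really missing. The following is a minimal sketch of that idea, not part of modelscope: the helper name `ensure_storage_options` is hypothetical, and the only thing assumed about the dataset object is the attribute chain `_dataset_context_config._download_config` taken from the snippet above. The demonstration uses a stand-in object, not a real `MsDataset`.

```python
from types import SimpleNamespace


def ensure_storage_options(ds):
    """Add `storage_options = {}` to the download config if it is missing.

    Hypothetical helper. Returns True if the attribute was added,
    False if nothing was changed.
    """
    # Attribute names are taken from the workaround above (assumption:
    # they are private and may differ between modelscope versions).
    context = getattr(ds, "_dataset_context_config", None)
    download_config = getattr(context, "_download_config", None)
    if download_config is None:
        return False  # object layout differs; leave it untouched
    if hasattr(download_config, "storage_options"):
        return False  # attribute already present; nothing to patch
    download_config.storage_options = {}
    return True


# Stand-in that mimics only the attribute chain; NOT a real MsDataset.
fake_ds = SimpleNamespace(
    _dataset_context_config=SimpleNamespace(_download_config=SimpleNamespace())
)
patched = ensure_storage_options(fake_ds)        # first call adds the attribute
patched_again = ensure_storage_options(fake_ds)  # second call changes nothing
```

With a real dataset, the call would be `ensure_storage_options(ds)` right after `MsDataset.load(...)` and before iterating over `ds`.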