X-PLUG / Youku-mPLUG

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
Apache License 2.0
268 stars, 11 forks

Failed to download the dataset. #1

Open ceval1987 opened 1 year ago

ceval1987 commented 1 year ago

Thank you for the great work, but I've encountered an issue while downloading the dataset. Could you please help me take a look?

I am using Python 3.8 with modelscope 1.6.0, installed via pip. My machine is an Aliyun ECS instance and uses Aliyun's network services.

When I download the pretrain subset, I get the error below:

requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.modelscope.cn', port=80): Max retries exceeded with url: /api/v1/datasets/modelscope/Youku-AliceMind/oss/tree/?MaxLimit=-1&Revision=master&Recursive=True&FilterDir=True (Caused by ReadTimeoutError("HTTPConnectionPool(host='www.modelscope.cn', port=80): Read timed out. (read timeout=1800)"
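If the read timeout is transient rather than a server-side outage, one possible stopgap is to wrap the download call in a retry with exponential backoff. The following is only a minimal sketch under that assumption: `retry_with_backoff` and `load_fn` are hypothetical names introduced here, not part of modelscope, and `load_fn` stands for whatever zero-argument callable performs the download.

```python
import time


def retry_with_backoff(load_fn, max_attempts=3, base_delay=1.0,
                       retry_on=(OSError,), sleep=time.sleep):
    """Call load_fn(); on a listed exception, wait and try again.

    The delay doubles after each failed attempt (base_delay, 2x, 4x, ...).
    OSError is the default because requests.exceptions.ConnectionError
    derives from it (via RequestException / IOError).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return load_fn()
        except retry_on:
            if attempt == max_attempts:
                # Out of attempts: surface the original error to the caller.
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

This would not help if the listing endpoint itself is broken, which the 1800-second read timeout may well indicate; in that case every attempt would fail the same way.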

When I download the classification or retrieval subset, I get the error below:

oss2.exceptions.NoSuchKey: {'status': 404, 'x-oss-request-id': '648164E2E81BB23635579DA3', 'details': {'Code': 'NoSuchKey', 'Message': 'The specified key does not exist.', 'RequestId': '648164E2E81BB23635579DA3', 'HostId': 'dataset-hub.oss-cn-hangzhou.aliyuncs.com', 'Key': 'public--zip/modelscope/Youku-AliceMind/master/videos/classification/14111B12117422YB5YYABB2FB3JCA1-b32488Ea75-3a5CY1a3CY8a54J8BA1AJ-7C.mp4', 'EC': '0026-00000001'}}

When I download the caption subset, I get the error below:

ValueError: subset_name caption not found. Available: dict_keys(['classification', 'retrieval', 'pretrain'])
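Since each subset fails in a different way, it may help to collect the failures in one pass when reporting them. The sketch below is an illustration only: `probe_subsets` and `load_fn` are hypothetical names, and `load_fn` is assumed to be a callable that takes a subset name and attempts the download.

```python
def probe_subsets(load_fn, subset_names):
    """Try each subset once and map its name to 'ok' or an error summary."""
    results = {}
    for name in subset_names:
        try:
            load_fn(name)
            results[name] = "ok"
        except Exception as exc:
            # Record the failure instead of aborting, so all subsets get tried.
            results[name] = f"{type(exc).__name__}: {exc}"
    return results
```

Calling it with the four subset names mentioned in this report (`pretrain`, `classification`, `retrieval`, `caption`) would give a single summary of which ones fail and how.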

MAGAer13 commented 1 year ago

Hi, we are facing some downloading issues right now. We will fix it ASAP and notify you. 😊

LiJiaqi96 commented 10 months ago

Hi, I'm facing the same download issue. How is the fix progressing? Thanks!