OpenDriveLab / DriveAGI

[Incl. GenAD, CVPR 2024 Highlight] Embracing Foundation Models into Autonomous Agent and System
https://arxiv.org/abs/2403.09630
Apache License 2.0

Downloading issue #9

Open chaoqunwangcs opened 4 months ago

chaoqunwangcs commented 4 months ago

Thanks for the great work! I would like to download the YouTube data with the provided script, but the download command "youtube-dl -f {args} {url}" does not work; it fails with "ERROR: Unable to extract uploader id". Could you provide another download script, or is there something I am missing? One workaround is the 'yt-dlp' package, but I am unsure which download arguments to use, such as the resolution and ext, since they affect image quality and dataset size.

YTEP-ZHI commented 4 months ago

Hi @chaoqunwangcs, thanks for your feedback. This issue arises when YouTubers delete their videos or make them private, which invalidates the video links. One solution is to skip these missing videos; we will update the code soon.
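The "skip missing videos" idea mentioned above could look roughly like the sketch below. It is not the repository's actual code: `download_all` and the injected `download_one` callable are hypothetical names used only for illustration, and the real script may handle failures differently.

```python
from typing import Callable, Iterable, List, Tuple


def download_all(
    video_ids: Iterable[str],
    download_one: Callable[[str], None],
) -> List[Tuple[str, str]]:
    """Try every video and collect failures instead of stopping at the first one.

    `download_one` is a hypothetical callable that raises an exception
    when a video is unavailable (deleted, private, network error, ...).
    """
    failed = []
    for video_id in video_ids:
        try:
            download_one(video_id)
        except Exception as exc:
            # Record the failure and move on to the next video.
            failed.append((video_id, str(exc)))
    return failed
```

The returned list can be written to a log afterwards so that missing videos are easy to audit.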

chaoqunwangcs commented 4 months ago

Thanks for your reply. However, the issue is not caused by YouTubers deleting their videos or making them private, since the 'yt-dlp' package downloads the video successfully. To stay aligned with your dataset, which was downloaded with the 'youtube-dl' package, I'd like to ask for details such as the resolution and ext, since they affect image quality and dataset size.

GihhArwtw commented 4 months ago

Hi @chaoqunwangcs, could you please give more information (video_id or URL) about the video that leads to "ERROR: Unable to extract uploader id"? That will help us handle the issue faster.

As for resolution, most videos are downloaded at 1080p (720p for those without a 1080p version). As for ext, all videos are in either mp4 or webm format. You can refer to https://github.com/OpenDriveLab/DriveAGI/blob/main/opendv/configs/download.json#L4 for some details.

I'll try to make a download script for yt-dlp asap.
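For readers who want to reproduce the preference described above (1080p first, then 720p, in mp4 or webm), here is a minimal sketch that builds a format selector string in the filter syntax accepted by both youtube-dl and yt-dlp. The helper name `build_format_selector` is hypothetical, and the exact selector used in the repository's download.json is not shown in this thread, so treat this as an approximation rather than the project's configuration. It selects video-only streams.

```python
def build_format_selector(heights=(1080, 720), exts=("mp4", "webm")):
    """Build a fallback chain such as
    'bestvideo[height=1080][ext=mp4]/bestvideo[height=1080][ext=webm]/...'.

    Alternatives separated by '/' are tried left to right, so earlier
    entries (higher resolution, preferred container) win when available.
    """
    alternatives = [
        f"bestvideo[height={height}][ext={ext}]"
        for height in heights
        for ext in exts
    ]
    return "/".join(alternatives)


if __name__ == "__main__":
    print(build_format_selector())
```

The resulting string would be passed as the value of the `-f` option.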

chaoqunwangcs commented 4 months ago

Thanks for your reply. To reproduce the failure, you can just run the command 'youtube-dl https://www.youtube.com/watch?v=--I-TdCe2_g' (just the first video) with the latest 'youtube-dl' package (installed with "pip install youtube-dl"). Besides, many videos are available at higher resolutions such as 2K (3848*2160). Have you ever measured the dataset size at the highest resolution?

GihhArwtw commented 4 months ago

  1. It seems that the command works fine on our server. Maybe the issue has something to do with network conditions? Since the yt-dlp package works fine on your end, I think I can add another download script that uses yt-dlp.

  2. Though many videos support 2K or 4K resolution, we still download their 1080p versions, since the processed data would otherwise take up a lot of disk space. But of course you can download videos at a higher resolution if you need to.

GihhArwtw commented 4 months ago

Hi @chaoqunwangcs, I just updated the download script.

To download videos using yt-dlp, you just need to change the method setting in configs/download.json to yt-dlp. Please let us know if you run into any further problems.
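As a rough illustration of how a configurable download method can work, the sketch below builds the command line for either tool. The function name `build_command` and its parameters are assumptions made for this example, not the repository's implementation; only the `-f` (format) and `-o` (output template) options, which both youtube-dl and yt-dlp accept, are used.

```python
SUPPORTED_METHODS = ("youtube-dl", "yt-dlp")


def build_command(method, url, fmt, output_template):
    """Return the argument list for the chosen downloader.

    Both tools share the same basic CLI, so only the executable name
    changes when switching methods.
    """
    if method not in SUPPORTED_METHODS:
        raise ValueError(f"Unsupported download method: {method!r}")
    return [method, "-f", fmt, "-o", output_template, url]
```

The list can be handed to `subprocess.run(..., check=True)` so that a non-zero exit status raises an exception.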

makolon commented 3 months ago

Hi @GihhArwtw,

I encountered an issue while trying to download videos using yt-dlp. After installing yt-dlp via the pip command and configuring it as the download method, I received the following error and warning:

$ python scripts/youtube_download.py >> download_output.txt
  0%|                                                                       | 0/2139 [00:00<?, ?it/s]
WARNING: [youtube] Skipping player responses from android clients (got player responses for video "aQvGIIdgFDM" instead of "9fZl32pIdCM")
ERROR: [youtube] 9fZl32pIdCM: Video unavailable. This video is no longer available because the YouTube account associated with this video has been terminated.

~~~

"""
Traceback (most recent call last):
  File "/root/DriveAGI/opendv/scripts/youtube_download.py", line 36, in single_download
    raise Exception("ERROR: Video unavailable or network error.")
Exception: ERROR: Video unavailable or network error.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/root/DriveAGI/opendv/scripts/youtube_download.py", line 39, in single_download
    with open(CONFIGS.exception_file, "a") as f:
AttributeError: 'EasyDict' object has no attribute 'exception_file'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/DriveAGI/opendv/scripts/youtube_download.py", line 102, in <module>
    multiple_download(video_list, configs)
  File "/root/DriveAGI/opendv/scripts/youtube_download.py", line 56, in multiple_download
    for _ in tqdm(p.imap(single_download, video_list), total=video_count):
  File "/usr/local/lib/python3.10/dist-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 873, in next
    raise value
AttributeError: 'EasyDict' object has no attribute 'exception_file'

Could you please help me resolve this issue? Any assistance would be greatly appreciated.

Thank you!

makolon commented 3 months ago

Thanks for your reply!!! I initially thought that the process was stuck at the WARNING, but it actually continued! However, it now prints errors about being unable to rename files. Is it safe to ignore these errors?

ERROR: Unable to rename file: [Errno 2] No such file or directory: 'OpenDV-YouTube/videos/USA_Thrill/B1rC5Ni8Dgk.webm.part' -> 'OpenDV-YouTube/videos/USA_Thrill/B1rC5Ni8Dgk.webm'
ERROR: Unable to rename file: [Errno 2] No such file or directory: 'OpenDV-YouTube/videos/J_Utah/Arz8k37-9F4.webm.part' -> 'OpenDV-YouTube/videos/J_Utah/Arz8k37-9F4.webm'
ERROR: Unable to rename file: [Errno 2] No such file or directory: 'OpenDV-YouTube/videos/Relaxing_Walks/BWAbBu7uNdA.webm.part' -> 'OpenDV-YouTube/videos/Relaxing_Walks/BWAbBu7uNdA.webm'
ERROR: Unable to rename file: [Errno 2] No such file or directory: 'OpenDV-YouTube/videos/Zhejiang_Street_Scenes/A8DZaOwQQ8U.webm.part' -> 'OpenDV-YouTube/videos/Zhejiang_Street_Scenes/A8DZaOwQQ8U.webm'
ERROR: Unable to rename file: [Errno 2] No such file or directory: 'OpenDV-YouTube/videos/Relaxing_Walks/A_cPQJt-id4.webm.part' -> 'OpenDV-YouTube/videos/Relaxing_Walks/A_cPQJt-id4.webm'
ERROR: Unable to rename file: [Errno 2] No such file or directory: 'OpenDV-YouTube/videos/The_Driving_Channel/AoXyxEi09CI.webm.part' -> 'OpenDV-YouTube/videos/The_Driving_Channel/AoXyxEi09CI.webm'
ERROR: Unable to rename file: [Errno 2] No such file or directory: 'OpenDV-YouTube/videos/Relaxing_Scenes_-_Driving/CBtT-zVekxg.webm.part' -> 'OpenDV-YouTube/videos/Relaxing_Scenes_-_Driving/CBtT-zVekxg.webm'
ERROR: Unable to rename file: [Errno 2] No such file or directory: 'OpenDV-YouTube/videos/Relaxing_Walks/BDXHtLqFUf4.webm.part' -> 'OpenDV-YouTube/videos/Relaxing_Walks/BDXHtLqFUf4.webm'
ERROR: Unable to rename file: [Errno 2] No such file or directory: 'OpenDV-YouTube/videos/J_Utah/A0F1ZKqmavc.webm.part' -> 'OpenDV-YouTube/videos/J_Utah/A0F1ZKqmavc.webm'
ERROR: Unable to rename file: [Errno 2] No such file or directory: 'OpenDV-YouTube/videos/Driving_in_China/BN2hmwUJH9o.webm.part' -> 'OpenDV-YouTube/videos/Driving_in_China/BN2hmwUJH9o.webm'
ERROR: Unable to rename file: [Errno 2] No such file or directory: 'OpenDV-YouTube/videos/Wheels_Around_The_World/BHm7wQ8Y_cU.webm.part' -> 'OpenDV-YouTube/videos/Wheels_Around_The_World/BHm7wQ8Y_cU.webm'

GihhArwtw commented 3 months ago

Hi @makolon, I'll look into it. Maybe you could try running python with sudo and see whether the errors persist. I'm not sure whether the problem is related to directory permissions on your end.

Also, I'll fix the EasyDict bug today. I don't know how I missed it before, probably because the test download process could continue despite the error message 😂
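The traceback reported earlier shows the failure handler itself crashing because the config object has no `exception_file` attribute. One defensive pattern, shown here only as a sketch and not as the fix that was actually committed, is to fall back to a default path when the attribute is missing. `types.SimpleNamespace` stands in for EasyDict so the example has no third-party dependency; `log_failure` and the default file name are hypothetical.

```python
from types import SimpleNamespace


def log_failure(configs, video_id, reason, default_path="download_exceptions.txt"):
    """Append a failed download to a log file.

    getattr() with a default avoids an AttributeError when the config
    object does not define `exception_file`.
    """
    path = getattr(configs, "exception_file", default_path)
    with open(path, "a") as f:
        f.write(f"{video_id}\t{reason}\n")
    return path


if __name__ == "__main__":
    configs = SimpleNamespace(method="yt-dlp")  # no exception_file set
    print(log_failure(configs, "9fZl32pIdCM", "Video unavailable"))
```

With this pattern a missing config key degrades to a default log location instead of aborting the whole worker pool.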

GihhArwtw commented 3 months ago

Hi @makolon, I just fixed the EasyDict bug. Still, I could not reproduce the error you reported in your latter comment. But I think it is not safe to ignore it. Both youtube-dl and yt-dlp need to rename *.part (the temporary file) to *.<EXT> when they finish downloading.
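Since a leftover *.part file indicates a download that was never finalized, one way to check a download directory is to list such files after a run. This is a minimal sketch under that assumption; `find_incomplete` is a hypothetical helper, and the directory layout is taken from the paths in the error messages above.

```python
from pathlib import Path


def find_incomplete(video_root):
    """Return the relative paths of all leftover '*.part' files under video_root."""
    root = Path(video_root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.part"))


if __name__ == "__main__":
    for leftover in find_incomplete("OpenDV-YouTube/videos"):
        print("incomplete:", leftover)
```

Videos listed here can then be deleted and downloaded again.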

makolon commented 3 months ago

hi @GihhArwtw! Thank you! After pulling the revised code and running it again, it seems like the download was successful!