bibanon / tubeup

Use yt-dlp to download video and upload to the Internet Archive with metadata.
https://pypi.python.org/pypi/tubeup/
GNU General Public License v3.0

Script for downloading the best possible quality + vtt subtitles and thumbnail [youtube-dl] #29

Closed rudolphos closed 7 years ago

rudolphos commented 7 years ago

Can you include this in the default youtube-dl download script?

It downloads the best video version and the best audio, then merges them into an .mp4 file.

youtube-dl -ci --write-thumbnail --sub-format ass/srt/best --write-auto-sub -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]/mp4" -o "/%%(uploader)s - %%(title)s (%%(id)s).%%(ext)s" %input%

%input% is a video URL, channel URL, or playlist URL.

There's also the naming template %%(uploader)s - %%(title)s (%%(id)s).%%(ext)s, which needs to be adjusted for Linux. I think the doubled % is not necessary there; it's only needed on Windows.
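The doubled `%%` is a Windows batch-file escape: inside a `.bat` file, `%%` collapses to a single `%` before youtube-dl sees it. The template itself follows Python's `%`-style string formatting, so its effect can be previewed in plain Python. This is a minimal sketch with made-up metadata; the `info` dictionary and both helper functions are hypothetical and are not part of youtube-dl or tubeup.

```python
# Output template as youtube-dl should receive it (single % signs).
TEMPLATE = "%(uploader)s - %(title)s (%(id)s).%(ext)s"

# Hypothetical metadata, for illustration only.
info = {
    "uploader": "SomeChannel",
    "title": "Some Video",
    "id": "abc123XYZ_0",
    "ext": "mkv",
}


def unescape_batch(template):
    """Mimic what a Windows batch file does: collapse %% into %."""
    return template.replace("%%", "%")


def render(template, fields):
    """Fill a %-style template from a dict of fields."""
    return template % fields


# The batch-escaped form from the command above reduces to TEMPLATE.
batch_form = "%%(uploader)s - %%(title)s (%%(id)s).%%(ext)s"
print(unescape_batch(batch_form) == TEMPLATE)
print(render(TEMPLATE, info))
```

On Linux the template can be passed with single `%` signs, for example inside single quotes so the shell leaves it untouched.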

vxbinaca commented 7 years ago

I'll give you an example of why forcing MP4 is bad: [A Fox in Space Episode 1](https://www.youtube.com/watch?v=uieM18rZdHY) was archived in reduced quality when MP4 was forced (this was eventually fixed). Currently I'm using bestvideo+bestaudio and letting YouTube (or whatever site is being ripped) and youtube-dl sort it out. It tends to work out for the best. I've done a lot of playing with this and tried to 'fix' the MKV 'problem', and in the end it's just easier to get the muxed best of both audio and video.

Instant streamability is less important than fidelity.
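To illustrate why restricting the container can cost quality, here is a toy model of format selection. It is only a sketch: the format list is invented, and `best_video` is a hypothetical helper, not youtube-dl's actual selection logic.

```python
# Hypothetical formats a site might offer; the values are made up.
FORMATS = [
    {"id": "a", "ext": "mp4", "height": 1080},
    {"id": "b", "ext": "webm", "height": 2160},
    {"id": "c", "ext": "mp4", "height": 720},
]


def best_video(formats, ext=None):
    """Return the tallest format, optionally restricted to one extension."""
    candidates = [f for f in formats if ext is None or f["ext"] == ext]
    return max(candidates, key=lambda f: f["height"])


unrestricted = best_video(FORMATS)
forced_mp4 = best_video(FORMATS, ext="mp4")

# With no restriction the 2160-line stream wins; forcing mp4 drops to 1080.
print(unrestricted["height"], forced_mp4["height"])
```

Whenever the highest-quality stream is only offered in a non-MP4 container, a filter like `[ext=mp4]` silently settles for a lower-quality one.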

vxbinaca commented 7 years ago

What version of youtube-dl are you using? If you take my handle here and drop it into Archive.org, you can see my ingestion of not just YouTube but Periscope and other sites. Other than connection dropouts or Archive being overloaded and dropping my uploads, I haven't had issues.

Did you install youtube-dl from the Ubuntu repository or from pip? Pip is what you want to use. Remove the youtube-dl installed from Ubuntu and reinstall it using the instructions in the Readme.

rudolphos commented 7 years ago

> I'll give you an example of why forcing MP4 is bad:

Didn't know this. I usually used MKV, but it was incompatible with editing software, so I switched my youtube-dl script to MP4, which works everywhere.

I'm going to try this on a dedicated server; Cloud9 was out of space (it only had 2 GB instead of 5 GB).

vxbinaca commented 7 years ago

@rudolphos You can transcode MKV to MP4 with ffmpeg, and Archive.org derives MP4 from MKV. The focus of this script is downloading video in the highest quality and transferring it, with metadata, to Archive.org, assembling an item for each video.
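For the ffmpeg route, a conversion could be scripted roughly as follows. This is a sketch under assumptions: the file names are placeholders, `mkv_to_mp4_command` and `convert` are hypothetical helpers, and the re-encode path assumes an ffmpeg build that includes the `libx264` encoder.

```python
import subprocess


def mkv_to_mp4_command(src, dst, reencode=True):
    """Build an ffmpeg argument list converting src (MKV) to dst (MP4)."""
    cmd = ["ffmpeg", "-i", src]
    if reencode:
        # Re-encode to H.264 video and AAC audio, which editors widely accept.
        cmd += ["-c:v", "libx264", "-c:a", "aac"]
    else:
        # Copy the streams as-is; only works if MP4 can hold the source codecs.
        cmd += ["-c", "copy"]
    cmd.append(dst)
    return cmd


def convert(src, dst, reencode=True):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(mkv_to_mp4_command(src, dst, reencode), check=True)


# Only print the command here; call convert() to actually run ffmpeg.
print(mkv_to_mp4_command("video.mkv", "video.mp4"))
```

Re-encoding is lossy, so it makes sense to keep the original MKV as the archival copy and treat the MP4 as a working copy for editing.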

If this is acceptable to you, I'll close this issue.

rudolphos commented 7 years ago

Yeah, it's acceptable. But is it possible to archive a whole YT channel as one archive.org item?

rudolphos commented 7 years ago

Tried this script on a VPS, 10 videos successfully uploaded, but then it showed this error:

:: Upload Finished. Item information:
Title: ...
Upload URL: ...
:: Uploading /root/.tubeup/downloads/.....
2016-12-07 20:13:49,195 - internetarchive.item - ERROR -  error uploading .....annotations.xml to youtube-...., Access Denied - You lack sufficient privileges to write to this item.
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/internetarchive/item.py", line 625, in upload_file
    response.raise_for_status()
  File "/usr/local/lib/python3.5/dist-packages/requests/models.py", line 893, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://s3.us.archive.org/.....annotations.xml

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/tubeup", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.5/dist-packages/tubeup/__main__.py", line 272, in main
    identifier, meta = upload_ia(video, custom_meta=md)
  File "/usr/local/lib/python3.5/dist-packages/tubeup/__main__.py", line 219, in upload_ia
    item.upload(vid_files, metadata=meta, retries=30000, request_kwargs=dict(timeout=30000), delete=True)
  File "/usr/local/lib/python3.5/dist-packages/internetarchive/item.py", line 751, in upload
    request_kwargs=request_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/internetarchive/item.py", line 645, in upload_file
    raise type(exc)(error_msg, response=exc.response, request=exc.request)
requests.exceptions.HTTPError:  error uploading .....annotations.xml to youtube-...., Access Denied - You lack sufficient privileges to write to this item.

vxbinaca commented 7 years ago

Yes, it's possible to rip an entire channel and upload it to archive.org. It's how I do my archival. I'll look into your error in a bit.

vxbinaca commented 7 years ago

It looks like your item got turned off. Email info@archive.org with the item identifier and ask why, and ask that it be undarked if you want to write to it. Also note my warning in the Readme about uploading entire channels to "Community Video". Try writing 50 videos with admin permission, make an item, transfer your already uploaded progress to a collection (you must request that collections be made), then continue to upload all videos from that channel into that collection. There are flags to do it.

In closing, have a look at this: https://archive.org/details/youtube-uieM18rZdHY

I manually converted and re-uploaded the subtitles in SRT format, but eventually, if/when Archive derives VTT or SRT, that's what it will look like. All the thumbnails and metadata are uploaded, and the video is in top quality.