Closed: rudolphos closed this issue 7 years ago.
VTT is the default subtitle output for YouTube. Archive.org doesn't yet convert VTT to SRT, and that conversion is what would let subtitles be implemented there. However, I do believe the VTT files are collected and uploaded to Archive. Perhaps one day Archive will add a derive rule to process VTT. What's important is that the information is saved.
The most common format for muxed container uploads is MKV, although 20 percent of the time it's MP4. MKV does require post-processing on Archive's end, but it allows the best-quality video and audio to be used. Later on, the files can be re-derived to something better, or perhaps streamed in their uploaded format.
I'll give you an example of why forcing MP4 is bad: A Fox In Space Episode 1 (https://www.youtube.com/watch?v=uieM18rZdHY) was archived in reduced quality when MP4 was forced (this was eventually fixed). Currently I'm using bestaudio+bestvideo and letting YouTube (or whatever site is being ripped) and youtube-dl sort it out. It tends to work out for the best. I've experimented with this a lot and tried to 'fix' the MKV 'problem', and in the end it's just easier to get the muxed best of both audio and video.
Instant streamability is less important than fidelity.
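The reason forcing MP4 can cost quality is that, for some videos, YouTube offers its highest resolution only as WebM/VP9. A selector restricted to `ext=mp4` can never pick that stream. The sketch below is a deliberately simplified model of format selection and does not reproduce youtube-dl's real sorting rules. The format list is hypothetical sample data, and `pick_best` is an illustrative name.

```python
from typing import Dict, List, Optional

# Hypothetical format list, loosely shaped like youtube-dl's info["formats"].
FORMATS: List[Dict] = [
    {"format_id": "137", "ext": "mp4", "height": 1080, "vcodec": "avc1", "acodec": "none"},
    {"format_id": "313", "ext": "webm", "height": 2160, "vcodec": "vp9", "acodec": "none"},
    {"format_id": "140", "ext": "m4a", "abr": 128, "vcodec": "none", "acodec": "mp4a"},
    {"format_id": "251", "ext": "webm", "abr": 160, "vcodec": "none", "acodec": "opus"},
]


def pick_best(formats: List[Dict], kind: str, ext: Optional[str] = None) -> Optional[Dict]:
    """Simplified stand-in for 'bestvideo[ext=...]' / 'bestaudio[ext=...]'."""
    if kind == "video":
        pool = [f for f in formats if f["vcodec"] != "none" and f["acodec"] == "none"]
        key = "height"
    else:
        pool = [f for f in formats if f["acodec"] != "none" and f["vcodec"] == "none"]
        key = "abr"
    if ext is not None:
        pool = [f for f in pool if f["ext"] == ext]
    return max(pool, key=lambda f: f[key], default=None)


# "bestvideo+bestaudio": no container restriction (merged output may be WebM or MKV).
free = (pick_best(FORMATS, "video"), pick_best(FORMATS, "audio"))
# "bestvideo[ext=mp4]+bestaudio[ext=m4a]": MP4-compatible streams only.
forced = (pick_best(FORMATS, "video", "mp4"), pick_best(FORMATS, "audio", "m4a"))

print([f["format_id"] for f in free])    # ['313', '251']
print([f["format_id"] for f in forced])  # ['137', '140']
```

With this sample data, the unrestricted pick gets the 2160p stream. The forced-MP4 pick stops at 1080p.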
What version of youtube-dl are you using? If you take my handle here and drop it into Archive.org, you can see my ingestion of not just YouTube but also Periscope and other sites. Other than connection dropouts, or Archive being overloaded and dropping my uploads, I haven't had issues.
Did you install youtube-dl from the Ubuntu repository or from pip? Pip is what you want to use. Remove the youtube-dl installed from Ubuntu and reinstall it using the instructions in the Readme.
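If you're not sure which copy Python is importing, the module's location gives a rough answer. This sketch assumes the usual Debian/Ubuntu layout: apt packages live under `/usr/lib/python3/dist-packages`, and system-wide pip installs live under `/usr/local/lib/python3.X/dist-packages`. The classification is a heuristic only, and both helper names are hypothetical.

```python
import importlib.util
from typing import Optional


def youtube_dl_path() -> Optional[str]:
    """Return the file Python would import for youtube_dl, or None if absent."""
    spec = importlib.util.find_spec("youtube_dl")
    return spec.origin if spec is not None else None


def install_source(path: Optional[str]) -> str:
    """Heuristic guess at how youtube_dl was installed (Debian/Ubuntu layout)."""
    if path is None:
        return "not installed"
    if path.startswith("/usr/lib/python3/dist-packages/"):
        return "apt (Ubuntu/Debian package)"
    if path.startswith("/usr/local/lib/python3") or "site-packages" in path:
        return "pip"
    return "unknown"


if __name__ == "__main__":
    location = youtube_dl_path()
    print(location, "->", install_source(location))
```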
Didn't know this. I usually used MKV, but it was incompatible with editing software, so I switched my youtube-dl script to MP4, which works everywhere.
I'm gonna try this on a dedicated server; Cloud9 was out of space (it only had 2 GB instead of 5 GB).
@rudolphos You can transcode MKV to MP4 with ffmpeg, and Archive.org derives that format from MKV anyway. The focus of this script is downloading video in the highest quality, transferring it with its metadata to Archive.org, and assembling an item for each video.
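For a local copy that editing software will open, one approach is an ffmpeg re-encode to H.264 video and AAC audio, which most MP4 tools support. This is a minimal sketch that assumes `ffmpeg` is on your PATH and was built with libx264. A re-encode is slower than a plain remux (`-c copy`) and is lossy, so keep the original MKV for archiving. The helper names are hypothetical.

```python
import shutil
import subprocess
import sys
from pathlib import Path
from typing import List


def mkv_to_mp4_cmd(src: Path, crf: int = 18) -> List[str]:
    """Build an ffmpeg command that re-encodes an MKV into an editor-friendly MP4."""
    return [
        "ffmpeg", "-i", str(src),
        "-c:v", "libx264", "-crf", str(crf), "-preset", "slow",  # H.264; lower CRF = higher quality
        "-c:a", "aac", "-b:a", "192k",                            # AAC audio
        "-sn",                      # drop subtitle streams; keep the subtitle files separately
        "-movflags", "+faststart",  # move the MP4 index to the front of the file
        str(src.with_suffix(".mp4")),
    ]


def transcode(src: Path) -> Path:
    """Run the conversion; raises if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(mkv_to_mp4_cmd(src), check=True)
    return src.with_suffix(".mp4")


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print("wrote", transcode(Path(name)))
```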
If this is acceptable to you, I'll close this issue.
Yeah, it's acceptable. But is it possible to archive a whole YT channel as one archive.org item?
I tried this script on a VPS. 10 videos uploaded successfully, but then it showed this error:
:: Upload Finished. Item information:
Title: ...
Upload URL: ...
:: Uploading /root/.tubeup/downloads/.....
2016-12-07 20:13:49,195 - internetarchive.item - ERROR - error uploading .....annotations.xml to youtube-...., Access Denied - You lack sufficient privileges to write to this item.
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/internetarchive/item.py", line 625, in upload_file
    response.raise_for_status()
  File "/usr/local/lib/python3.5/dist-packages/requests/models.py", line 893, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://s3.us.archive.org/.....annotations.xml

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/tubeup", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.5/dist-packages/tubeup/__main__.py", line 272, in main
    identifier, meta = upload_ia(video, custom_meta=md)
  File "/usr/local/lib/python3.5/dist-packages/tubeup/__main__.py", line 219, in upload_ia
    item.upload(vid_files, metadata=meta, retries=30000, request_kwargs=dict(timeout=30000), delete=True)
  File "/usr/local/lib/python3.5/dist-packages/internetarchive/item.py", line 751, in upload
    request_kwargs=request_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/internetarchive/item.py", line 645, in upload_file
    raise type(exc)(error_msg, response=exc.response, request=exc.request)
requests.exceptions.HTTPError: error uploading .....annotations.xml to youtube-...., Access Denied - You lack sufficient privileges to write to this item.
Yes, it's possible to rip an entire channel and upload it to archive.org. It's how I do my archival. I'll look into your error in a bit.
It looks like your item got turned off. Email info@archive.org with the item identifier, ask why, and ask that it be undarked if you want to write to it. Also note the warning in the Readme about uploading entire channels to "Community Video". Try uploading 50 videos first; then, with admin permission, make an item, transfer your already-uploaded progress to a collection (collections must be requested), and continue uploading all videos from that channel into that collection. There are flags to do it.
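The batch-then-collection workflow can also be scripted with the `internetarchive` Python library, which is what tubeup uploads with (its `item.upload` call appears in the traceback above). This sketch makes several assumptions. First, a collection has already been created for you by Archive.org staff; `my-channel-collection` is a placeholder name. Second, your credentials are configured (for example with `ia configure`). The batch size of 50 follows the advice above, and the helper names are hypothetical. A 403 is reported rather than retried, because in the case above it meant the item could no longer be written to.

```python
from typing import Dict, Iterator, List, Sequence


def batches(items: Sequence[str], size: int = 50) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` entries."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def item_metadata(title: str, collection: str) -> Dict[str, str]:
    """Metadata for one video item placed in an existing collection."""
    return {"title": title, "mediatype": "movies", "collection": collection}


def upload_video(identifier: str, files: List[str], title: str,
                 collection: str = "my-channel-collection") -> bool:
    """Upload one item; return False (instead of retrying) on 403 Forbidden."""
    import internetarchive                      # pip install internetarchive
    from requests.exceptions import HTTPError

    try:
        internetarchive.upload(identifier, files=files,
                               metadata=item_metadata(title, collection))
    except HTTPError as exc:
        if exc.response is not None and exc.response.status_code == 403:
            print(f"{identifier}: write access denied, "
                  "ask info@archive.org about this identifier")
            return False
        raise
    return True
```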
In closing, have a look at this: https://archive.org/details/youtube-uieM18rZdHY
I manually converted the subtitles to SRT format and re-uploaded them. Eventually, if/when Archive derives VTT or SRT, that's what it will look like. All the thumbnails and metadata are uploaded, and the video is in top quality.
Can you include this in the default youtube-dl download script?
It downloads the best video version and the best audio and merges them into an .mp4 file.
youtube-dl -ci --write-thumbnail --sub-format ass/srt/best --write-auto-sub -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]/mp4" -o "/%%(uploader)s - %%(title)s (%%(id)s).%%(ext)s" %input%
%input% is the video, channel, or playlist URL.
There's also the naming template
%%(uploader)s - %%(title)s (%%(id)s).%%(ext)s
which needs to be adjusted for Linux: the doubled % is only needed in Windows batch files, so use a single % in a Linux shell.
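The same download can be run through youtube-dl's Python API instead of a shell. No shell or batch file is involved there, so the naming template uses single `%` signs on every platform. This sketch assumes youtube-dl was installed with pip. The option keys are the `YoutubeDL` parameters that correspond to the flags in the command above (`-c`, `-i`, `--write-thumbnail`, `--write-auto-sub`, `--sub-format`, `-f`, `-o`). `ydl_options` and `download` are hypothetical wrapper names.

```python
import sys
from typing import Dict, List

# Output template: single % signs, no Windows batch escaping needed here.
OUTTMPL = "%(uploader)s - %(title)s (%(id)s).%(ext)s"


def ydl_options(outtmpl: str = OUTTMPL) -> Dict[str, object]:
    """YoutubeDL parameters mirroring the command line above."""
    return {
        "continuedl": True,                  # -c: resume partially downloaded files
        "ignoreerrors": True,                # -i: keep going if one video fails
        "writethumbnail": True,              # --write-thumbnail
        "writeautomaticsub": True,           # --write-auto-sub
        "subtitlesformat": "ass/srt/best",   # --sub-format
        "format": "bestvideo[ext=mp4]+bestaudio[ext=m4a]/mp4",  # -f
        "outtmpl": outtmpl,                  # -o
    }


def download(urls: List[str]) -> int:
    """Download video, channel, or playlist URLs; returns youtube-dl's exit code."""
    import youtube_dl  # pip install youtube-dl

    with youtube_dl.YoutubeDL(ydl_options()) as ydl:
        return ydl.download(urls)


if __name__ == "__main__":
    sys.exit(download(sys.argv[1:]))
```

To keep the higher-quality WebM/MKV streams discussed earlier, change `"format"` to `"bestvideo+bestaudio/best"`.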