get-pytube / pytube3

A lightweight, dependency-free Python 3 library (and command-line utility) for downloading YouTube videos.
https://pytube3.readthedocs.io

Bypass YouTube whole download throttling - resolve #41 #43

Closed Twixes closed 4 years ago

Twixes commented 4 years ago

YouTube throttles downloads that lack a Range header. Browsers send this header when streaming (for example, in an implementation of the HTML <video> tag), but whole-file downloads don't. We can easily bypass the throttling by adding Range: bytes=0- to the request. The response contains exactly the same bytes as before, but arrives in much less time.
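A minimal sketch of the idea, using only the standard library's urllib.request. This is an illustration, not the code merged in this PR, and the helper names `build_request` and `download` are hypothetical. The only difference from a plain download is the open-ended `Range: bytes=0-` header, which asks for the whole resource starting at the first byte:

```python
import shutil
import urllib.request


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that asks for the whole file as an open-ended byte range."""
    # "bytes=0-" means "from byte 0 to the end": the server still sends every byte,
    # but the request now resembles the ranged requests browsers make when streaming.
    return urllib.request.Request(url, headers={"Range": "bytes=0-"})


def download(url: str, path: str) -> None:
    """Hypothetical helper: save the resource at `url` to `path` using the ranged request."""
    with urllib.request.urlopen(build_request(url)) as response, open(path, "wb") as out:
        # A server that honours the header replies 206 Partial Content instead of 200 OK;
        # in both cases the body is the complete file, because the range starts at 0.
        shutil.copyfileobj(response, out)
```

Usage would be something like `download(stream_url, "video.mp4")`, where `stream_url` is assumed to be a direct media URL already extracted for the chosen stream.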

coveralls commented 4 years ago

Coverage Status

Coverage decreased (-0.01%) to 91.075% when pulling e046c768938fd6b738a7342b411f1c0e66603cb2 on Twixes:throttling-bypass into 2801505a5fed495f113b0e5793a61ccf1023ba90 on hbmartin:master.

codecov[bot] commented 4 years ago

Codecov Report

Merging #43 into master will not change coverage. The diff coverage is n/a.


@@           Coverage Diff           @@
##           master      #43   +/-   ##
=======================================
  Coverage   90.94%   90.94%           
=======================================
  Files          14       14           
  Lines         917      917           
=======================================
  Hits          834      834           
  Misses         83       83           


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 2801505...e046c76.

hbmartin commented 4 years ago

Thanks @Twixes! I'll test this out later today. Could this approach be used to enable parallelized downloads of a single stream? 🤔

Twixes commented 4 years ago

Possibly, although I don't think parallelization would help: it would add overhead, and the bottleneck is the client's internet connection, not computing power.
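To make the parallelization question concrete, here is a rough sketch of what a split download of one stream could look like: the file is divided into inclusive byte ranges, each range is fetched with its own `Range: bytes=start-end` request on a thread pool, and the pieces are joined in order. It assumes the total size is already known (for example from a Content-Length header) and that the server honours bounded ranges; all names are hypothetical. As noted above, this only helps if a single connection is being throttled, not if the client's bandwidth is the limit.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Fetch = Callable[[str, int, int], bytes]


def byte_ranges(total_size: int, parts: int) -> List[Tuple[int, int]]:
    """Split [0, total_size) into at most `parts` inclusive (start, end) byte ranges."""
    if total_size <= 0 or parts <= 0:
        return []
    chunk = -(-total_size // parts)  # ceiling division
    return [(start, min(start + chunk, total_size) - 1)
            for start in range(0, total_size, chunk)]


def fetch_range(url: str, start: int, end: int) -> bytes:
    """Fetch bytes start..end (inclusive, as in the HTTP Range header)."""
    request = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(request) as response:
        return response.read()


def parallel_download(url: str, total_size: int, parts: int = 4,
                      fetch: Fetch = fetch_range) -> bytes:
    """Download the ranges concurrently and reassemble them in order."""
    ranges = byte_ranges(total_size, parts)
    with ThreadPoolExecutor(max_workers=max(1, len(ranges))) as pool:
        # map() yields results in input order, so the chunks join back correctly.
        chunks = pool.map(lambda r: fetch(url, r[0], r[1]), ranges)
        return b"".join(chunks)
```

The `fetch` parameter exists so the splitting and reassembly logic can be exercised without a network connection.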

hbmartin commented 4 years ago

@Twixes, how did you test this? I compared download speeds on this branch against the release branch for a couple of videos and got basically identical load times 🤔

Twixes commented 4 years ago

[Screenshot: test]

This seems pretty significant. The difference persists across tries and URLs.
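The comparison in this thread appears to have been done with curl. A rough Python equivalent, sketched here with hypothetical helper names, times the same URL once without and once with the `Range: bytes=0-` header. Because the effect is not consistent between streams, it is worth running it against several URLs and repeating each run.

```python
import time
import urllib.request
from typing import Callable, Tuple


def time_download(url: str, use_range: bool,
                  opener: Callable = urllib.request.urlopen,
                  chunk_size: int = 64 * 1024) -> Tuple[float, int]:
    """Return (seconds elapsed, bytes received) for one full download of `url`."""
    headers = {"Range": "bytes=0-"} if use_range else {}
    request = urllib.request.Request(url, headers=headers)
    start = time.perf_counter()
    received = 0
    with opener(request) as response:
        # Read in fixed-size chunks so the whole file never sits in memory at once.
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            received += len(chunk)
    return time.perf_counter() - start, received


def compare(url: str) -> None:
    """Time the same URL without and then with the Range header."""
    for use_range in (False, True):
        elapsed, size = time_download(url, use_range)
        label = "with Range   " if use_range else "without Range"
        print(f"{label}: {size} bytes in {elapsed:.1f} s")
```

The `opener` parameter defaults to `urllib.request.urlopen`; it can be swapped for a stub when testing the timing logic offline.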

Twixes commented 4 years ago

OK, it seems the effect isn't consistent: it enables uninterrupted downloads for some streams, but not all. I have yet to figure out exactly how it works. 😅

hbmartin commented 4 years ago

@Twixes, I'm going to merge this for now, but I think there's more investigation to do here. It might be an issue with chunking within pytube, since your example shows curl calls...
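One way to probe the chunking question is to make the chunks explicit: instead of reading one long response, issue a separate ranged request for each chunk. The sketch below does this with the standard library. It is not a description of how pytube reads streams, the helper names are hypothetical, and the 1 MiB default chunk size is an arbitrary choice. It assumes the total size is known in advance.

```python
import urllib.request
from typing import Callable, Iterator

Fetch = Callable[[str, int, int], bytes]


def fetch_range(url: str, start: int, end: int) -> bytes:
    """Fetch bytes start..end (inclusive) with a single ranged GET request."""
    request = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(request) as response:
        return response.read()


def iter_ranged_chunks(url: str, total_size: int, chunk_size: int = 1024 * 1024,
                       fetch: Fetch = fetch_range) -> Iterator[bytes]:
    """Yield the file as consecutive chunks, one ranged request per chunk."""
    for start in range(0, total_size, chunk_size):
        # The Range header's end offset is inclusive, hence the "- 1".
        end = min(start + chunk_size, total_size) - 1
        yield fetch(url, start, end)
```

A caller would write the chunks to disk as they arrive, e.g. `for chunk in iter_ranged_chunks(url, size): f.write(chunk)`, which keeps each individual request short.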