Closed — indrastorms closed this 7 months ago
I think this could work. What's missing is adjusting the README so that it mentions that a cookie file can be used, and how.
Can't help, even I can't understand my sentence sometimes 😅
Btw, the addon name is "Cookie Editor", but I think you need Firefox Nightly or Kiwi (not tested).
@guillaumematheron One question: does this script download the whole video to write the history?
Yes. https://github.com/ReVanced/restore-missing-youtube-watch-history downloads the entire lowest-quality audio stream. I think you could also just download the m3u8 instead, which would be even faster.
That explains why so much data is used. Can't we stop the download and continue to the next video once a certain number of bytes has been downloaded, if that is enough to write the history?
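I don't think yt-dlp exposes a byte cutoff directly, but a progress hook that raises once enough bytes have arrived might achieve it. A sketch, under the assumption that an exception raised from a hook aborts the current download; `MAX_BYTES` and `EnoughDownloaded` are made-up names, and whether a partial download is enough to register the watch event is an open question:

```python
# Hedged sketch: yt-dlp calls each function in "progress_hooks" with a status
# dict that includes "status" and "downloaded_bytes" while downloading.
MAX_BYTES = 64 * 1024  # hypothetical cutoff: stop after ~64 KiB


class EnoughDownloaded(Exception):
    """Raised from the hook to abort the current video's download."""


def stop_after_threshold(status: dict) -> None:
    # Only act on "downloading" ticks; other statuses lack byte counters.
    if (
        status.get("status") == "downloading"
        and status.get("downloaded_bytes", 0) >= MAX_BYTES
    ):
        raise EnoughDownloaded


# These opts would be passed to yt_dlp.YoutubeDL(...); the assumption is that
# the exception propagates and cuts the download short (not verified here).
opts = {
    "format": "worstaudio",
    "mark_watched": True,
    "progress_hooks": [stop_after_threshold],
}
```

Each per-video failure would then need to be caught and treated as success, which is another reason this may be more trouble than it's worth.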
I'm not quite sure; I used the simulate argument (https://github.com/ReVanced/restore-missing-youtube-watch-history/blob/main/main.py#L170), but the man page doesn't explicitly say how this is achieved: https://man.archlinux.org/man/yt-dlp.1#s,
@Indranil012 Asking yt-dlp to download the m3u8 should be enough. Downloading the full track shouldn't be necessary for the video to appear in the YouTube history.
I think that's the implementation: https://github.com/yt-dlp/yt-dlp/blob/cf91400a1dd6cc99b11a6d163e1af73b64d618c9/yt_dlp/extractor/youtube.py#L3197
@alexandreteles Is this something that can be called?
It could be called if we really wanted to, but it is clearly marked as an internal method. Ignoring the internal-method issue for a moment, we could do something like this with the `YoutubeIE` class:

```python
from urllib.parse import urlparse

import yt_dlp
from yt_dlp.utils import unsmuggle_url
from yt_dlp.extractor.youtube import YoutubeIE

youtube = YoutubeIE()
# The extractor needs a downloader attached before it can make requests
youtube.set_downloader(yt_dlp.YoutubeDL())

initial_url = "https://www.youtube.com/watch?v=EgRqkOLxEOE"
url, smuggled_data = unsmuggle_url(initial_url, {})
video_id = youtube._match_id(url)
base_url = f"{urlparse(url).scheme}://www.youtube.com/"
webpage_url = f"{base_url}watch?v={video_id}"
_, _, player_responses, _ = youtube._download_player_responses(
    url, smuggled_data, video_id, webpage_url
)
youtube._mark_watched(video_id, player_responses)
```
@oSumAtrIX would you or someone else be able to test it?
We can supply `"mark_watched": True` through the opts: https://github.com/yt-dlp/yt-dlp/blob/master/yt_dlp/options.py#L438
I just tried:

```python
import yt_dlp

url: str = "https://www.youtube.com/watch?v=EgRqkOLxEOE"
opts: dict[str, bool | str | tuple[str, ...]] = {
    "simulate": True,
    "mark_watched": True,
    "format": "worstaudio",
    "cookiesfrombrowser": ("chrome",),
}

with yt_dlp.YoutubeDL(opts) as runner:
    runner.download([url])  # download() expects an iterable of URLs
```
And it works. You might need to add to the documentation that if you are extracting cookies from Edge, you may need to run as admin, or you will get a permission error on the cookies file. The `download` method accepts an iterable of URLs, but they are processed synchronously, so we might need to write an async loop ourselves.
@alexandreteles The permission error is caused by a running browser process. The steps can include a step to kill the browser. Off-topic for this PR, though.
Maybe something like this could work:

```python
import asyncio
from collections.abc import Callable
from typing import Any

import yt_dlp


async def worker(
    url: str,
    semaphore: asyncio.Semaphore,
    work_function: Callable[[list[str]], Any],
    loop: asyncio.AbstractEventLoop,
) -> Any:
    async with semaphore:
        # download() expects an iterable of URLs, so wrap the single URL
        return await loop.run_in_executor(None, work_function, [url])


async def main() -> None:
    # Allow at most 3 concurrent executions
    semaphore = asyncio.Semaphore(3)
    # The URLs we got from the JSON file
    urls: tuple[str, ...] = ("url1", "url2", "url3", "url4", "url5", "url6")
    opts: dict[str, bool | str | tuple[str, ...]] = {
        "simulate": True,
        "mark_watched": True,
        "format": "worstaudio",
        "cookiesfrombrowser": ("chrome",),
    }
    ydl = yt_dlp.YoutubeDL(opts)
    loop = asyncio.get_running_loop()
    tasks = [worker(url, semaphore, ydl.download, loop) for url in urls]
    results = await asyncio.gather(*tasks)
    print(results)


if __name__ == "__main__":
    asyncio.run(main())
```
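For what it's worth, the bounded-concurrency part of that pattern can be exercised without yt-dlp by swapping in a stub work function; `fake_download` below is a made-up stand-in for `ydl.download`, everything else is stdlib:

```python
import asyncio


def fake_download(urls: list[str]) -> str:
    # Stand-in for ydl.download(urls); just echoes the first URL
    return f"done:{urls[0]}"


async def worker(url, semaphore, work_function, loop):
    async with semaphore:
        # Run the blocking work function in the default thread pool
        return await loop.run_in_executor(None, work_function, [url])


async def main() -> list[str]:
    semaphore = asyncio.Semaphore(3)  # at most 3 "downloads" in flight
    loop = asyncio.get_running_loop()
    urls = ("url1", "url2", "url3", "url4", "url5", "url6")
    tasks = [worker(u, semaphore, fake_download, loop) for u in urls]
    # gather() returns results in input order, not completion order
    return await asyncio.gather(*tasks)


results = asyncio.run(main())
print(results)
```

The semaphore bounds how many threads block in `run_in_executor` at once, which is what keeps yt-dlp from hammering YouTube with all URLs simultaneously.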
Title explains it