ReVanced / restore-missing-youtube-watch-history

⌛ Script to import missing YouTube watch history
https://revanced.app
GNU General Public License v3.0

add removeshorts arg and supply cookie manually #4

Closed indrastorms closed 7 months ago

indrastorms commented 7 months ago

Title explains it

indrastorms commented 7 months ago

I think this could work. What's missing is adjusting the README so that it mentions that a cookie file can be used, and how.

Can't help, even I can't understand my sentence sometimes 😅

Btw, the add-on is called "Cookie Editor", but I think you need Firefox Nightly or Kiwi (not tested).
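A minimal sketch of how the script could choose between a manually supplied cookie file and browser cookies when building the yt-dlp options. `cookiefile` and `cookiesfrombrowser` are existing `YoutubeDL` parameters; the helper name `build_cookie_opts` and its arguments are hypothetical, and the cookie file is assumed to be in the Netscape cookies.txt format that yt-dlp expects.

```python
from pathlib import Path
from typing import Any, Optional


def build_cookie_opts(
    cookie_file: Optional[str] = None,
    browser: str = "chrome",
) -> dict[str, Any]:
    """Build yt-dlp options, preferring a cookie file over browser cookies.

    Hypothetical helper for illustration; not part of the script.
    """
    opts: dict[str, Any] = {
        "simulate": True,
        "mark_watched": True,
        "format": "worstaudio",
    }
    if cookie_file is not None:
        path = Path(cookie_file)
        # Fail early instead of letting yt-dlp run without valid cookies
        if not path.is_file():
            raise FileNotFoundError(f"cookie file not found: {path}")
        opts["cookiefile"] = str(path)
    else:
        # Fall back to reading cookies straight from the browser profile
        opts["cookiesfrombrowser"] = (browser,)
    return opts
```

The resulting dict could then be passed to `yt_dlp.YoutubeDL(...)`, for example `build_cookie_opts(cookie_file="cookies.txt")`, where the file name is only a placeholder.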

indrastorms commented 7 months ago

@guillaumematheron One question: does this script download the whole video to write the history?

oSumAtrIX commented 7 months ago

Yes. https://github.com/ReVanced/restore-missing-youtube-watch-history downloads the whole lowest-quality audio track. I think you could also just download the m3u8 instead, which would be even faster.

indrastorms commented 7 months ago

That explains why so much data is used. Can't we stop the download and move on to the next video once enough bytes have been downloaded to write the history?

guillaumematheron commented 7 months ago

I'm not quite sure. I used the simulate argument https://github.com/ReVanced/restore-missing-youtube-watch-history/blob/main/main.py#L170 but the man page doesn't explicitly say how this is achieved https://man.archlinux.org/man/yt-dlp.1#s,

oSumAtrIX commented 7 months ago

@Indranil012 Asking YTDL to download the m3u8 should be enough. Downloading the full track should not be necessary for the video to appear in the YouTube history.

guillaumematheron commented 7 months ago

I think that's the implementation https://github.com/yt-dlp/yt-dlp/blob/cf91400a1dd6cc99b11a6d163e1af73b64d618c9/yt_dlp/extractor/youtube.py#L3197

oSumAtrIX commented 7 months ago

@alexandreteles Is this something that can be called?

alexandreteles commented 7 months ago

It could be called if we really wanted to, but it is clearly marked as an internal method. Ignoring that issue for a moment, we could do something like this with the YoutubeIE class:

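from urllib.parse import urlparse

import yt_dlp
from yt_dlp.utils import unsmuggle_url
from yt_dlp.extractor.youtube import YoutubeIE

# An extractor needs a YoutubeDL instance attached to make network requests
youtube = YoutubeIE(yt_dlp.YoutubeDL())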

initial_url = "https://www.youtube.com/watch?v=EgRqkOLxEOE"

url, smuggled_data = unsmuggle_url(initial_url, {})

video_id = youtube._match_id(url)

base_url = f"{urlparse(url).scheme}://www.youtube.com/"

webpage_url = f"{base_url}watch?v={video_id}"

_, _, player_responses, _ = youtube._download_player_responses(
    url, smuggled_data, video_id, webpage_url
)

# _mark_watched is a method of the extractor, so call it on the instance
youtube._mark_watched(video_id, player_responses)

@oSumAtrIX would you or someone else be able to test it?

indrastorms commented 7 months ago

We can supply "mark_watched" = True through opts. https://github.com/yt-dlp/yt-dlp/blob/master/yt_dlp/options.py#L438

alexandreteles commented 7 months ago

I just tried:

from typing import Any

import yt_dlp

url: str = "https://www.youtube.com/watch?v=EgRqkOLxEOE"

# Values are a mix of bools, strings and tuples, hence Any
opts: dict[str, Any] = {
    "simulate": True,
    "mark_watched": True,
    "format": "worstaudio",
    "cookiesfrombrowser": ("chrome",),
}

with yt_dlp.YoutubeDL(opts) as runner:
    # download() expects a list of URLs
    runner.download([url])

And it works. The documentation might need to mention that when extracting cookies from Edge you may have to run as admin, or you will get a permission error on the cookies file. The download method accepts an iterable of strings, but they are processed synchronously, so we might need to write an async loop ourselves.

oSumAtrIX commented 7 months ago

@alexandreteles The permission error is caused by the browser process still running. The instructions could include a step to close the browser. That is off-topic for this PR, though.

alexandreteles commented 7 months ago

Maybe something like this could work:

import asyncio
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor
from typing import Any

import yt_dlp


async def worker(
    url: str,
    semaphore: asyncio.Semaphore,
    work_function: Callable[[list[str]], Any],
    executor: ThreadPoolExecutor,
) -> Any:
    async with semaphore:
        loop = asyncio.get_running_loop()
        # Run the blocking call in the thread pool so the event loop stays free
        return await loop.run_in_executor(executor, work_function, [url])


async def main() -> None:
    # Allow max 3 concurrent executions
    semaphore = asyncio.Semaphore(3)

    # the set of URLs we got from the JSON file
    urls: set[str] = {"url1", "url2", "url3", "url4", "url5", "url6"}

    opts: dict[str, Any] = {
        "simulate": True,
        "mark_watched": True,
        "format": "worstaudio",
        "cookiesfrombrowser": ("chrome",),
    }

    ydl = yt_dlp.YoutubeDL(opts)

    # Pass the executor to the workers; the original created one but never used it
    with ThreadPoolExecutor() as executor:
        tasks = [worker(url, semaphore, ydl.download, executor) for url in urls]
        results = await asyncio.gather(*tasks)
        print(results)


if __name__ == "__main__":
    asyncio.run(main())
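
One thing the snippet above leaves open is what happens when a single URL fails: by default `asyncio.gather` raises the first exception and the other results are not collected. The following is a rough sketch, not tested against YouTube, of collecting a result or an exception per URL with `return_exceptions=True`. The names `mark_all` and `fake_download` are hypothetical, and `fake_download` is only a stand-in for `ydl.download` so the pattern can be tried without network access or cookies.

```python
import asyncio
from collections.abc import Callable, Iterable
from concurrent.futures import ThreadPoolExecutor
from typing import Any


async def mark_all(
    urls: Iterable[str],
    work_function: Callable[[str], Any],
    max_concurrent: int = 3,
) -> dict[str, Any]:
    """Run work_function for every URL; map each URL to its result or exception."""
    url_list = list(urls)
    semaphore = asyncio.Semaphore(max_concurrent)
    loop = asyncio.get_running_loop()

    async def worker(url: str, executor: ThreadPoolExecutor) -> Any:
        # The semaphore mirrors the snippet above; the pool size applies the same cap
        async with semaphore:
            return await loop.run_in_executor(executor, work_function, url)

    with ThreadPoolExecutor(max_workers=max_concurrent) as executor:
        results = await asyncio.gather(
            *(worker(url, executor) for url in url_list),
            return_exceptions=True,  # keep going when one URL fails
        )
    # gather preserves input order, so results line up with url_list
    return dict(zip(url_list, results))


def fake_download(url: str) -> int:
    # Hypothetical stand-in for ydl.download; fails for one URL on purpose
    if url == "url2":
        raise RuntimeError("simulated failure")
    return 0
```

With the stand-in, `asyncio.run(mark_all(["url1", "url2", "url3"], fake_download))` maps `url1` and `url3` to `0` and `url2` to the raised `RuntimeError`, so failed URLs could be logged or retried afterwards.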