dermasmid / scrapetube

A YouTube scraper for scraping channels, playlists, and searching 🔎
https://scrapetube.readthedocs.io/en/latest/
MIT License

Add more info from youtube. #62

Open SurajBhari opened 6 months ago

SurajBhari commented 6 months ago

Currently, get_video only returns the "videoPrimaryInfoRenderer" part of "ytInitialData". For some other applications, though, having "ytInitialPlayerResponse" as well would be useful, e.g. for storyboard thumbnails.

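```python
# Proposed change to get_video, written for scrapetube's own module, where
# json and the helpers get_session, get_initial_data, get_json_from_html
# and search_dict are already in scope.
def get_video(
    id: str,
) -> dict:
    """Get a single video.

    Parameters:
        id (``str``):
            The video id from the video you want to get.

    Returns the ``videoPrimaryInfoRenderer`` dict, with the page's
    ``ytInitialPlayerResponse`` added under a key of the same name.
    """

    session = get_session()
    url = f"https://www.youtube.com/watch?v={id}"
    html = get_initial_data(session, url)
    client = json.loads(
        get_json_from_html(html, "INNERTUBE_CONTEXT", 2, '"}},') + '"}}'
    )["client"]
    session.headers["X-YouTube-Client-Name"] = "1"
    session.headers["X-YouTube-Client-Version"] = client["clientVersion"]
    data = json.loads(
        get_json_from_html(html, "var ytInitialData = ", 0, "};") + "}"
    )
    # New: also parse the player response embedded in the same watch page
    # (videoDetails, storyboards, etc.), so no extra request is needed.
    player_response = json.loads(
        get_json_from_html(html, "var ytInitialPlayerResponse = ", 0, "};") + "}"
    )
    returning = next(search_dict(data, "videoPrimaryInfoRenderer"))
    returning["ytInitialPlayerResponse"] = player_response
    return returning
```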

Something like this would be appreciated.
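The snippet finds each embedded object by slicing up to the first "};" and re-appending "}". One possible hardening, sketched here under assumptions: a hypothetical helper, extract_js_object (not part of scrapetube), that starts at the same "var NAME = " marker but lets Python's json.JSONDecoder.raw_decode decide where the object ends. That way a "};" inside a string value cannot cut the object short. It assumes the page assigns the object literally as var NAME = {...}, which is the same assumption the snippet makes.

```python
import json


def extract_js_object(html: str, var_name: str) -> dict:
    """Parse the JSON value assigned by ``var <var_name> = ...`` in a page.

    Hypothetical helper, not part of scrapetube. The JSON decoder finds the
    end of the object itself, so the result does not depend on where the
    first "};" happens to appear in the page.
    """
    marker = f"var {var_name} = "
    start = html.find(marker)
    if start == -1:
        raise ValueError(f"{var_name!r} not found in page")
    # raw_decode parses one JSON value starting exactly at the given index
    # (it does not skip whitespace) and ignores whatever follows it.
    value, _end = json.JSONDecoder().raw_decode(html, start + len(marker))
    return value


# A "};" inside a string: slicing up to the first "};" would truncate here.
page = '<script>var ytInitialPlayerResponse = {"title": "a};b", "n": 1};var x = 2;</script>'
print(extract_js_object(page, "ytInitialPlayerResponse"))  # {'title': 'a};b', 'n': 1}
```

Inside get_video, the player-response parsing step could then call extract_js_object(html, "ytInitialPlayerResponse") instead of json.loads over the get_json_from_html slice. The sketch does not establish whether such strings actually occur in real watch pages. It only removes the dependency on the stop marker.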

SurajBhari commented 6 months ago

Forgot to mention: get_videos should do something similar too.
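For get_videos, the listing responses it pages through most likely do not carry each video's player response. Doing the same there would therefore mean one extra watch-page request per listed video. Below is a minimal sketch of that shape, with hypothetical names throughout. The with_player_responses wrapper is not scrapetube API. The videos argument stands for whatever a listing generator yields, each item assumed to carry a "videoId" key. The fetch_player_response argument stands for a per-video fetch such as the get_video change proposed above. The wrapper stays lazy, so a caller that stops early does not pay for requests it never uses.

```python
from typing import Callable, Iterable, Iterator


def with_player_responses(
    videos: Iterable[dict],
    fetch_player_response: Callable[[str], dict],
) -> Iterator[dict]:
    """Yield each video dict with its player response attached (hypothetical)."""
    for video in videos:
        enriched = dict(video)  # shallow copy so the caller's dicts are not mutated
        # One extra request per video, made only when this item is consumed.
        enriched["ytInitialPlayerResponse"] = fetch_player_response(video["videoId"])
        yield enriched


# Demo with stand-ins (no network): a fake listing and a fake fetcher.
def fake_fetch(video_id: str) -> dict:
    return {"videoDetails": {"videoId": video_id}}


listing = [{"videoId": "abc"}, {"videoId": "def"}]
for item in with_player_responses(listing, fake_fetch):
    print(item["videoId"], item["ytInitialPlayerResponse"]["videoDetails"])
```

Making the extra fetch opt-in (for example, a flag on the public functions) would keep the default listing calls as cheap as they are now. That is a design suggestion, not existing behaviour.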