TeamNewPipe / NewPipeExtractor

NewPipe's core library for extracting data from streaming sites
GNU General Public License v3.0

Add support for YouTube hashtag pages #590

Open AudricV opened 3 years ago

AudricV commented 3 years ago

If a video includes hashtags, clicking one of them opens a page that lists all videos associated with that hashtag. For example: https://www.youtube.com/hashtag/martingarrix (I picked a music artist because this hashtag has more than 100 videos).

Webpage screenshot: #martingarrix YouTube

The extractor doesn't handle these pages yet. It would be great if it could extract them, so that apps can show them when a user clicks a hashtag in a YouTube video's title or description.

Findings:

POST requests are made to https://www.youtube.com/youtubei/v1/browse?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8 for the initial response and, if the hashtag has more than 100 videos, for the continuation(s).

Body of the initial JSON request when browsing a hashtag page:

```json
{
  "context": {
    "client": {
      "hl": "en-GB",
      "gl": "FR",
      "remoteHost": "2a01:cb1c:349:1800:81dc:f55b:f64f:2f08",
      "deviceMake": "",
      "deviceModel": "",
      "visitorData": "CgtoUUtZU0tSNTRPNCjtioKDBg%3D%3D",
      "userAgent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36,gzip(gfe)",
      "clientName": "WEB",
      "clientVersion": "2.20210324.02.00",
      "osName": "Windows",
      "osVersion": "6.3",
      "originalUrl": "https://www.youtube.com/hashtag/martingarrix",
      "platform": "DESKTOP",
      "clientFormFactor": "UNKNOWN_FORM_FACTOR",
      "timeZone": "Europe/Paris",
      "browserName": "Chrome",
      "browserVersion": "89.0.4389.90",
      "screenWidthPoints": 1366,
      "screenHeightPoints": 625,
      "screenPixelDensity": 1,
      "screenDensityFloat": 1,
      "utcOffsetMinutes": 120,
      "userInterfaceTheme": "USER_INTERFACE_THEME_LIGHT",
      "mainAppWebInfo": {
        "graftUrl": "/hashtag/martingarrix",
        "webDisplayMode": "WEB_DISPLAY_MODE_BROWSER"
      }
    },
    "user": { "lockedSafetyMode": false },
    "request": {
      "useSsl": true,
      "internalExperimentFlags": [],
      "consistencyTokenJars": []
    },
    "adSignalsInfo": {
      "params": [
        { "key": "dt", "value": "1616938354083" },
        { "key": "flash", "value": "0" },
        { "key": "frm", "value": "0" },
        { "key": "u_tz", "value": "120" },
        { "key": "u_his", "value": "7" },
        { "key": "u_java", "value": "false" },
        { "key": "u_h", "value": "768" },
        { "key": "u_w", "value": "1366" },
        { "key": "u_ah", "value": "728" },
        { "key": "u_aw", "value": "1366" },
        { "key": "u_cd", "value": "24" },
        { "key": "u_nplug", "value": "3" },
        { "key": "u_nmime", "value": "4" },
        { "key": "bc", "value": "31" },
        { "key": "bih", "value": "625" },
        { "key": "biw", "value": "1349" },
        { "key": "brdim", "value": "0,0,0,0,1366,0,1366,728,1366,625" },
        { "key": "vis", "value": "1" },
        { "key": "wgl", "value": "true" },
        { "key": "ca_type", "value": "image" }
      ]
    }
  },
  "browseId": "FEhashtag",
  "params": "6gUOCgxtYXJ0aW5nYXJyaXg%3D"
}
```
Response to the initial JSON request when browsing a hashtag page: [JSON Response Browse request.zip](https://github.com/TeamNewPipe/NewPipeExtractor/files/6217494/JSON.Response.Browse.request.zip)
Body of the JSON continuation request when browsing a hashtag page: ```json { "context":{ "client":{ "hl":"en-GB", "gl":"FR", "remoteHost":"2a01:cb1c:349:1800:81dc:f55b:f64f:2f08", "deviceMake":"", "deviceModel":"", "visitorData":"CgtoUUtZU0tSNTRPNCjtioKDBg%3D%3D", "userAgent":"Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36,gzip(gfe)", "clientName":"WEB", "clientVersion":"2.20210324.02.00", "osName":"Windows", "osVersion":"6.3", "originalUrl":"https://www.youtube.com/hashtag/martingarrix", "platform":"DESKTOP", "clientFormFactor":"UNKNOWN_FORM_FACTOR", "timeZone":"Europe/Paris", "browserName":"Chrome", "browserVersion":"89.0.4389.90", "screenWidthPoints":1366, "screenHeightPoints":625, "screenPixelDensity":1, "screenDensityFloat":1, "utcOffsetMinutes":120, "userInterfaceTheme":"USER_INTERFACE_THEME_LIGHT", "connectionType":"CONN_CELLULAR_4G", "mainAppWebInfo":{ "graftUrl":"https://www.youtube.com/hashtag/martingarrix", "webDisplayMode":"WEB_DISPLAY_MODE_BROWSER" } }, "user":{ "lockedSafetyMode":false }, "request":{ "useSsl":true, "internalExperimentFlags":[ ], "consistencyTokenJars":[ ] }, "adSignalsInfo":{ "params":[ { "key":"dt", "value":"1616938354083" }, { "key":"flash", "value":"0" }, { "key":"frm", "value":"0" }, { "key":"u_tz", "value":"120" }, { "key":"u_his", "value":"8" }, { "key":"u_java", "value":"false" }, { "key":"u_h", "value":"768" }, { "key":"u_w", "value":"1366" }, { "key":"u_ah", "value":"728" }, { "key":"u_aw", "value":"1366" }, { "key":"u_cd", "value":"24" }, { "key":"u_nplug", "value":"3" }, { "key":"u_nmime", "value":"4" }, { "key":"bc", "value":"31" }, { "key":"bih", "value":"625" }, { "key":"biw", "value":"1349" }, { "key":"brdim", "value":"0,0,0,0,1366,0,1366,728,1366,625" }, { "key":"vis", "value":"1" }, { "key":"wgl", "value":"true" }, { "key":"ca_type", "value":"image" } ] }, "clientScreenNonce":"MC4yMzU0NzI4NjIwNzM3MTg5OA..", "clickTracking":{ 
"clickTrackingParams":"CBsQ8eIEIhMI79mJyZXT7wIV0IhVCh3O0wDa" } }, "continuation":"4qmFsgLSDBIJRkVoYXNodGFnGgZDRHclM0Q6vAxvdHJNMlFtbUNRcVZDUkw4Q0JJTkkyMWhjblJwYm1kaGNuSnBlQnJtQ0ZORWVVTkJVWFJXVGtkU2NHSkZWWGhhVjJjMVl6UkpRa015WkVSWFYwNUpaV3BLY2s1WVozZG5aMFZNVlVjMWQxaDZaRXBaV0VaWVRucFRRMEZSZEd0bGEyaHJZbnBTTldWSGJHdFpORWxDUXpGd01rMVdSbGRPYlhoNVdURTVXbWRuUlV4VVNFSnhXVEl3ZUZKcWFEQlhWR2xEUVZGek5XUnJNVzlQVjFrd1RWaENlRkpaU1VKRGVtc3hZV3R3TUZORldqVlZXR1JPWjJkRlRFMVhjRkpYV0VveVVtMVpNVlJyYlVOQlVYUXdZek5DVDJGNlRsUmtNVzgxWXpSSlFrTXdkSFZVUkVwVFUyeHdWVnBGUlRCblowVk1WakIwTVZsWVZuRlRWV2hEVmtSVFEwRlJkRU5TUnpscVkwTXhWMk5GVGpOWFdVbENRekJ3TkdWcmRFOVRSMXBQVlcxU1NtZG5SVXhoYkVKWVlraG9ibGR1YnpSTlJrZERRVkYwV0ZaSVNrZFpWR3hKV0RKNGQyRTBTVUpETUVac1pEQTFhMDFxYkROVmJGWk9aMmRGVEZac09YWk5SR1JHWWxaYU5VMHdSME5CVVhSVVlqTkpNazVFVm05aE1rNVJXVFJKUWtONlFUSmFWazU2VkRGa2FsTXhiRUpuWjBWTVlXMUpNVk14VWxoVFJFSjFVV3BEUTBGUmRFVmxWM2cyVWpGb1JsZ3liR2xXV1VsQ1F6QktWazU2YkVabFZURnhVMWhLVm1kblJVeGlSV1JFWW5wb1NsUklXbWhrVlcxRFFWRjBhazB6VmpSVVJXaDFUWGt4UzFWWlNVSkRNSGhHWVVSc1IwNXFaR0ZPVnpRMFoyZEZUR013Wkd0VlJYUm9ZVEZTVlUxVVEwTkJVWFJGWkZWYVZtUkZkelJsYkZaQ1lUUkpRa042VmxKTlJrcE5WMjA1Tm1GVE1XcG5aMFZNWVZab1NsSklVbTFOV0dSUlRVZGxRMEZSZEZGU2EzY3laRVJqZDFOdE5ESlhXVWxDUXpOT2JWVjZVa1pYVkUxM1YxaFNUbWRuUlV4amVtTjVVbFU1ZEZVeFduUmFWVEpEUVZGMGMxSnFVbXRWTWpVelZrZHNTMDlKU1VKRE1qa3lVa2RPVFZOSFJYbFVSRVoyWjJkRlRFOVdhRXhXYlZwellrVjBSbUpVYVVOQlVYUktWRlZyZDFkV1dqRlVNVkl3WXpSSlFrTXpSalJTYkZKTVVraG9ObU15ZEZKblowVk1ZMVp2ZUdOclNtRmFXSEJLVWtadFEwRlJkRWhXUjBaUVYwVlNkRkpFVG5oaE5FbENRek53YkZwSGNHbFZNMmgzWTIxS2JtZG5SVXhrYXpCNVpVWmthazFGV1hoa01ESkRRVkZ6TW1WRmFITlBTSEJxVkZVMWVrNUpTVUpETVZaVVRsUmFiR1ZyVGtOak1HUlNaMmRGVEUxRmNIUlRSMUo1V20wNU5sWkdSME5CVVhScVVWWmFkbGt5TVhCalZYTXhXVFJKUWtNd09YQk5iV1JxVVRCR01rOURNVE5uWjBWTVpERmtabGd3T1V4Vk0yaDRWVEJIUTBGUmRIcFhWVEV4Vmtkb01FNVVaRTlpTkVsQ1F6QmFhRmRxYkU1a1JYQm1UMGRPUm1kblJVeGpia0Y2VFZZNWNVOVhkSFZVVlcxRFFWRjBXVlJIWkVoTmExa3dUbXhzZW1FMFNVSkRNR1JLVjFWc2IxTlVUa0phTWtwT1oyZEZURm95U2xSVWEyOTNURmRhUkdGSGRVTkJVWE41VkZZNVEyVnNZM2hoYk
VaMlRVbEpRa013VmtWTk1uQlFVVEZHVmxScmNFNW5aMFZNWTJzMGVXSldRWHBWUkdnd1VUQnRRMEZSZEdoV1JYUnFZak5WTlZSc1dURk5TVWxDUXpOR1RHTklaRmRXVjFKUlVtdHNibWRuUlV4alZYaDBVMnRLUzJWc1VtbFZSMk1sTTBRZ0FGb0FJaFJpY205M2MyVXRabVZsWkVaRmFHRnphSFJoWnhJTWJXRnlkR2x1WjJGeWNtbDQ%3D" } ```
Response to the JSON continuation request when browsing a hashtag page: [JSON Response Browse continuation request.zip](https://github.com/TeamNewPipe/NewPipeExtractor/files/6217500/JSON.Response.Browse.continuation.request.zip)
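
The two request bodies above can be reduced to a small sketch. This is only an illustration in Python, not the extractor's code: it assumes that the `browse` endpoint accepts a trimmed-down `context.client` object (only `hl`, `gl`, `clientName` and `clientVersion`), which the captures in this issue do not confirm. The helper names (`client_context`, `initial_body`, `continuation_body`, `browse_request`) are hypothetical. The code only builds the request object and does not send it.

```python
import json
import urllib.request

# Endpoint and key as quoted in this issue.
BROWSE_URL = ("https://www.youtube.com/youtubei/v1/browse"
              "?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8")


def client_context():
    # Assumption: a minimal client context is enough; the captured
    # requests carry many more fields (user agent, screen size, ...).
    return {
        "client": {
            "hl": "en-GB",
            "gl": "FR",
            "clientName": "WEB",
            "clientVersion": "2.20210324.02.00",
        }
    }


def initial_body(params):
    # First page: browseId "FEhashtag" plus the hashtag-specific params.
    return {"context": client_context(),
            "browseId": "FEhashtag",
            "params": params}


def continuation_body(token):
    # Following pages: only the continuation token taken from the
    # previous response is sent, without browseId or params.
    return {"context": client_context(), "continuation": token}


def browse_request(body):
    # Build (but do not send) the POST request with a JSON payload.
    return urllib.request.Request(
        BROWSE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = browse_request(initial_body("6gUOCgxtYXJ0aW5nYXJyaXg%3D"))
```

Passing `request` to `urllib.request.urlopen` would perform the call; whether the server accepts the reduced context would have to be checked against a live response.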

Stypox commented 3 years ago

@TiA4f8R this should be implemented as a search filter, I think

AudricV commented 3 years ago

Yes, I was thinking that too.

FireMasterK commented 3 years ago

You need to specify the parameter as a Protobuf field. I think adding support for this is technically infeasible without the Protobuf library.
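
For reference, the `params` value from the browse request can be decoded by hand: after URL-decoding and Base64-decoding, the bytes are `EA 05 0E 0A 0C` followed by the ASCII text `martingarrix`, which reads as a length-delimited field 93 wrapping a length-delimited field 1 that holds the hashtag. Below is a minimal Python sketch that produces the same bytes without a Protobuf library. It assumes that this layout, observed only on the single sample in this issue, holds for other hashtags, and that URL-safe Base64 is what the server expects. The function names are hypothetical.

```python
import base64
from urllib.parse import quote


def encode_varint(value: int) -> bytes:
    # Protobuf varint: 7 bits per byte, least significant group first,
    # high bit set on every byte except the last one.
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def length_delimited(field_number: int, payload: bytes) -> bytes:
    # Wire type 2 (length-delimited): tag, payload length, payload.
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def hashtag_params(hashtag: str) -> str:
    # Assumed layout: field 93 { field 1: <hashtag text> }.
    inner = length_delimited(1, hashtag.encode("utf-8"))
    outer = length_delimited(93, inner)
    encoded = base64.urlsafe_b64encode(outer).decode("ascii")
    # The captured value has its padding percent-encoded ("%3D").
    return quote(encoded, safe="")


params = hashtag_params("martingarrix")
```

For `martingarrix` this yields `6gUOCgxtYXJ0aW5nYXJyaXg%3D`, the value seen in the captured request.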

AudricV commented 2 years ago

@FireMasterK Not really: we can use the resolve_url endpoint of the InnerTube API, which gives us everything we need!

Request URL: https://www.youtube.com/youtubei/v1/navigation/resolve_url?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8

Request body:

```json
{
  "context": {
    "client": {
      "hl": "en-GB",
      "gl": "US",
      "clientName": "WEB",
      "clientVersion": "2.20220124.01.00"
    }
  },
  "url": "https://www.youtube.com/hashtag/martingarrix"
}
```

Which returns (responseContext object removed):

```json
{
  "endpoint": {
    "clickTrackingParams": "value_removed",
    "commandMetadata": {
      "webCommandMetadata": {
        "url": "/hashtag/martingarrix",
        "webPageType": "WEB_PAGE_TYPE_BROWSE",
        "rootVe": 6827,
        "apiUrl": "/youtubei/v1/browse"
      }
    },
    "browseEndpoint": {
      "browseId": "FEhashtag",
      "params": "6gUOCgxtYXJ0aW5nYXJyaXg%3D"
    }
  }
}
```
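
A rough Python sketch of how this response could be consumed: build the `resolve_url` body for a hashtag page, then read `browseId` and `params` out of `endpoint.browseEndpoint` so they can be reused in the `browse` request. The function names are hypothetical, and the sample response is the one quoted above; no network call is made here.

```python
def resolve_url_body(page_url):
    # Request body as shown above: a minimal WEB client context plus
    # the URL to resolve.
    return {
        "context": {
            "client": {
                "hl": "en-GB",
                "gl": "US",
                "clientName": "WEB",
                "clientVersion": "2.20220124.01.00",
            }
        },
        "url": page_url,
    }


def browse_endpoint(response):
    # Return (browseId, params) from a resolve_url response, or raise
    # if the URL did not resolve to a browse endpoint.
    endpoint = response.get("endpoint", {}).get("browseEndpoint")
    if endpoint is None:
        raise ValueError("response has no browseEndpoint")
    return endpoint["browseId"], endpoint.get("params")


# Response quoted in this issue (responseContext removed).
SAMPLE_RESPONSE = {
    "endpoint": {
        "clickTrackingParams": "value_removed",
        "commandMetadata": {
            "webCommandMetadata": {
                "url": "/hashtag/martingarrix",
                "webPageType": "WEB_PAGE_TYPE_BROWSE",
                "rootVe": 6827,
                "apiUrl": "/youtubei/v1/browse",
            }
        },
        "browseEndpoint": {
            "browseId": "FEhashtag",
            "params": "6gUOCgxtYXJ0aW5nYXJyaXg%3D",
        },
    }
}

body = resolve_url_body("https://www.youtube.com/hashtag/martingarrix")
browse_id, params = browse_endpoint(SAMPLE_RESPONSE)
```

With this approach the `params` value never has to be built locally, at the cost of one extra request per hashtag.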