TimeForANinja / node-ytsr

Do anonymous YouTube search requests.
MIT License

Search results not equal to YouTube #120

Closed HMLarsen closed 3 years ago

HMLarsen commented 3 years ago

Hi! Thanks for the great lib!!

I have a question about the results returned by the library versus the results returned by YouTube itself: for the same URL, they are different.

Code

let filters = await ytsr.getFilters('its me pekora');
const filter = filters.get('Features').get('Live');
const options = {
    limit: maxResults,
    requestOptions: {
        videoEmbeddable: true // idk if this is working
    }
}
const searchResults = await ytsr(filter.url, options);

Result

{
  originalQuery: 'its me pekora', 
  correctedQuery: 'its me pekora',
  results: 1,
  activeFilters: [
    {
      name: 'Show',
      active: true,
      url: null,
      description: 'Search for Show'
    },
    {
      name: 'Relevance',
      active: true,
      url: null,
      description: 'Sort by relevance'
    }
  ],
  refinements: [],
  items: [
    {
      type: 'video',
      title: 'Tropical House Radio 🌴 24/7 Live Music',
      id: 'SIt21jdTYKk',
      url: 'https://www.youtube.com/watch?v=SIt21jdTYKk',
      bestThumbnail: [Object],
      thumbnails: [Array],
      isUpcoming: false,
      upcoming: null,
      isLive: true,
      badges: [Array],
      author: [Object],
      description: 'Tropical House Radio 24/7 Live Music with Kygo & Summer Music! Ballistic Live is live streaming the best Tropical House Radio ...',
      views: 258,
      duration: null,
      uploadedAt: null
    }
  ],
  continuation: null
}

Search URL from the filter.url code

https://www.youtube.com/results?search_query=its+me+pekora&sp=EgJAAQ%253D%253D

On YouTube itself, this URL returns no videos. I think there is a problem with how the library extracts information from the page. The returned live stream does not even match the full search term. I think it was returned because of "...Sum[me]r" matching "its [me] pekora".

TimeForANinja commented 3 years ago

Hey

First of all, requestOptions: { videoEmbeddable: true } is not a feature I know of. You can always look at https://www.npmjs.com/package/miniget and https://nodejs.org/api/http.html#http_http_request_options_callback for what is supported. These are supposed to be basic HTTP-request options (e.g. headers, the sending IP and so forth).

For your main problem, I guess it's a problem with the hl / gl settings. To cite the README:

gl[String] -> 2-Digit Code of a Country, defaults to US - Allows for localisation of the request

hl[String] -> 2-Digit Code for a Language, defaults to en - Allows for localisation of the request

By default you are requesting results for the US region, no matter where you are. Your browser, on the other hand, automatically sets your region based on your IP. For me in Germany, a regular search matches the response you sent above. For everyone in the US it should as well. For some more southern countries it can differ (and I've seen this before).

The library itself does no searching. It is YouTube that decides that

"...Sum[me]r" from "its [me] pekora"

is enough of a match (even though I believe they do some kind of fingerprinting / keywording).

HMLarsen commented 3 years ago

Thanks for the reply! I changed my code to the code below (I'm Brazilian, so "pt-BR" is my locale) and searched for its me pekora again:

const filters = await ytsr.getFilters(term);
const filter = filters.get('Features').get('Live');
const options = {
    limit: maxResults,
    gl: 'BR',
    hl: 'pt'
}
const searchResults = await ytsr(filter.url, options);

This time the result is 0 videos, but now an error is printed in my console:

Error: unknown message in backgroundPromoRenderer
    at parseItem (d:\dev\projects\nodejs\YouBorderless-Server\node_modules\ytsr\lib\parseItem.js:47:13)
    at catchAndLogFunc (d:\dev\projects\nodejs\YouBorderless-Server\node_modules\ytsr\lib\parseItem.js:78:12)
    at module.exports (d:\dev\projects\nodejs\YouBorderless-Server\node_modules\ytsr\lib\parseItem.js:101:46)
    at d:\dev\projects\nodejs\YouBorderless-Server\node_modules\ytsr\lib\main.js:54:34
    at Array.map (<anonymous>)
    at module.exports (d:\dev\projects\nodejs\YouBorderless-Server\node_modules\ytsr\lib\main.js:54:25)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at async searchVideos (d:\dev\projects\nodejs\YouBorderless-Server\live\youtube.service.js:72:24)
    at async d:\dev\projects\nodejs\YouBorderless-Server\routes.js:54:19
node_modules/ytsr/lib/parseItem.js:88

/********************************************************************************************************************************************************************************************************
node_modules/ytsr/lib/parseItem.js:89
failed at func parseItem: unknown message in backgroundPromoRenderer
node_modules/ytsr/lib/parseItem.js:90
pls post the the files in d:\dev\projects\nodejs\YouBorderless-Server\node_modules\ytsr\dumps to https://github.com/TimeForANinja/node-ytsr/issues
node_modules/ytsr/lib/parseItem.js:91
os: win32-x64, node.js: v14.16.0, ytpl: 3.4.0
node_modules/ytsr/lib/parseItem.js:95
********************************************************************************************************************************************************************************************************\

Searches with other terms don't print this error...

Another error setting the locale

I tested other searches where the error above is not printed, and the results are different with my locale... I'm filtering the results this way to make sure the videos are live:

searchResults.items
    .filter(item => !item.isUpcoming && item.isLive)
    .forEach(item => {
        ....

But I noticed that the isLive attribute differs depending on the locale I pass in the options:

pt-BR: the isLive attribute is false

en-US, or null for gl and hl: the isLive attribute is true

The isLive attribute has to be true because the video is live:

[screenshot: chrome_9frWOgbZBW]

I think my filter isn't really necessary, since I'm already filtering the search URL with const filter = filters.get('Features').get('Live');... but there is still an error in the response from the API.

TimeForANinja commented 3 years ago

The first error is due to me not fully testing the location settings 😅 The code if (UTIL.parseText(item[type].title) === 'No results found') return null; basically checks for the literal No results found string, which of course differs in other languages.

The same goes for isLive, which searches for a LIVE NOW badge. Can you check whether an AO VIVO AGORA badge is present?

HMLarsen commented 3 years ago

Yes, when I access the search URL in Postman, AO VIVO AGORA is the value of the label property.

TimeForANinja commented 3 years ago

I mean, does ytsr respond with the badge?

HMLarsen commented 3 years ago

Yes, that too... the ytsr result was:

{
    "originalQuery": "korone",
    "correctedQuery": "korone",
    "results": 2,
    "activeFilters": [
        {
            "name": "Programa",
            "active": true,
            "url": null,
            "description": "Pesquisar por Programa"
        },
        {
            "name": "Relevância",
            "active": true,
            "url": null,
            "description": "Ordenar por relevância"
        }
    ],
    "refinements": [],
    "items": [
        {
            "type": "video",
            "title": "😈ПЕРЕХОДИМ НА БОЛЬШИЕ СТРИМЫ часть 2😅 | Hikari Irai [RU Vtuber]",
            "id": "2UV59-ZylQ8",
            "url": "https://www.youtube.com/watch?v=2UV59-ZylQ8",
            "bestThumbnail": {
                "url": "https://i.ytimg.com/vi/2UV59-ZylQ8/hq720_live.jpg?sqp=-oaymwEXCNAFEJQDSFryq4qpAwkIARUAAIhCGAE=&rs=AOn4CLCcBKcQn0P1aPFmUS9Mwx6hk8BkAg",
                "width": 720,
                "height": 404
            },
            "thumbnails": [
                {
                    "url": "https://i.ytimg.com/vi/2UV59-ZylQ8/hq720_live.jpg?sqp=-oaymwEXCNAFEJQDSFryq4qpAwkIARUAAIhCGAE=&rs=AOn4CLCcBKcQn0P1aPFmUS9Mwx6hk8BkAg",
                    "width": 720,
                    "height": 404
                },
                {
                    "url": "https://i.ytimg.com/vi/2UV59-ZylQ8/hq720_live.jpg?sqp=-oaymwEjCOgCEMoBSFryq4qpAxUIARUAAAAAGAElAADIQj0AgKJDeAE=&rs=AOn4CLBIijKuSJi_XOQoOBOmHtxmVZFbjw",
                    "width": 360,
                    "height": 202
                }
            ],
            "isUpcoming": false,
            "upcoming": null,
            "isLive": false,
            "badges": [
                "AO VIVO AGORA",
                "Novo"
            ],
            "author": {
                "name": "Hikari Irai",
                "channelID": "UCm5gKEeOwZWBrtRie0fHcrw",
                "url": "https://www.youtube.com/channel/UCm5gKEeOwZWBrtRie0fHcrw",
                "bestAvatar": {
                    "url": "https://yt3.ggpht.com/ytc/AAUvwnhawwYNDQM-OBo_Wa5Xg3DB6RaACj74EYz-VyvA=s68-c-k-c0x00ffffff-no-rj",
                    "width": 68,
                    "height": 68
                },
                "avatars": [
                    {
                        "url": "https://yt3.ggpht.com/ytc/AAUvwnhawwYNDQM-OBo_Wa5Xg3DB6RaACj74EYz-VyvA=s68-c-k-c0x00ffffff-no-rj",
                        "width": 68,
                        "height": 68
                    }
                ],
                "ownerBadges": [],
                "verified": false
            },
            "description": "Донат - https://new.donatepay.ru/@HikariCh (озвученный донат от 25 рублей) ...",
            "views": 4,
            "duration": null,
            "uploadedAt": null
        },
        {
            "type": "video",
            "title": "Bociany + Panorama Miasta Przygodzice - Proart",
            "id": "5ooSGYC-3XU",
            "url": "https://www.youtube.com/watch?v=5ooSGYC-3XU",
            "bestThumbnail": {
                "url": "https://i.ytimg.com/vi/5ooSGYC-3XU/hq720_live.jpg?sqp=-oaymwEXCNAFEJQDSFryq4qpAwkIARUAAIhCGAE=&rs=AOn4CLAU0xhBs_B2WN2_2f29UJZ3x9gscQ",
                "width": 720,
                "height": 404
            },
            "thumbnails": [
                {
                    "url": "https://i.ytimg.com/vi/5ooSGYC-3XU/hq720_live.jpg?sqp=-oaymwEXCNAFEJQDSFryq4qpAwkIARUAAIhCGAE=&rs=AOn4CLAU0xhBs_B2WN2_2f29UJZ3x9gscQ",
                    "width": 720,
                    "height": 404
                },
                {
                    "url": "https://i.ytimg.com/vi/5ooSGYC-3XU/hq720_live.jpg?sqp=-oaymwEjCOgCEMoBSFryq4qpAxUIARUAAAAAGAElAADIQj0AgKJDeAE=&rs=AOn4CLCyZV69A_UwTdAk-rK3Tz5bCunlSQ",
                    "width": 360,
                    "height": 202
                }
            ],
            "isUpcoming": false,
            "upcoming": null,
            "isLive": false,
            "badges": [
                "AO VIVO AGORA",
                "Novo"
            ],
            "author": {
                "name": "PROART W NATURZE",
                "channelID": "UCah1G4k4bbe6cCFgA0WREsg",
                "url": "https://www.youtube.com/channel/UCah1G4k4bbe6cCFgA0WREsg",
                "bestAvatar": {
                    "url": "https://yt3.ggpht.com/ytc/AAUvwnhhcTj5wn3Gkka5dODw_JoxNs0fp7ipmEMigLBe=s68-c-k-c0x00ffffff-no-rj",
                    "width": 68,
                    "height": 68
                },
                "avatars": [
                    {
                        "url": "https://yt3.ggpht.com/ytc/AAUvwnhhcTj5wn3Gkka5dODw_JoxNs0fp7ipmEMigLBe=s68-c-k-c0x00ffffff-no-rj",
                        "width": 68,
                        "height": 68
                    }
                ],
                "ownerBadges": [],
                "verified": false
            },
            "description": null,
            "views": null,
            "duration": null,
            "uploadedAt": null
        }
    ],
    "continuation": null
}

The videos are different from yesterday's, but the badges are in my language. About the activeFilters property in the result... is it set automatically? Is there a way to change it?

TimeForANinja commented 3 years ago

To change activeFilters you would use https://github.com/TimeForANinja/node-ytsr#ytsrgetfilterssearchstring-options but I mean, you already did that to filter by live.

This is going to be hard: since I can't provide the translation for live now in every language, I guess isLive will get removed. The fix for you for now would be to check the badges array for AO VIVO AGORA.
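A rough sketch of that badge check, for illustration only. hasLiveBadge and LIVE_BADGE_LABELS are hypothetical names that are not part of ytsr, and the label list only covers the two strings that came up in this thread, so any other language would need its own entry.

```javascript
// Hypothetical helper, not part of ytsr: decide whether a result item is live
// by looking at its localized badges instead of relying on the isLive flag.
// Assumption: only the two labels seen in this thread are listed here.
const LIVE_BADGE_LABELS = new Set(['LIVE NOW', 'AO VIVO AGORA']);

function hasLiveBadge(item) {
  // badges may be missing or not an array, so guard before iterating
  const badges = Array.isArray(item && item.badges) ? item.badges : [];
  return badges.some(badge => LIVE_BADGE_LABELS.has(String(badge).trim().toUpperCase()));
}

// Item shape trimmed from the ytsr output posted above
const sample = { isLive: false, isUpcoming: false, badges: ['AO VIVO AGORA', 'Novo'] };
console.log(hasLiveBadge(sample));
```

It could then replace the item.isLive test in the existing filter, e.g. searchResults.items.filter(item => !item.isUpcoming && hasLiveBadge(item)).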

HMLarsen commented 3 years ago

OK, thanks. I think I'll remove the filter before my forEach, because my search URL already filters by Live. But I only applied the Live filter through the search URL... why are other active filters returned in the results?

"activeFilters": [
    {
        "name": "Programa",
        "active": true,
        "url": null,
        "description": "Pesquisar por Programa"
    },
    {
        "name": "Relevância",
        "active": true,
        "url": null,
        "description": "Ordenar por relevância"
    }
]

I didn't set any of these filters.

TimeForANinja commented 3 years ago

The second one looks to be Sort by Relevance. If you check the webpage you'll find that you cannot unset the Sort By category.

The Programa filter (Show in English) is usually set manually. But I also don't see a Live filter in there? I can see why Programa might fail (I don't differentiate between filters that are set and filters that can't be set), but I have no idea why Live is missing 😅

HMLarsen commented 3 years ago

Let's define some things then:

1.

Is it worthwhile to do location-based searches? Do YouTube's results (on the web page) take the browser's language into account and return different results because of it? If so, I would really need to do location-based searches, because otherwise a user would search for something on YouTube and get different results in my application. But then I would lose some features like the isLive attribute, and also hit the unknown message in backgroundPromoRenderer error, because of the if (UTIL.parseText(item[type].title) === 'No results found') return null; check you mentioned earlier?

2.

If YouTube's results do not differ by browser location, I would always search with the en-US locale and everything would be correct.

Which approach should I use in my code? I don't need the AO VIVO AGORA or LIVE NOW badges to be returned to my web page... my web page handles all the language and I don't return text from my back end, so I don't care what text comes back, only the numeric and boolean attributes. My only problem is the video results differing by location (if that is the problem).

TimeForANinja commented 3 years ago

I'd say rather 1. Yes and no. There are 2 settings, hl & gl. One is language and one is location. YouTube does account for your language and location on the website, and you can also change them there.

The catch: last time I checked there were problems with some combinations. Why would Peru provide German translations? This might be fixed by now, and it shouldn't be a problem with languages like English, but you'd have to check for your specific use case.

The problems with unknown message in backgroundPromoRenderer and UTIL.parseText(item[type].title) === 'No results found' will be fixed, but I guess I won't have a choice but to remove the isLive attribute.

My tip: try setting your country to Brazil (if I remember correctly) and your language to English. Check whether YouTube provides all values the way it should (e.g. check if isLive is present). And it would be cool to hear if it worked 😉
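That combination could be sketched roughly like this. buildSearchOptions and onlyLive are hypothetical helper names, not part of ytsr, and the items used for the offline check are hand-written stand-ins shaped like ytsr results. The actual ytsr calls need network access, so they are only shown as comments and mirror the code posted earlier in this thread.

```javascript
// Hypothetical helpers, not part of ytsr.

// Region Brazil, language English, as suggested above.
function buildSearchOptions(limit) {
  return { limit, gl: 'BR', hl: 'en' };
}

// Keep only items that are live right now (same predicate as used earlier in this thread).
function onlyLive(items) {
  return items.filter(item => !item.isUpcoming && item.isLive);
}

// Offline check with hand-written items shaped like ytsr results
const fakeItems = [
  { id: 'a', isUpcoming: false, isLive: true },
  { id: 'b', isUpcoming: true, isLive: false },
  { id: 'c', isUpcoming: false, isLive: false },
];
console.log(buildSearchOptions(10));
console.log(onlyLive(fakeItems).map(item => item.id));

// With ytsr itself (needs network access, so left as a comment):
// const filters = await ytsr.getFilters('korone');
// const filter = filters.get('Features').get('Live');
// const searchResults = await ytsr(filter.url, buildSearchOptions(10));
// const live = onlyLive(searchResults.items);
```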

HMLarsen commented 3 years ago

Setting the country to Brazil and the language to 'en' worked nicely! With the 'en' language, the isLive attribute is returned correctly by the search.

Thanks for your time!