Tyrrrz / YoutubeExplode

Abstraction layer over YouTube's internal API
MIT License

Search that returns playlists, channels and videos instead of just single videos? #226

Closed (AvocadoArmor closed this 4 years ago)

AvocadoArmor commented 5 years ago

The search only returns a list of videos. If you were to search on YouTube.com, sometimes playlists and even channels would come up (in addition to videos) for a search query.

Does this seem plausible?

Tyrrrz commented 5 years ago

This is related to #221. The reverse-engineered API we use returns only a limited subset of the results that search on the website shows. We will most likely have to scrape the search results from HTML to get the most accurate representation of search.

omarroth commented 5 years ago

If there is a way to allow search params in /search_ajax, then I think that would be fantastically useful. Unfortunately I haven't found a way to do this, so Invidious scrapes the page instead, as you suggest. Here are a couple of notes I hope will be useful:

Scraping /results will get most of the content available from the endpoint YouTubeExplode currently uses. You can see an example here. Several fields are missing that are present in /search_ajax:

Although Invidious provides a published field, this is calculated from the "6 months ago" text, and so is not consistent with the date provided by the /search_ajax endpoint. YouTube does not provide a timestamp or any other way that I know of for providing an accurate upload date in this case.
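
A minimal Python sketch of that kind of approximation, assuming the listing shows English text such as "6 months ago". The function name, the regular expression, and the unit lengths (30-day months, 365-day years) are illustrative assumptions, not Invidious's actual code:

```python
import re
from datetime import datetime, timedelta, timezone

# Approximate length of each unit in days; months and years are rough averages.
_UNIT_DAYS = {
    "second": 1 / 86400,
    "minute": 1 / 1440,
    "hour": 1 / 24,
    "day": 1,
    "week": 7,
    "month": 30,
    "year": 365,
}

_RELATIVE = re.compile(r"(\d+)\s+(second|minute|hour|day|week|month|year)s?\s+ago")


def approximate_published(text, now=None):
    """Turn text such as '6 months ago' into an approximate UTC datetime.

    Returns None when the text does not match the expected pattern.
    """
    now = now or datetime.now(timezone.utc)
    match = _RELATIVE.search(text.lower())
    if match is None:
        return None
    amount, unit = int(match.group(1)), match.group(2)
    return now - timedelta(days=amount * _UNIT_DAYS[unit])
```

Because the input is rounded to whole units, the result can be off by weeks or months for older videos, which is why it will not match the date from /search_ajax.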

Invidious scrapes HTML from /results?search_query=#{query}&page=#{page}&sp=#{search_params}&hl=en&disable_polymer=1, although it appears you can get slightly more information from the window["ytInitialData"] included in the polymer redesign, including estimatedResults. You can see the code used to scrape the HTML here. It's used for scraping other parts of the site as well so it's a bit messy.
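
A rough Python sketch of both steps, building the /results URL and pulling window["ytInitialData"] out of the page. The host name, the function names, and the assumption that the page assigns a JSON object literal right after the marker are mine; the markup may differ or change:

```python
import json
from urllib.parse import urlencode

# Assumed host; the comment above only gives the path and query string.
BASE_URL = "https://www.youtube.com/results"
_MARKER = 'window["ytInitialData"]'


def build_results_url(query, page=1, search_params="", disable_polymer=True):
    """Build the search URL with the query parameters listed above."""
    params = {"search_query": query, "page": page, "sp": search_params, "hl": "en"}
    if disable_polymer:
        # Requests the older HTML layout instead of the polymer redesign.
        params["disable_polymer"] = 1
    return BASE_URL + "?" + urlencode(params)


def extract_initial_data(html):
    """Return the object assigned to window["ytInitialData"], or None if absent."""
    start = html.find(_MARKER)
    if start == -1:
        return None
    brace = html.find("{", start)
    if brace == -1:
        return None
    # raw_decode parses one JSON value starting at `brace` and ignores what follows.
    data, _end = json.JSONDecoder().raw_decode(html, brace)
    return data
```

Fetching the page is left out on purpose; any HTTP client can download the URL and pass the body to extract_initial_data.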

I provide a brief explanation of how to implement search filters in TeamNewPipe/NewPipeExtractor#106, and I'd also recommend taking a look at the relevant code here. The "token" generated can then be added as &sp=#{token} to the search URL.
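
As a hedged illustration of what such a token can look like: the sketch below assumes the token is a base64-encoded protobuf-style message in which field 2 holds the filters and, inside it, field 2 holds the result type. The byte layout and the type codes are assumptions based on tokens seen in YouTube search URLs (for example EgIQAQ%3D%3D for videos), not a documented format, and the function names are hypothetical:

```python
import base64
from urllib.parse import quote

# Assumed result-type codes, inferred from observed search URLs.
RESULT_TYPES = {"video": 1, "channel": 2, "playlist": 3}


def encode_type_filter(result_type):
    """Encode a 'type' filter as a base64 token for the sp parameter."""
    code = RESULT_TYPES[result_type]
    inner = bytes([0x10, code])                 # field 2, varint: result type
    outer = bytes([0x12, len(inner)]) + inner   # field 2, length-delimited: filters
    return base64.b64encode(outer).decode("ascii")


def with_filter(url, token):
    """Append the token as &sp=..., percent-encoding the '=' padding."""
    return f"{url}&sp={quote(token, safe='')}"
```

With these assumptions, encode_type_filter("playlist") gives EgIQAw==, which appears in the URL as sp=EgIQAw%3D%3D.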

AlenToma commented 5 years ago

Any update here? I have already created a YouTube search that gets the HTML of the search results and then parses it, but too much information is missing.

I would like to implement it in this library, but it would have to make more than one HTTP call to get the information, which would be very bad for performance.

Isn't there any other way to get all the information needed?

Tyrrrz commented 5 years ago

I don't have time to dedicate to this, but I'm open to PRs.

AlenToma commented 5 years ago

I will implement it and then let you have a look at it. You may be able to improve it from there.

AlenToma commented 5 years ago

I have created YoutubeSearch by scraping the HTML data. Have a look here in my YoutubeExplorer fork.

Some data is missing, like Likecount and Dislikecount, but now you can search and filter by Channel, Movie, Playlist, Video, Rating, and also Default, which is a mix.

Let me know what you think. Look at the SearchVideosByFilterAsync method and see if it's a good idea to replace SearchVideosAsync or simply have it as an extra.

Should I also open a pull request for you?

Tyrrrz commented 5 years ago

What exactly happens when you use SearchFilter.Channel? Because I can see the method still returns IReadOnlyList<Video>.

AlenToma commented 5 years ago

Yes, in Video I added VideoType, which is the type of the result. The Video object can be a Video, Playlist or Channel.

The id of the Video can be a channelId, playlistId or Videoid, depending on the VideoType.

SearchFilter now sets the sp parameter of the HTML URL. Have a look at GetYoutubeHtmlResultAsync.
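
To make the shape of that design concrete, here is a small sketch in Python rather than the fork's C#. The SearchResult class, its fields, and the url helper are hypothetical names for illustration only; the idea is one result type whose id is interpreted according to its VideoType:

```python
from dataclasses import dataclass
from enum import Enum


class VideoType(Enum):
    VIDEO = "video"
    PLAYLIST = "playlist"
    CHANNEL = "channel"


@dataclass(frozen=True)
class SearchResult:
    """One search hit; `id` is a video, playlist or channel id depending on `type`."""
    id: str
    title: str
    type: VideoType

    @property
    def url(self):
        # The id only makes sense together with the type, so callers must branch on it.
        if self.type is VideoType.VIDEO:
            return f"https://www.youtube.com/watch?v={self.id}"
        if self.type is VideoType.PLAYLIST:
            return f"https://www.youtube.com/playlist?list={self.id}"
        return f"https://www.youtube.com/channel/{self.id}"
```

The trade-off is that every consumer has to check the type before using the id, which is the concern behind the question about the method still returning IReadOnlyList<Video>.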

Hexer10 commented 4 years ago

If anyone is interested in how to implement this, I added this feature to my Dart port of this library. Maybe it could be useful to someone: https://github.com/Hexer10/youtube_explode_dart/blob/search_page/lib/src/reverse_engineering/responses/search_page.dart

Tyrrrz commented 4 years ago

Hi. Unfortunately, I have to descope this issue, as the project has moved into maintenance mode, which means that new features are unlikely to get implemented.