AvocadoArmor closed this issue 4 years ago.
This is related to #221: the reverse-engineered API we use returns a limited subset of search results compared to search on the website. We will most likely have to scrape the search results from the HTML to get the most accurate representation of search.
If there is a way to allow search params in /search_ajax, then I think that would be fantastically useful. Unfortunately I haven't found a way to do this; instead, Invidious scrapes the page as you suggest. Here are a couple of notes I hope will be useful:
Scraping /results will get most of the content available from the endpoint YouTubeExplode currently uses. You can see an example here. Several fields that are present in /search_ajax are missing:
Although Invidious provides a published field, it is calculated from the "6 months ago" text, so it is not consistent with the date provided by the /search_ajax endpoint. YouTube does not provide a timestamp, or any other way that I know of, to get an accurate upload date in this case.
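To illustrate why a date derived from "6 months ago" can't match an exact upload date, here is a minimal sketch of converting such text into an approximate datetime. It is not Invidious's code: the helper name approximate_published, the English-only pattern, and the 30-day month / 365-day year approximations are all assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumed approximation: 1 month = 30 days, 1 year = 365 days.
_UNIT_DAYS = {
    "second": 1 / 86400,
    "minute": 1 / 1440,
    "hour": 1 / 24,
    "day": 1,
    "week": 7,
    "month": 30,
    "year": 365,
}

# Hypothetical pattern for English text such as "6 months ago" or "1 day ago".
_RELATIVE = re.compile(r"(\d+)\s+(second|minute|hour|day|week|month|year)s?\s+ago")


def approximate_published(text: str, now: datetime) -> datetime:
    """Turn relative text like '6 months ago' into an approximate datetime.

    The result is only as precise as the unit in the text: "6 months ago"
    can be off by weeks compared with the real upload date.
    """
    match = _RELATIVE.fullmatch(text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised relative date: {text!r}")
    amount = int(match.group(1))
    return now - timedelta(days=amount * _UNIT_DAYS[match.group(2)])
```

Any video uploaded between roughly 6 and 7 months ago maps to the same value here, which is exactly the inconsistency described above.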
Invidious scrapes HTML from /results?search_query=#{query}&page=#{page}&sp=#{search_params}&hl=en&disable_polymer=1, although it appears you can get slightly more information, including estimatedResults, from the window["ytInitialData"] object included in the Polymer redesign. You can see the code used to scrape the HTML here. It's used for scraping other parts of the site as well, so it's a bit messy.
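A minimal sketch of pulling the window["ytInitialData"] object out of a results page and reading estimatedResults from it. The function name is hypothetical, and the assignment markers (window["ytInitialData"] and the alternative var ytInitialData) are assumptions about the page layout that may change between YouTube versions.

```python
import json


def extract_initial_data(html: str) -> dict:
    """Return the JSON object assigned to ytInitialData in a results page.

    Assumes the page contains something like
    window["ytInitialData"] = {...};  or  var ytInitialData = {...};
    """
    for marker in ('window["ytInitialData"]', "var ytInitialData"):
        start = html.find(marker)
        if start != -1:
            break
    else:
        raise ValueError("ytInitialData not found in page")
    # Jump to the opening brace of the object literal after the '=' sign.
    brace = html.index("{", html.index("=", start))
    # raw_decode parses exactly one JSON value and ignores whatever follows
    # (";</script>..."), and unlike a regex it copes with braces inside strings.
    data, _end = json.JSONDecoder().raw_decode(html[brace:])
    return data
```

Usage, assuming estimatedResults sits at the top level of the object: extract_initial_data(page_html).get("estimatedResults").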
I provide a brief explanation of how to implement search filters in TeamNewPipe/NewPipeExtractor#106, and I'd also recommend taking a look at the relevant code here. The generated "token" can then be added to the search URL as &sp=#{token}.
Any update here? I have already created a YouTube search that gets the HTML from the search results page and then parses it, but too much information is missing.
I would like to implement it in this library, but it would have to make more than one HTTP call to get all the information, which would be very bad for performance.
Isn't there any other way to get all the information needed?
I don't have time to dedicate to this, but I'm open to PRs.
I'll implement it and then let you have a look at it. You can improve it from there.
I have created YoutubeSearch by scraping the HTML data. Have a look here in my YoutubeExplorer fork.
Some data is missing, like Likecount and Dislikecount, but you can now search and filter by Channel, Movie, Playlist, Video and Rating, as well as Default, which is a mix.
Let me know what you think. Look at the SearchVideosByFilterAsync method and see whether it's a good idea to replace SearchVideosAsync with it or simply keep it as an extra.
Should I also open a pull request for you?
What exactly happens when you use SearchFilter.Channel? I can see the method still returns IReadOnlyList<Video>.
Yes. In Video I added VideoType, which is the type of the result: the Video object can be a Video, Playlist or Channel, and its Id can be a channelId, playlistId or videoId depending on the VideoType.
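A small sketch of how a result whose id means different things depending on its type might be modelled and turned into a link. SearchResult, its fields, and the url method are hypothetical and only mirror the idea of Video plus VideoType described above; the URL patterns are the standard youtube.com watch, playlist and channel URLs.

```python
from dataclasses import dataclass
from enum import Enum


class VideoType(Enum):
    VIDEO = "video"
    PLAYLIST = "playlist"
    CHANNEL = "channel"


@dataclass(frozen=True)
class SearchResult:
    id: str          # a videoId, playlistId or channelId, depending on kind
    title: str
    kind: VideoType

    def url(self) -> str:
        """Interpret id according to kind and build the matching page URL."""
        if self.kind is VideoType.VIDEO:
            return f"https://www.youtube.com/watch?v={self.id}"
        if self.kind is VideoType.PLAYLIST:
            return f"https://www.youtube.com/playlist?list={self.id}"
        return f"https://www.youtube.com/channel/{self.id}"
```

A single type with a discriminator keeps the existing IReadOnlyList<Video> signature; the trade-off is that every consumer has to check the type before treating the id as a video ID, which separate result types would make explicit.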
SearchFilter now sets the sp parameter of the HTML URL; have a look at GetYoutubeHtmlResultAsync.
If anyone is interested in how to implement this, I added this feature to my Dart port of this library; maybe it could be useful to someone: https://github.com/Hexer10/youtube_explode_dart/blob/search_page/lib/src/reverse_engineering/responses/search_page.dart
Hi. Unfortunately I have to descope this issue as the project has moved into maintenance mode, which means that new features are unlikely to get implemented.
The search only returns a list of videos, whereas a search on YouTube.com sometimes also returns playlists and even channels (in addition to videos) for the same query.
Does this seem plausible?
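Collecting the mixed results from the parsed ytInitialData could look roughly like the sketch below. It assumes, without confirmation from this thread, that each result is wrapped in a videoRenderer, playlistRenderer or channelRenderer object carrying a videoId, playlistId or channelId field; the function name is hypothetical. A recursive walk avoids hard-coding the exact nesting path, which tends to change.

```python
from typing import Any, Iterator, Tuple

# Assumed renderer keys in search-page ytInitialData, mapped to the id field each carries.
RENDERERS = {
    "videoRenderer": "videoId",
    "playlistRenderer": "playlistId",
    "channelRenderer": "channelId",
}


def iter_results(node: Any) -> Iterator[Tuple[str, str]]:
    """Depth-first walk yielding (renderer_key, id) pairs in page order."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in RENDERERS and isinstance(value, dict):
                item_id = value.get(RENDERERS[key])
                if item_id:
                    yield key, item_id
            else:
                yield from iter_results(value)
    elif isinstance(node, list):
        for child in node:
            yield from iter_results(child)
```

Unknown renderers (ads, shelves, and so on) are skipped unless they contain one of the recognised renderers further down.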