Scraping advertisements by mistake

SuspiciousLookingOwl / scrape-yt

Simple lib to scrape information from youtube such as search results, video information, related videos, playlist information and up next video

https://www.npmjs.com/package/scrape-yt

MIT License


Scraping advertisements by mistake #1

Closed KareemH closed 4 years ago

KareemH commented 4 years ago

Hey! Great web scraper btw, it's a nice alternative to the YouTube Data API v3 search. I mainly use it to get the video IDs for a search query (such as the video ID of a music video).

I am making a web app that displays the music videos of old songs, and so far, your scraper allows me to successfully retrieve the video id. For example: So, url is basically the id of the first index of the results I get back from your module

However, sometimes the scraper returns the URL of an advertisement, because it picks up whatever is listed first in the search results (assuming every entry in the search is related to the query). Thus, I get something like this:

Is there a way to filter through the advertisements so that the results can strictly have an array of the related videos? Thanks!

SuspiciousLookingOwl commented 4 years ago

I'm not able to replicate. ef6d687ebc38b4b99dcef19a9f76ba139b823dfc might fix it though. I've already published the update to npm (v1.0.5); you can update the package using npm update scrape-yt and try it again. Let me know if it still scrapes advertisements.

Just to be clear, is this from .search() or .getRelated()?

KareemH commented 4 years ago

Thank you so much! I will let you know if it works and inform you of any other issues :)

This is from .search(). The parameters are the artist and title of a given song, so it's not like I knew the ID beforehand.

SuspiciousLookingOwl commented 4 years ago

Oh, I thought it was .getRelated(); I didn't make any change to .search(). Can you reproduce it consistently, for example with a certain search keyword? Or is it random?

KareemH commented 4 years ago

So, I don't think I can reproduce it consistently. For instance, the second time around, "Usher and Alicia Keys My Boo" as the search keyword then returns the proper video id referencing the song's actual music video.

I basically loop through an array of 240 songs and query for the video ID of each song. For the most part, 230 songs get the right video ID, which is great. But 10 videos either get an advertisement as the ID or fail with an unhandled promise rejection: id is not defined.
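
For the unhandled rejection part, here is a minimal sketch of a batch lookup that records a failure per song instead of letting the promise reject. It assumes a hypothetical search function that takes a query string and resolves to an array of objects with optional id and title fields; the SearchResult shape, the SearchFn type and lookupAll are names made up for illustration, not the scrape-yt API.

```typescript
// Hypothetical result shape; the objects scrape-yt actually returns may differ.
interface SearchResult {
  id?: string;
  title?: string;
}

// Hypothetical signature standing in for the library's search call.
type SearchFn = (query: string) => Promise<SearchResult[]>;

interface Lookup {
  query: string;
  id: string | null;
  error: string | null;
}

// Look up each query in turn. A failed or empty search is recorded in the
// output instead of surfacing as an unhandled promise rejection.
async function lookupAll(queries: string[], search: SearchFn): Promise<Lookup[]> {
  const out: Lookup[] = [];
  for (const query of queries) {
    try {
      const results = await search(query);
      // Take the first entry that actually carries a non-empty id.
      const first = results.find((r) => typeof r.id === "string" && r.id.length > 0);
      out.push({ query, id: first?.id ?? null, error: first ? null : "no usable result" });
    } catch (err) {
      out.push({ query, id: null, error: err instanceof Error ? err.message : String(err) });
    }
  }
  return out;
}
```

With this shape, the songs that failed can be collected afterwards by filtering for entries whose id is null and retried or logged.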

So it is random because you never know which videos will have an ad in front of them (meaning the actual video is at index 1) or if the first result is the actual video (the index is 0)
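
Until the scraper itself skips ads, one possible client-side workaround is to stop trusting index 0 and take the first result whose title looks like the query. The sketch below is only a heuristic under stated assumptions: results are assumed to be objects with optional id and title fields (a hypothetical shape), and pickVideoId, words and the 0.5 threshold are invented for illustration.

```typescript
// Hypothetical result shape; the objects scrape-yt actually returns may differ.
interface SearchResult {
  id?: string;
  title?: string;
}

// Lowercase the text and split it into ASCII alphanumeric words.
function words(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
}

// Return the id of the first result whose title contains at least
// `minShare` of the query's words, or null if no result qualifies.
function pickVideoId(results: SearchResult[], query: string, minShare = 0.5): string | null {
  const wanted = words(query);
  if (wanted.length === 0) return null;
  for (const r of results) {
    if (!r.id || !r.title) continue; // skip entries without an id or title
    const have = new Set(words(r.title));
    const hits = wanted.filter((w) => have.has(w)).length;
    if (hits / wanted.length >= minShare) return r.id;
  }
  return null;
}
```

An ad that happens to share enough words with the query would still slip through, and titles in non-Latin scripts would need a different word splitter, so this narrows the problem rather than solving it.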

I think I like .search() so much because it's a query using keywords. This is more flexible than .getRelated() (or most web scrapers and YouTube-related APIs), which require the IDs beforehand (which I don't know yet). Any fix is much appreciated!

SuspiciousLookingOwl commented 4 years ago

Okay, I'm able to reproduce it consistently with the "Learn spanish" keyword. Fixing it right now.

SuspiciousLookingOwl commented 4 years ago

a421cd066eae881e29a969b7d84d7652c02c50f4 should fix it. I've published the update to npm (v1.0.7). Update the package and try it again.

I'm going to close this issue, feel free to open it again if the problem still persists.