DrKain / scrape-youtube

A lightning fast package to scrape YouTube search results
MIT License
112 stars 30 forks

TypeError: Cannot read property 'contents' of undefined #29

Closed: MattieTK closed this 3 years ago

MattieTK commented 3 years ago

Describe the bug: About 1 in 4 times, my search returns "Failed to extract video data. Please report this issue on GitHub so it can be fixed."

To Reproduce: The function I'm running this in and the search terms are below:

const search = async function (searchTerm, link, viewers, channel) {
    // Assumes `youtube` is the scrape-youtube client, imported elsewhere in the app
    const results = await youtube.search(searchTerm, { type: 'live' });
    const unembeddable = [
        'w_Ma8oQLmSM' // ABC
    ];
    const verified = results.streams.filter(result => result.channel.link === link);
    const popular = verified.filter(stream => stream.watching > viewers);
    // Tag each stream with the display name used in the grid
    popular.forEach(item => {
        item.gridChannel = channel;
    });
    // Drop streams whose IDs are known to be unembeddable
    return popular.filter(item => !unembeddable.includes(item.id));
};

let sky = search('Sky News live', 'https://www.youtube.com/user/skynews', 100, 'Sky News');
let cnn = search('CNN live', 'https://www.youtube.com/user/CNN', 100, 'CNN');
let euronews = search('Euronews live', 'https://www.youtube.com/user/Euronews', 100, 'Euronews');
let abcUS = search('ABC News live', 'https://www.youtube.com/user/ABCNews', 100, 'ABC News USA');
let cnaSingapore = search('CNA', 'https://www.youtube.com/user/channelnewsasia', 100, 'CNA');
let abcAUS = search('ABC News', 'https://www.youtube.com/channel/UCVgO39Bk5sMo66-6o6Spn6Q', 100, 'ABC News AUS');
let foxnews = search('Fox News live', 'https://www.youtube.com/user/FoxNewsChannel', 100, 'Fox News Channel');
let dw = search('DW News live', 'https://www.youtube.com/channel/UCknLrEdhRCp1aegoMqRaCZg', 100, 'DW');
let msnbc = search('msnbc live', 'https://www.youtube.com/user/msnbcleanforward', 100, 'MSNBC');
let aje = search('Al Jazeera live', 'https://www.youtube.com/user/AlJazeeraEnglish', 100, 'Al Jazeera English');
let france24 = search('France 24 live', 'https://www.youtube.com/user/france24english', 100, 'France24');
let nbcnews = search('NBC News live', 'https://www.youtube.com/user/NBCNews', 100, 'NBC News');
let cbsnews = search('CBS News live', 'https://www.youtube.com/user/CBSNewsOnline', 100, 'CBS News');
let pbsnews = search('PBS News', 'https://www.youtube.com/user/PBSNewsHour', 100, 'PBS Newshour');

Expected behavior: The data is returned without the error 😄

Versions:

Additional context: Happy to help reproduce this or log more detailed data 👍

DrKain commented 3 years ago

Try using youtube.debug = true for a more exact error message.
If you're running all 14 searches at the same time, it's likely YouTube is blocking some of the requests.
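One way to avoid firing all 14 searches at once is to run them one after another with a pause in between. The sketch below is a generic helper; `runSequentially`, `sleep`, and the 1000 ms delay are illustrative assumptions, not part of scrape-youtube, and YouTube publishes no figure for a safe delay.

```javascript
// Hypothetical helper: run async tasks one at a time, pausing between them,
// instead of starting every request simultaneously.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function runSequentially(tasks, delayMs) {
    const results = [];
    for (let i = 0; i < tasks.length; i++) {
        if (i > 0) await sleep(delayMs); // space out consecutive requests
        results.push(await tasks[i]()); // wait for this task before starting the next
    }
    return results;
}

// Usage with the `search` function from the issue (delay value is a guess):
// const streams = await runSequentially([
//     () => search('Sky News live', 'https://www.youtube.com/user/skynews', 100, 'Sky News'),
//     () => search('CNN live', 'https://www.youtube.com/user/CNN', 100, 'CNN'),
// ], 1000);
```

Each task is passed as a function rather than an already-started promise, so no request begins until the previous one has finished. If any search rejects, the whole run stops at that point.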

MattieTK commented 3 years ago

Thanks a lot, Kain. This is what the requests look like with youtube.debug = true:

https://www.youtube.com/results?search_query=Sky+News+live&sp=EgJAAQ%253D%253D
https://www.youtube.com/results?search_query=CNN&sp=EgJAAQ%253D%253D
https://www.youtube.com/results?search_query=Euronews+live&sp=EgJAAQ%253D%253D
https://www.youtube.com/results?search_query=ABC+News+live&sp=EgJAAQ%253D%253D
https://www.youtube.com/results?search_query=CNA+live&sp=EgJAAQ%253D%253D
https://www.youtube.com/results?search_query=ABC+News+live&sp=EgJAAQ%253D%253D
https://www.youtube.com/results?search_query=Fox+News+live&sp=EgJAAQ%253D%253D
https://www.youtube.com/results?search_query=DW+News+live&sp=EgJAAQ%253D%253D
https://www.youtube.com/results?search_query=msnbc+live&sp=EgJAAQ%253D%253D
https://www.youtube.com/results?search_query=Al+Jazeera+live&sp=EgJAAQ%253D%253D
https://www.youtube.com/results?search_query=France24&sp=EgJAAQ%253D%253D
https://www.youtube.com/results?search_query=NBC+News&sp=EgJAAQ%253D%253D
https://www.youtube.com/results?search_query=CBS+News&sp=EgJAAQ%253D%253D
https://www.youtube.com/results?search_query=PBS+News&sp=EgJAAQ%253D%253D
TypeError: Cannot read property 'contents' of undefined
    at /home/tk/GitHub/vidgrid/node_modules/scrape-youtube/lib/index.js:66:21
    at new Promise (<anonymous>)
    at Youtube.extractRenderData (/home/tk/GitHub/vidgrid/node_modules/scrape-youtube/lib/index.js:58:16)
    at Youtube.<anonymous> (/home/tk/GitHub/vidgrid/node_modules/scrape-youtube/lib/index.js:159:51)
    at step (/home/tk/GitHub/vidgrid/node_modules/scrape-youtube/lib/index.js:33:23)
    at Object.next (/home/tk/GitHub/vidgrid/node_modules/scrape-youtube/lib/index.js:14:53)
    at fulfilled (/home/tk/GitHub/vidgrid/node_modules/scrape-youtube/lib/index.js:5:58)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
Failed to extract video data. Please report this issue on GitHub so it can be fixed.

I assume that's what you'd expect if YouTube were blocking a request? If so, do you have any advice on how many requests in what timeframe it takes before YouTube starts returning errors when scraping like this?

It's really odd, because I can refresh and run all of these a bunch of times, but every now and again it just fails like this. I'd expect it to happen much more often if their edge were blacklisting me in some way.

DrKain commented 3 years ago

Yes, that's the sort of error you'd expect from a blocked request. If the ytInitialData contents are unavailable, there's no data for this package to scrape, but I'll dig a little deeper and see if it's something I can fix on my side.
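The `TypeError: Cannot read property 'contents' of undefined` comes from reading a nested property on an object that isn't there. A guarded lookup can turn that into a descriptive error instead. This is a sketch only: `extractSectionContents` is a hypothetical name, and the property path is an assumption about YouTube's ytInitialData layout, not scrape-youtube's actual code. Optional chaining requires Node 14 or later.

```javascript
// Hypothetical sketch: read search results from a parsed ytInitialData object,
// reporting a clear error (instead of a TypeError) when the expected path is missing.
// The property path is an assumption about YouTube's page structure.
function extractSectionContents(ytInitialData) {
    const contents = ytInitialData
        ?.contents
        ?.twoColumnSearchResultsRenderer
        ?.primaryContents
        ?.sectionListRenderer
        ?.contents;
    if (!Array.isArray(contents)) {
        throw new Error('ytInitialData has no search results (request may have been blocked)');
    }
    return contents;
}
```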

Generally, when scraping any website, it's good to use some form of rate limiting to avoid hammering the site or having your requests blocked. YouTube doesn't document what limits to follow, so you'll need to experiment.

I doubt you'll get blacklisted entirely, just a few blocked requests here and there when making a bunch of them at once. I'll leave this issue open while I dig around and see what I can find.
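Since the blocks are occasional rather than permanent, retrying a failed search after a growing delay is one common approach. This is a hedged sketch: `withRetry`, its `retries` and `baseDelayMs` options, and the default values are illustrative assumptions, not scrape-youtube features.

```javascript
// Hypothetical helper: retry an async call, doubling the wait after each failure
// (exponential backoff).
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function withRetry(fn, { retries = 3, baseDelayMs = 500 } = {}) {
    for (let attempt = 0; ; attempt++) {
        try {
            return await fn();
        } catch (err) {
            if (attempt >= retries) throw err; // give up after `retries` extra attempts
            await sleep(baseDelayMs * 2 ** attempt); // 500 ms, 1 s, 2 s, ... with the defaults
        }
    }
}

// Usage with the `search` function from the issue:
// const cnn = await withRetry(() => search('CNN live', 'https://www.youtube.com/user/CNN', 100, 'CNN'));
```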

DrKain commented 3 years ago

I've updated the package to 2.0.8 with a possible fix. Please let me know if this resolves your issue.

MattieTK commented 3 years ago

Thanks so much for getting on this so quickly; that looks to have solved the issue. FYI, I'm mostly getting section renders, but occasionally one or two rich grids.
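Seeing both "section" and "rich grid" responses suggests the results can arrive in more than one container shape. A parser can check for each in turn. This is only a sketch: `getResultContents` is hypothetical, and the renderer names `sectionListRenderer` and `richGridRenderer` are assumptions about YouTube's ytInitialData, not confirmed scrape-youtube internals.

```javascript
// Hypothetical sketch: return the results from whichever layout is present,
// along with a label saying which one was found.
function getResultContents(primaryContents) {
    if (primaryContents?.sectionListRenderer) {
        return { layout: 'section', contents: primaryContents.sectionListRenderer.contents ?? [] };
    }
    if (primaryContents?.richGridRenderer) {
        return { layout: 'richGrid', contents: primaryContents.richGridRenderer.contents ?? [] };
    }
    return null; // neither layout present: treat as a failed or blocked response
}
```

Returning `null` leaves the caller to decide whether to retry or report an error.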

I'll mark it as closed for you. Have a great Sunday 😄
