DrKain / scrape-youtube

A lightning fast package to scrape YouTube search results
MIT License

Certain searches yield no results #34

Closed accesstechnology-mike closed 3 years ago

accesstechnology-mike commented 3 years ago

Describe the bug
Hello, following on from closed issue #33: the update seemed to work, but now I'm getting no results again. 'duggee' never works; 'lego' and 'cats' are intermittent. UK server, using safe search.

To Reproduce
App code here:

const express = require("express");
const { default: youtube } = require("scrape-youtube");

const cache = require("../helpers/cache");

const Router = express.Router();

Router.get("/search", async (req, res) => {
  try {
    let { q, cached } = req.query;
    const allowCached = cached !== "false";
    if (!q) return res.send({ error: "Invalid query!" });
    q = q.trim();
    if (allowCached) {
      const cachedResults = await cache.getResults(q);
      if (cachedResults && cachedResults.length !== 0) {
        return res.send({ results: cachedResults, cached: true });
      }
    }
    let freshResults = await youtube.search(
      q,
      { safeSearch: true },
      { safeSearch: true, headers: { Cookie: "PREF=f2=8000000" } }
    );
    freshResults = freshResults.videos.map((video) => ({
      id: video.id,
      title: video.title,
    }));
    res.send({ results: freshResults, length: freshResults.length });
    if (freshResults.length > 0) await cache.saveResults(q, freshResults);
  } catch (error) {
    console.error(error);
    res.send({ error: "Backend error." });
  }
});

module.exports = Router;

Please let me know if you require any further info. Originally posted by @accesstechnology-mike in https://github.com/DrKain/scrape-youtube/issues/33#issuecomment-775318471

Versions:

DrKain commented 3 years ago

Any errors in console? Can you provide a screenshot of the console using youtube.debug = true?
If you're still using Node 12.x try updating to the latest/recommended version. Should be 14-15.x

accesstechnology-mike commented 3 years ago

Node updated to 15.8.0 (took a while as 15 requires lots of permission changes in Docker), but no change to results.

'test' works fine. 'duggee' yields no results.

console output as requested, but no errors reported:

Attaching to youtube_scraper, youtube_cache
youtube_cache | 8:C 08 Feb 2021 20:48:59.291 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
youtube_cache | 8:C 08 Feb 2021 20:48:59.291 # Redis version=6.0.10, bits=64, commit=00000000, modified=0, pid=8, just started
youtube_cache | 8:C 08 Feb 2021 20:48:59.291 # Configuration loaded
youtube_cache | 8:M 08 Feb 2021 20:48:59.292 * Running mode=standalone, port=6379.
youtube_cache | 8:M 08 Feb 2021 20:48:59.292 # Server initialized
youtube_cache | 8:M 08 Feb 2021 20:48:59.292 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
youtube_cache | 8:M 08 Feb 2021 20:48:59.292 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo madvise > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled (set to 'madvise' or 'never').
youtube_cache | 8:M 08 Feb 2021 20:48:59.292 * Ready to accept connections
youtube_scraper | Server started at 3000
youtube_scraper | Connected with Redis!
youtube_scraper | https://www.youtube.com/results?search_query=test&sp=EgIQAQ%253D%253D
youtube_scraper | [ytInitialData] sectionListRenderer
youtube_scraper | https://www.youtube.com/results?search_query=duggee&sp=EgIQAQ%253D%253D
youtube_scraper | [ytInitialData] sectionListRenderer

accesstechnology-mike commented 3 years ago

For reference, from the same machine:

image

DrKain commented 3 years ago

I'll push a new version later today with some more debugging options to dump the raw data, which should give me an idea of why it's so inconsistent on your machine. I'm currently away from my desk, but I'll comment here when I'm available.

DrKain commented 3 years ago

Hi Mike, sorry for the delay.
I've updated the package to 2.1.0 and added a debugger that dumps search and page data to files when enabled.
Please enable the debugger and send me the three debug files from a failed search.

youtube.debug = true; // Enable regular debugging
youtube.debugger.enabled = true; // Enable debug dumps
youtube.debugger.setDirectory('path/to/somewhere'); // Directory to write the dumps  

// Rest of your code

The dumped files should look something like: 12345-page.html, 12345-opts.json and 12345-vids.json.
This should give me some more insight into what's going wrong here.

accesstechnology-mike commented 3 years ago

022248078867883803-opts.json

{
  "query": "duggee",
  "safeSearch": true,
  "_debugid": "022248078867883803"
}

022248078867883803-vids.json

{
  "videos": [],
  "playlists": [],
  "streams": []
}

022248078867883803-page.html

022248078867883803-page.zip

DrKain commented 3 years ago

The issue is a fairly uncommon parsing error, working on a fix at the moment.

It also seems like your environment is suppressing unhandled promise rejections. Maybe another part of your code is catching them without logging; otherwise you would have seen the error right away. I'll let you know when the fix is available, but I highly recommend looking into why you don't see any UnhandledPromiseRejectionWarning in the console.
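
For reference, a minimal sketch of one way to make such rejections visible in a Node app. The `unhandledRejection` process event is standard Node; the `describeRejection` helper is a hypothetical name used only for illustration, and none of this is code from the package or from the app above.

```javascript
// Turn a rejection reason into a printable string (stack trace when available).
function describeRejection(reason) {
  return reason instanceof Error ? reason.stack || reason.message : String(reason);
}

// Node emits "unhandledRejection" when a promise rejects and nothing handles it.
// Registering this once at startup means such failures are always logged.
process.on("unhandledRejection", (reason) => {
  console.error("Unhandled promise rejection:", describeRejection(reason));
});
```

If nothing shows up even with a handler like this in place, the rejection is probably being caught somewhere and discarded without logging.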

DrKain commented 3 years ago

Version 2.1.2 published with a fix, please let me know how it goes.

accesstechnology-mike commented 3 years ago

Thanks, really appreciate the effort you're putting in.

2.1.2 produces this console error for all searches:

youtube_scraper | https://www.youtube.com/results?search_query=duggee&sp=EgIQAQ%253D%253D
youtube_scraper | TypeError: Cannot read property 'split' of undefined
youtube_scraper |     at /home/node/app/node_modules/scrape-youtube/lib/index.js:82:21
youtube_scraper |     at new Promise (<anonymous>)
youtube_scraper |     at Youtube.extractRenderData (/home/node/app/node_modules/scrape-youtube/lib/index.js:74:16)
youtube_scraper |     at Youtube.<anonymous> (/home/node/app/node_modules/scrape-youtube/lib/index.js:203:51)
youtube_scraper |     at step (/home/node/app/node_modules/scrape-youtube/lib/index.js:44:23)
youtube_scraper |     at Object.next (/home/node/app/node_modules/scrape-youtube/lib/index.js:25:53)
youtube_scraper |     at fulfilled (/home/node/app/node_modules/scrape-youtube/lib/index.js:16:58)
youtube_scraper |     at processTicksAndRejections (node:internal/process/task_queues:94:5)
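
The trace points at a `.split()` call on a value that was `undefined` inside `extractRenderData`. The package's actual code and its fix are not shown in this thread, so the following is only a sketch of the general defensive pattern: check that the expected marker is present before slicing. The function name `extractInitialData`, the marker string, and the closing `;</script>` delimiter are all assumptions for illustration.

```javascript
// Hypothetical helper: pull an embedded JSON object out of a page's HTML.
// Returns null instead of throwing when the page does not look as expected.
function extractInitialData(html, marker = "var ytInitialData = ") {
  if (typeof html !== "string") return null; // nothing to parse

  const start = html.indexOf(marker);
  if (start === -1) return null; // marker missing: layout differs from assumption

  const from = start + marker.length;
  const end = html.indexOf(";</script>", from);
  if (end === -1) return null; // no closing delimiter found

  try {
    return JSON.parse(html.slice(from, end));
  } catch (err) {
    return null; // the embedded text was not valid JSON
  }
}
```

Returning `null` lets the caller decide whether to resolve with empty results or reject with a clear message, rather than failing with a `TypeError`.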

accesstechnology-mike commented 3 years ago

Is that on 2.1.2?

Yes:

 "node_modules/scrape-youtube": {
      "version": "2.1.2",
      "resolved": "https://registry.npmjs.org/scrape-youtube/-/scrape-youtube-2.1.2.tgz",
      "integrity": "sha512-A3eUyhOB2Sbb6d/D6r5fibeWxiwCD6xJ0Guwbl1493+/OLnJC7rf38eQGzpUy7pg0TSF4x6Cgx1PbN66xSUAKg=="
    },

DrKain commented 3 years ago

Version 2.1.3 published with a fix, please let me know how it goes.

accesstechnology-mike commented 3 years ago

YES!

image

Thank you so much!

DrKain commented 3 years ago

Awesome. Let me know if you encounter any other problems. Thanks for the help debugging