JimmyLaurent / torrent-search-api

Yet another node torrent scraper (supports iptorrents, torrentleech, torrent9, torrentz2, 1337x, thepiratebay, Yggtorrent, TorrentProject, Eztv, Yts, LimeTorrents)
MIT License
394 stars, 100 forks

Unable to use Ygg provider #117

Open GhyslainBruno opened 3 years ago

GhyslainBruno commented 3 years ago

First of all, thank you for your work!

I am developing an app that needs to search across multiple torrent providers. Currently I am writing my own crawlers, but I'm thinking about using your lib, which seems awesome!

For the past few days I've been facing a new issue with the Ygg provider; more precisely, the issue is with CloudFlare...

Steps to reproduce:

const TorrentSearchApi = require('torrent-search-api');

TorrentSearchApi.enableProvider('Yggtorrent', 'USERNAME', 'PASSWORD');

TorrentSearchApi.search('1080', 'Movies', 20)
    .then(torrents => {
        console.log(torrents);
    })
    .catch(error => {
        console.log(error);
    })

The error the lib returns:

[Screenshot 2020-10-20 at 11 05 22]

I saw you're using cloudscraper (and even making your own cloudflare-scraper, great!).

CloudFlare might have updated their security system.

I'm opening this issue so that we have a place to think together about how to solve this new challenge.

PS: This issue might be related to issue #18 in your cloudflare-scraper repo.

PS 2: In my app, I'm using puppeteer to scrape Ygg. For the past few days the CloudFlare page has kept reloading every 5 seconds. That's how I noticed the change. I don't know if that's useful.

RamBoFe commented 3 years ago

Hi,

Try pumpflare. It works in my project!

++

RamBoFe commented 3 years ago

I just tested pumpflare again and it doesn't seem to work anymore... So I tried the cloudflare-scraper package, but it didn't work either: I ran into the problem described in this issue. After some research I managed to make it work. Look at the solution I proposed in this issue. I hope this unblocks you.

RamBoFe commented 3 years ago

Here we go again 😅

I believe I have finally found the origin of your problem, which I actually ran into myself. To be exact, it is a User Agent problem.

I develop my app on Windows and deploy it to a Linux VPS. Very often my app managed to scrape the YggTorrent site locally, but as soon as I deployed it to my VPS it no longer worked. At that time I was using pumpflare, so I decided to drop it and look for something else that worked on my Linux VPS. I turned to the cloudflare-scraper package and managed to make it work with the solution from my previous comment. Unfortunately, once deployed on my VPS, it didn't work anymore! While digging through the code, I realized that the only thing that could differ from Windows was the User Agent. By using the Windows User Agent on every OS, I finally got my app to work on my VPS.

In conclusion, the User Agent can block access to the YggTorrent site: you cannot use just any value, and one must be set. For now, one User Agent value that works every time is this:

'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36'

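Depending on the value used for the User Agent, CloudFlare can react in different ways:

  • Infinite reloading of the page when using puppeteer directly, which leads to the well-known "Timeout on just a moment" error in cloudflare-scraper (#18 in your PS, and what you describe in PS 2).
  • The page simply does not load.
  • Error 503, with the page containing the mention "Please enable cookies".

There may be other behaviors as well. I only listed the ones I encountered.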
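
As a rough illustration, the symptoms reported in this thread (a "Just a moment" page that never resolves, a page that does not load, a 503 mentioning "Please enable cookies") could be told apart with a small check on the response. This is only a sketch: `classifyCloudflareResponse` is a hypothetical helper, not part of torrent-search-api or cloudflare-scraper, and the marker strings are taken from the messages quoted here, so they may not match what CloudFlare serves today.

```javascript
// Hypothetical helper for illustration; the marker strings are assumptions
// based on the symptoms quoted in this thread.
function classifyCloudflareResponse(status, body) {
  const text = String(body || '').toLowerCase();
  if (text.includes('just a moment')) {
    return 'challenge-interstitial'; // the page that keeps reloading
  }
  if (status === 503 && text.includes('please enable cookies')) {
    return 'cookies-required';
  }
  if (!text.trim()) {
    return 'empty-page'; // nothing was loaded at all
  }
  return 'ok';
}
```

Logging this label next to the User Agent in use may make it easier to see which values trigger which reaction.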

So @GhyslainBruno, use the User Agent value given above and I think you will no longer have problems scraping the YggTorrent site in your app 😃 You could use this code with puppeteer:

await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36');

Here's hoping that helps.
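
For context, a slightly fuller sketch of the same idea: the User Agent is set on the page before the first navigation, so that the very first request already carries it. `openWithUserAgent` is a hypothetical helper name; `browser` is assumed to be a puppeteer `Browser` (for example the result of `puppeteer.launch()`), and the `waitUntil` value is just one reasonable choice.

```javascript
// User Agent quoted above in this comment.
const WINDOWS_CHROME_UA =
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36';

// Hypothetical helper: opens a page with a fixed User Agent.
// `browser` is assumed to expose puppeteer's Browser interface (newPage()).
async function openWithUserAgent(browser, url, userAgent = WINDOWS_CHROME_UA) {
  const page = await browser.newPage();
  // Set the User Agent before navigating so the first request uses it.
  await page.setUserAgent(userAgent);
  await page.goto(url, { waitUntil: 'domcontentloaded' });
  return page;
}
```

With puppeteer installed, this would be called as `const page = await openWithUserAgent(await puppeteer.launch(), siteUrl);`, where `siteUrl` is the address you want to scrape.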

johann-taberlet commented 3 years ago

Hey, I managed to make this work by manually installing cloudflare-scraper and then manually updating its dependencies like so:

"dependencies": {
    "hcaptcha-solver": "^1.0.1",
    "puppeteer-extra": "^3.1.18",
    "puppeteer-extra-plugin-stealth": "^2.7.6",
    "request": "^2.88.2",
    "request-promise-native": "^1.0.9",
    "stormwall-bypass": "^1.0.1"
  },
  "devDependencies": {
    "puppeteer": "^5.2.1"
  },

Do not use a puppeteer version greater than 5.2.1, because the typings are broken in later versions.
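
A hedged aside on the ranges above: in npm's semver syntax, a caret range such as `^5.2.1` accepts any 5.x release at or above 5.2.1, so on its own it does not keep puppeteer at 5.2.1. Writing the version without the caret (`"puppeteer": "5.2.1"`) pins it exactly. The sketch below is a simplified illustration of the caret rule for versions with a major of 1 or more. `satisfiesCaret` is a hypothetical helper, not the API of the `semver` package, and it ignores prerelease tags and 0.x versions.

```javascript
// Simplified caret-range check, for illustration only.
// Assumes plain "major.minor.patch" strings with major >= 1.
function parseVersion(version) {
  return version.split('.').map(Number);
}

function satisfiesCaret(version, base) {
  const [major, minor, patch] = parseVersion(version);
  const [baseMajor, baseMinor, basePatch] = parseVersion(base);
  if (major !== baseMajor) return false;             // a caret never crosses a major version
  if (minor !== baseMinor) return minor > baseMinor; // any later minor is accepted
  return patch >= basePatch;                         // same minor: patch must not go backwards
}

console.log(satisfiesCaret('5.2.1', '5.2.1')); // true
console.log(satisfiesCaret('5.5.0', '5.2.1')); // true: "^5.2.1" would allow this release
console.log(satisfiesCaret('6.0.0', '5.2.1')); // false
```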

So maybe make a PR with cloudflare-scraper as a dependency for this package and update cloudflare-scraper dependencies as well?

EDIT: And the Yggtorrent URL has changed to https://www4.yggtorrent.li

GhyslainBruno commented 3 years ago

Here we go again 😅

I believe I have finally found the origin of your problem, which I actually ran into myself. To be exact, it is a User Agent problem.

I develop my app on Windows and deploy it to a Linux VPS. Very often my app managed to scrape the YggTorrent site locally, but as soon as I deployed it to my VPS it no longer worked. At that time I was using pumpflare, so I decided to drop it and look for something else that worked on my Linux VPS. I turned to the cloudflare-scraper package and managed to make it work with the solution from my previous comment. Unfortunately, once deployed on my VPS, it didn't work anymore! While digging through the code, I realized that the only thing that could differ from Windows was the User Agent. By using the Windows User Agent on every OS, I finally got my app to work on my VPS.

In conclusion, the User Agent can block access to the YggTorrent site: you cannot use just any value, and one must be set. For now, one User Agent value that works every time is this:

'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36'

Depending on the value used for the User Agent, CloudFlare can react in different ways:

  • Infinite reloading of the page when using puppeteer directly, which leads to the well-known "Timeout on just a moment" error in cloudflare-scraper (#18 in your PS, and what you describe in PS 2).
  • The page simply does not load.
  • Error 503, with the page containing the mention "Please enable cookies".

There may be other behaviors as well. I only listed the ones I encountered.

So @GhyslainBruno, use the User Agent value given above and I think you will no longer have problems scraping the YggTorrent site in your app 😃 You could use this code with puppeteer:

await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36');

Here's hoping that helps.

@RamBoFe unfortunately it didn't work in my case either :-/ I still have the same issue in my app (infinite reloading).

GhyslainBruno commented 3 years ago

@johann-taberlet what do you mean by "managed to make it work"?

What did you do? ^^

I tried a simple npm install cloudflare-scraper and then changed its dependencies to the ones you listed, but it didn't work for me.

GhyslainBruno commented 3 years ago

@johann-taberlet thanks for your advice, I just created a new PR that should solve this issue.

I'm waiting for the PR to be merged and then I'll close this issue.

In case @JimmyLaurent doesn't have time to merge it and someone needs this ASAP, the fork I made will have the fix merged shortly.

Thanks for your help, guys!

tdehaeze commented 2 years ago

@GhyslainBruno does that still work for you? I cannot make this work. My guess is that CloudFlare updated their anti-bot system. Thanks

RamBoFe commented 2 years ago

Hi @tdehaeze ,

For my part, Cloudflare has indeed updated its protection, so I too was stuck with my app. I looked for a viable solution and, after much research, found the FlareSolverr project, which bypasses the new Cloudflare protection. It works perfectly, except that you can't ask it to download a torrent file from YggTorrent... The project owner himself says that you shouldn't use the project to download files. So I only use it to retrieve the user-agent and the cf_clearance cookie, and then inject them into the YggTorrent provider as described here.

Of course this requires a code update... Take a look at my backend app (especially the files cloudflare.mjs and ygg.mjs) to get an idea, if you want.
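
The approach described above (ask FlareSolverr for a solved session, then reuse its user-agent and Cloudflare clearance cookie, `cf_clearance`, in your own requests) might look roughly like the sketch below. It assumes a FlareSolverr instance listening on `http://localhost:8191/v1`, Node 18+ for the global `fetch`, and a response whose `solution` object carries `userAgent` and a `cookies` array; check those details against the FlareSolverr version you run. `headersFromSolution` and `solveChallenge` are hypothetical names and are not taken from the backend app mentioned above.

```javascript
// Assumed FlareSolverr endpoint; adjust to your deployment.
const FLARESOLVERR_URL = 'http://localhost:8191/v1';

// Build request headers from a FlareSolverr-style `solution` object
// (assumed shape: { userAgent: string, cookies: [{ name, value }, ...] }).
function headersFromSolution(solution) {
  const clearance = (solution.cookies || []).find((c) => c.name === 'cf_clearance');
  if (!clearance) {
    throw new Error('No cf_clearance cookie in the solution');
  }
  return {
    // Send the cookie together with the User Agent that obtained it.
    'User-Agent': solution.userAgent,
    Cookie: `cf_clearance=${clearance.value}`,
  };
}

// Ask FlareSolverr to load `url`, then return headers for later requests.
async function solveChallenge(url) {
  const res = await fetch(FLARESOLVERR_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ cmd: 'request.get', url, maxTimeout: 60000 }),
  });
  const data = await res.json();
  return headersFromSolution(data.solution);
}
```

The returned headers would then be passed to whatever HTTP client performs the actual YggTorrent requests, so that FlareSolverr itself is never used to download files.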

Good luck ++

tdehaeze commented 2 years ago

Thanks a lot @RamBoFe! Where can I PM you about some parts of your code that I'd like to implement in my app? Cheers

RamBoFe commented 2 years ago

Hi,

Send your email address to this temporary address: ducrucrattaxe-9501@yopmail.com. I will contact you.

Germwalker commented 2 years ago

Does anyone have any news about the state of this project? It doesn't seem to be working anymore.

MozkaGit commented 1 year ago

It still works, I can download torrents from YggTorrent. But for the category I still use 'All'. What can I use to filter anime? When I use 'Animes' or 'Animations' it tells me that 0 results were found...

404b commented 1 year ago

If anyone is still having the hCaptcha issue, try https://github.com/noCaptchaAi/docs/blob/main/docs/en/token/hCaptcha.md, which gives you an hCaptcha token with no need to open a browser.