FlareSolverr / FlareSolverr

Proxy server to bypass Cloudflare protection
MIT License

[multiple sites] Error solving the challenge. Timeout after X seconds - challenge loop #1036

Open MrTyton opened 4 months ago

MrTyton commented 4 months ago

Have you checked our README?

Have you followed our Troubleshooting?

Is there already an issue for your problem?

Have you checked the discussions?

Environment

- FlareSolverr version: 3.3.13
- Last working FlareSolverr version: Unsure, but was working on Monday (2024/01/08)
- Operating system: Docker Unraid
- Are you using Docker: yes
- FlareSolverr User-Agent (see log traces or / endpoint): Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
- Are you using a VPN: no
- Are you using a Proxy: no
- Are you using Captcha Solver: No
- If using captcha solver, which one:
- URL to test this issue: https://www.fanfiction.net/s/14145272/1/In-Your-Wildest-Dreams

Description

I'm using FanFicFare to scrape fanfiction.net. Nothing has changed in my config, but it stopped working this week.

Logged Error Messages

2024-01-12 00:35:52 INFO     FlareSolverr 3.3.13
2024-01-12 00:35:52 INFO     Testing web browser installation...
2024-01-12 00:35:52 INFO     Platform: Linux-5.19.17-Unraid-x86_64-with-glibc2.31
2024-01-12 00:35:52 INFO     Chrome / Chromium path: /usr/bin/chromium
2024-01-12 00:35:52 INFO     Chrome / Chromium major version: 120
2024-01-12 00:35:52 INFO     Launching web browser...
version_main cannot be converted to an integer
2024-01-12 00:35:53 INFO     FlareSolverr User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
2024-01-12 00:35:53 INFO     Test successful!
2024-01-12 00:35:53 INFO     Serving on http://0.0.0.0:8191
2024-01-12 00:35:59 INFO     Incoming request => POST /v1 body: {'cmd': 'request.get', 'url': 'https://www.fanfiction.net/s/14145272/1/In-Your-Wildest-Dreams', 'maxTimeout': 65000, 'cookies': [], 'postData': None}
version_main cannot be converted to an integer
2024-01-12 00:35:59 INFO     Challenge detected. Title found: Just a moment...
2024-01-12 00:37:04 ERROR    Error: Error solving the challenge. Timeout after 65.0 seconds.
2024-01-12 00:37:04 INFO     Response in 65.723 s
2024-01-12 00:37:04 INFO     172.17.0.1 POST http://192.168.1.161:8191/v1 500 Internal Server Error

Screenshots

No response

ilike2burnthing commented 4 months ago

Debug logs and headless=false both confirm that the challenge is found, box ticked, page refreshed, but the challenge just returns. Tested on both Windows and Docker.

This was the same behaviour seen with yggtorrent, which was resolved by setting the LANG environment variable to an English language code; however, I've tried several language codes with no success.

If anyone has any ideas, or it's working for anyone, let me know.

nilsherzig commented 4 months ago

I have the same issue on multiple other sites; it doesn't look like a site-specific thing.

rebootder commented 4 months ago

I'm also getting an infinite loop on 3.3.9 through 3.3.13.

rscm commented 4 months ago

I have the same issue on a completely different site. I had to remove the FlareSolverr call from my script because there was no challenge, and the script runs cleanly for now. I'll try an older version later.

I'm running it on a VM in Proxmox alongside other Docker apps such as Sonarr, Radarr, etc.

2024-01-13 21:12:50 INFO     ReqId 139902543808320 FlareSolverr 3.3.13
2024-01-13 21:12:50 DEBUG    ReqId 139902543808320 Debug log enabled
2024-01-13 21:12:50 INFO     ReqId 139902543808320 Testing web browser installation...
2024-01-13 21:12:50 INFO     ReqId 139902543808320 Platform: Linux-6.1.0-17-amd64-x86_64-with-glibc2.31
2024-01-13 21:12:50 INFO     ReqId 139902543808320 Chrome / Chromium path: /usr/bin/chromium
2024-01-13 21:12:50 INFO     ReqId 139902543808320 Chrome / Chromium major version: 120
2024-01-13 21:12:50 INFO     ReqId 139902543808320 Launching web browser...
2024-01-13 21:12:50 DEBUG    ReqId 139902543808320 Launching web browser...
version_main cannot be converted to an integer
2024-01-13 21:12:50 DEBUG    ReqId 139902543808320 Started executable: `/app/chromedriver` in a child process with pid: 31
2024-01-13 21:12:51 INFO     ReqId 139902543808320 FlareSolverr User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
2024-01-13 21:12:51 INFO     ReqId 139902543808320 Test successful!
2024-01-13 21:12:51 INFO     ReqId 139902543808320 Serving on http://0.0.0.0:8191
2024-01-13 21:13:01 INFO     ReqId 139902511249152 Incoming request => POST /v1 body: {'cmd': 'request.get', 'url': 'https://xxx.yyy', 'maxTimeout': 60000}
2024-01-13 21:13:01 DEBUG    ReqId 139902511249152 Launching web browser...
version_main cannot be converted to an integer
2024-01-13 21:13:02 DEBUG    ReqId 139902511249152 Started executable: `/app/chromedriver` in a child process with pid: 163
2024-01-13 21:13:02 DEBUG    ReqId 139902511249152 New instance of webdriver has been created to perform the request
2024-01-13 21:13:02 DEBUG    ReqId 139902477678336 Navigating to... https://xxx.yyy
2024-01-13 21:14:02 DEBUG    ReqId 139902511249152 A used instance of webdriver has been destroyed
2024-01-13 21:14:02 ERROR    ReqId 139902511249152 Error: Error solving the challenge. Timeout after 60.0 seconds.
2024-01-13 21:14:02 DEBUG    ReqId 139902511249152 Response => POST /v1 body: {'status': 'error', 'message': 'Error: Error solving the challenge. Timeout after 60.0 seconds.', 'startTimestamp': 1705191181995, 'endTimestamp': 1705191242739, 'version': '3.3.13'}
2024-01-13 21:14:02 INFO     ReqId 139902511249152 Response in 60.744 s
2024-01-13 21:14:02 INFO     ReqId 139902511249152 172.19.0.1 POST http://docker.lan:8191/v1 500 Internal Server Error

jaaywags commented 4 months ago

Facing this as well

DHuckaby commented 3 months ago

I think the issue might be related to using sessions. I was using them previously and in general it worked, but on some sites it would time out after a few requests. Switching to a standard cookie cache and returning the cookies in the get request solved it for me. This is probably very situational, and I imagine it adds processing time since I'm spinning up more headless instances, but it worked for me.

jaaywags commented 3 months ago

How do you do this? Sorry if that is a dumb question.

DHuckaby commented 3 months ago

Cache the cookies from FlareSolverr and then send them back in your new requests.
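A minimal sketch of that approach, assuming FlareSolverr's v3 request.get command accepts a "cookies" list in the JSON body (the request logs above show such a field) and returns cookies under solution.cookies on success. The CookieCache class and its method names are hypothetical, and only the standard library is used so the payload logic can be checked without a running server.

```python
from urllib.parse import urlsplit


class CookieCache:
    """Hypothetical per-host store for cookies returned by FlareSolverr."""

    def __init__(self):
        # hostname -> list of {"name": ..., "value": ...} dicts
        self._by_host = {}

    def store(self, url, response):
        """Remember the solution cookies from a parsed FlareSolverr response."""
        if response.get("status") != "ok":
            return  # nothing to cache from a failed solve
        cookies = response["solution"].get("cookies", [])
        # Keep only name/value pairs for resending.
        self._by_host[urlsplit(url).hostname] = [
            {"name": c["name"], "value": c["value"]} for c in cookies
        ]

    def build_payload(self, url, max_timeout_ms=60000):
        """Build a request.get body that resends any cookies cached for the host."""
        return {
            "cmd": "request.get",
            "url": url,
            "maxTimeout": max_timeout_ms,
            "cookies": self._by_host.get(urlsplit(url).hostname, []),
        }
```

The returned dict would be posted as JSON to the /v1 endpoint. The cache is keyed by exact hostname, so www.example.com and example.com are treated as separate entries; adjust that if your target site spans subdomains.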

rubenni commented 3 months ago

Hi @DHuckaby, would you mind sharing an example on how to do this?

rubenni commented 3 months ago

Hi @ilike2burnthing, I just found out that the "Verify you are human" box can take a few seconds to load, even in a regular browser. I guess it's checking the IP address's validity before showing the challenge. In my case, FlareSolverr only finds the challenge very occasionally. Would it be possible to add a (configurable) timeout that waits for the challenge to appear on the page? Or maybe let it check multiple times whether the button is displayed on the page (referring to this line in the code)?

ilike2burnthing commented 3 months ago

FlareSolverr already does this. Enable debug logging and you'll see it cycling through the check multiple times.
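For illustration, a generic poll-until-timeout loop of the kind described above. This is not FlareSolverr's code: wait_for, check, and the default timings are hypothetical, and a real implementation would query the browser page (for the checkbox or the "Verify you are human" button) instead of calling a plain function.

```python
import time
from typing import Callable


def wait_for(check: Callable[[], bool],
             timeout_s: float = 10.0,
             interval_s: float = 1.0,
             clock: Callable[[], float] = time.monotonic,
             sleep: Callable[[float], None] = time.sleep) -> int:
    """Call check() until it returns True or timeout_s elapses.

    Returns the number of attempts on success; raises TimeoutError otherwise.
    """
    deadline = clock() + timeout_s
    attempt = 0
    while True:
        attempt += 1
        if check():
            return attempt
        if clock() >= deadline:
            raise TimeoutError(f"condition not met after {attempt} attempts")
        sleep(interval_s)
```

Injecting clock and sleep keeps the loop testable without real waiting; with the defaults it simply polls once per second for up to ten seconds.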

DHuckaby commented 3 months ago

# Copy of existing Python example on README
import requests

url = "http://localhost:8191/v1"
headers = {"Content-Type": "application/json"}
data = {
    "cmd": "request.get",
    "url": "http://www.google.com/",
    "maxTimeout": 60000
}
response = requests.post(url, headers=headers, json=data)
print(response.text)

# Extract cookies from the solution response if successful
cookies = []
json_response = response.json()
if json_response["status"] == "ok":
    cookies = json_response["solution"]["cookies"]

# New request with the previous request's cookies.
# They go in the JSON body ("cookies" field) so the headless browser uses them;
# requests' own cookies= argument would send them to FlareSolverr itself and
# does not accept a list of dicts.
data2 = dict(data, cookies=cookies)
response2 = requests.post(url, headers=headers, json=data2)
print(response2.text)

MrTyton commented 3 months ago

That doesn't seem to be working for me -

2024-01-30 22:41:12 INFO     Incoming request => POST /v1 body: {'cmd': 'request.get', 'url': 'https://www.fanfiction.net/s/14316251/2/Xia', 'maxTimeout': 65000, 'cookies': [], 'postData': None}
version_main cannot be converted to an integer
2024-01-30 22:41:15 INFO     Challenge detected. Title found: Just a moment...
2024-01-30 22:42:12 INFO     Incoming request => POST /v1 body: {'cmd': 'request.get', 'url': 'https://www.fanfiction.net/s/14316251/2/Xia', 'maxTimeout': 65000, 'cookies': [], 'postData': None}
version_main cannot be converted to an integer
2024-01-30 22:42:17 INFO     Challenge detected. Title found: Just a moment...
2024-01-30 22:42:18 ERROR    Error: Error solving the challenge. Timeout after 65.0 seconds.
2024-01-30 22:42:18 INFO     Response in 65.662 s
2024-01-30 22:42:18 INFO     xxx.xxx.x.xxx POST http://xxx.xxx.x.xxx/v1 500 Internal Server Error

This is on fanfiction.net at least, when I'm just trying to make the initial request to get a cookie.

mintertale commented 3 months ago

Agreed, I have the same problem.

Gallardo26 commented 3 months ago

I'm not sure if this issue is related, but I have faced similar issues elsewhere...

In the Android manga-reading app Tachiyomi (development has stopped, but there are many forks, including Mihon, SY, J2K, etc.), I often hit Cloudflare challenges on the source I'm reading. I have to open the built-in browser and solve the Cloudflare challenge manually.

Some sources can be solved manually with the built-in browser; however, sources like Happymh have very strict Cloudflare settings, and we have to change the user-agent in the app to avoid the challenge loop. Perhaps trying different user-agents could help? Currently I've set it to:

Mozilla/5.0 (Linux; Android 13; SM-G530BT) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Mobile Safari/537.36

Also, another project, Tachidesk, runs as a server and so has no native browser, which means Cloudflare can't be solved manually; recently they added FlareSolverr. But folks over there said FlareSolverr has no way to change its user-agent (I'm not sure...), so the challenge loop occurs there too.

I wish I could code (I only understand very basic coding) to help. I hope this helps the communities if it does solve the issue everyone is facing here.

ilike2burnthing commented 3 months ago

The user-agent header isn't supported, and hasn't been since v2, over two years ago.

Gallardo26 commented 3 months ago

So could the user-agent be the issue for the cloudflare challenge loop?

ilike2burnthing commented 3 months ago

Possibly, but I can't check.

mintertale commented 3 months ago

I can confirm: I added the user-agent Mozilla/5.0 (Linux; Android 13; SM-G530BT) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Mobile Safari/537.36 and it worked again.

https://github.com/FlareSolverr/FlareSolverr/blob/df06d13cf8f9e4ea71a22057af12e570ff3e98d4/src/utils.py#L132

Just add the following line at that location: options.add_argument('--user-agent=Mozilla/5.0 (Linux; Android 13; SM-G530BT) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Mobile Safari/537.36')
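The same idea as a self-contained sketch: Chrome's --user-agent= command-line switch overrides the browser's UA string. The with_user_agent helper is hypothetical (not part of FlareSolverr); it only shows how a configurable UA could be merged into a list of Chrome arguments so that at most one --user-agent switch is present.

```python
from typing import List, Optional, Sequence


def with_user_agent(args: Sequence[str], user_agent: Optional[str]) -> List[str]:
    """Return Chrome arguments with any existing --user-agent switch replaced.

    If user_agent is None or empty, the switch is simply removed.
    """
    result = [a for a in args if not a.startswith("--user-agent=")]
    if user_agent:
        result.append(f"--user-agent={user_agent}")
    return result


ANDROID_UA = ("Mozilla/5.0 (Linux; Android 13; SM-G530BT) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/114.0.0.0 Mobile Safari/537.36")
chrome_args = with_user_agent(["--no-sandbox", "--window-size=1920,1080"], ANDROID_UA)
# Each entry would then be passed to options.add_argument(...)
```

Note that later comments in this thread report that changing the UA alone did not stop the loop for everyone.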

Gallardo26 commented 3 months ago

Sweet... So shall we add this feature back, and also allow a config variable to change the user-agent?

ilike2burnthing commented 3 months ago

While the ability to use an ENV to achieve this could be added, previously it was part of both FlareSolverr and FlareSolverrSharp, and could be used by indexers which required cookie and UA login. I'll have a look later, but I doubt I'll be able to recreate this. PRs welcome.

Gallardo26 commented 3 months ago

I'm currently using the Unraid version. Is it possible to just add an ENV var and set the value there? What should the variable be?

ilike2burnthing commented 3 months ago

Edited comment above to clarify. No, an ENV cannot currently be used.

Apo-S commented 3 months ago

https://github.com/FlareSolverr/FlareSolverr/pull/1053

ilike2burnthing commented 3 months ago

Thanks. It seems I wasn't clear though, the 'PRs welcome' part was for the tie in with FlareSolverrSharp. I'll take a look at this tomorrow just to make sure there's no issues.

marios88 commented 3 months ago

Just changed the user-agent and unfortunately it does not help with the challenge loop

howwwdi commented 3 months ago

Can confirm that it's not related to the UA. I probably fixed mine in headful mode, but I'm still facing the same issue in headless mode.

alecuba16 commented 3 months ago

What did you change to get it working? I'm testing against nowsecure.nl with a flagged IP and it isn't working in my case (running in a container with built-in Xvfb).

Gallardo26 commented 3 months ago

I think it's important to say which source was tested and whether it works. Some Cloudflare loops are due to Cloudflare being overly strict; others may have different causes.

ilike2burnthing commented 3 months ago

@Apo-S can you provide an example UA and URL that are working for you with your PR? I'm just getting Error 1010 pages or the same challenge loops as before, even if I only change a single character of the UA.

SmartArray commented 3 months ago

I was running into the same issue and I figured out why it happens. I'm working on a fix, but it won't be easy.

Gallardo26 commented 3 months ago

Could you give a brief layman's explanation? (To scratch my itch?)

Recently, in the Mihon app, where changing the user-agent used to be enough, everything now fails to bypass, even after using the app's browser to solve the captcha manually. The downloader can't seem to download the chapters. It's been about 3 days. It could be that Cloudflare is on high alert for that particular website. Perhaps these are all different issues.

SmartArray commented 3 months ago

Yes, Cloudflare is able to detect undetected_chromedriver now. Unfortunately I don't have enough time to reverse engineer their JS code; I am way too busy right now. Sorry guys.

Funnily enough, it works when you have the dev console open, because it tests for the debugger statement. It seems to use a heuristic approach to determine the browser. Not sure if this will help you...

juanfrilla commented 3 months ago

The only package I know of that bypasses Cloudflare right now is selenium-driverless: https://github.com/kaliiiiiiiiii/Selenium-Driverless. Maybe you can take a look at how it works and fix FlareSolverr.

tadasgedgaudas commented 3 months ago

Released a fix, please check if it works for you. For me it works on all sites I tested: https://github.com/FlareSolverr/FlareSolverr/pull/1065

howwwdi commented 3 months ago

Won't work with proxies

tadasgedgaudas commented 3 months ago

won't or doesn't? Did you test it?

howwwdi commented 3 months ago

Yeah, I did. The proxy extension causes DevTools to close, so Cloudflare solving fails.

tadasgedgaudas commented 3 months ago

Got it, I see the issue as well; I'll see what can be done.

tadasgedgaudas commented 3 months ago

Can you check now? I added a small fix; it works for me with authenticated proxies. I don't have a proxy without authentication to test with right now, but it should work too.

SmartArray commented 3 months ago

Glad that my finding helped us to move forward. I am sorry that I couldn't tackle it.

@tadasgedgaudas Your PR looks excellent, I am going to check it out soon!!

howwwdi commented 3 months ago

Okay, I've checked again. The proxy problem seems to be resolved, but when adding cookies to a GET request, you need to switch the driver's window again.

ilike2burnthing commented 3 months ago

All PR discussion should take place on the PR itself, not here. Thanks.

@howwwdi can you leave a review with any edits you see needing made?

behead974 commented 3 months ago

Still no solution to the problem? Hotfix 2 didn't solve it.

lordofgore commented 2 months ago

Hello guys, funnily enough I had this set up and working fine about 12-14 hours ago. Now I'm seeing this error too, and it's breaking everything :)

Do you know, or can you estimate, when the community will be able to get the latest Docker image with the fix for this issue?

Thank you!

behead974 commented 2 months ago

I was afraid that I was the only one in this case and that the problem had been abandoned after the hotfix2 version.

lordofgore commented 2 months ago

No, the issue is still present:

{"status": "error", "message": "Error: Error solving the challenge. Timeout after 60.0 seconds.", "startTimestamp": 1708359293265, "endTimestamp": 1708359354054, "version": "3.3.14-hotfix2"}

and the same logs loop through the challenge, with the checkbox occasionally found and clicked.
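When scripting against FlareSolverr it can help to surface these failures explicitly rather than relying on the HTTP 500 alone. A minimal sketch, assuming only the response fields visible in this thread (status, message, startTimestamp, endTimestamp, version, with timestamps in milliseconds); check_response and FlareSolverrError are hypothetical names.

```python
import json


class FlareSolverrError(RuntimeError):
    """Raised when a FlareSolverr response has status other than "ok"."""


def check_response(body: str) -> dict:
    """Parse a FlareSolverr /v1 response body and raise on errors.

    Assumes both timestamps are present, as in every response in this thread,
    and adds an "elapsedSeconds" key computed from them.
    """
    data = json.loads(body)
    elapsed = (data["endTimestamp"] - data["startTimestamp"]) / 1000
    data["elapsedSeconds"] = elapsed
    if data.get("status") != "ok":
        raise FlareSolverrError(
            f"{data.get('message')} (version {data.get('version')}, {elapsed:.3f} s)"
        )
    return data
```

Applied to the error body quoted above, this raises with a message ending in "(version 3.3.14-hotfix2, 60.789 s)".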

walkerp1 commented 2 months ago

FlareSolverr 3.3.14-hotfix2
2024-02-19 14:54:48 DEBUG ReqId 45624 Try to find the Cloudflare verify checkbox...
2024-02-19 14:54:48 DEBUG ReqId 45624 Cloudflare verify checkbox not found on the page.
2024-02-19 14:54:48 DEBUG ReqId 45624 Try to find the Cloudflare 'Verify you are human' button...
2024-02-19 14:54:48 DEBUG ReqId 45624 The Cloudflare 'Verify you are human' button not found on the page.
2024-02-19 14:54:50 DEBUG ReqId 45624 Waiting for title (attempt 16): Just a moment...
2024-02-19 14:54:51 DEBUG ReqId 45624 Timeout waiting for selector

This repeats every 3 s. It had been working on an RSS feed; the log file shows it started failing at about 17:25 GMT on 2/19/24.

ilike2burnthing commented 2 months ago

v3.3.14 and its subsequent hotfixes have nothing to do with this issue; we're just waiting on https://github.com/FlareSolverr/FlareSolverr/pull/1065, which I'll look into more shortly.

ilike2burnthing commented 2 months ago

v3.3.15

Thanks @tadasgedgaudas

Resolves looping challenge issue with most sites. Known exceptions:

Further PRs are welcome.