Open MrTyton opened 4 months ago
Debug logs and headless=false both confirm that the challenge is found, box ticked, page refreshed, but the challenge just returns. Tested on both Windows and Docker.
This was the same behaviour seen with yggtorrent, which was resolved by adding the ENV LANG
and using an English language code, however I've tried several language codes to no success.
If anyone has any ideas, or it's working for anyone, let me know.
I have the same issue on multiple other sites, so it doesn't look like a site-specific thing.
I'm also stuck in an infinite loop on 3.3.9 through 3.3.13.
I have the same issue on another, totally different site. I had to remove the call from the script because there was no challenge. The script runs cleanly for now. I'll try an older version later.
I'm running it on a VM in Proxmox alongside other docker apps like sonarr, radarr, etc
2024-01-13 21:12:50 INFO ReqId 139902543808320 FlareSolverr 3.3.13
2024-01-13 21:12:50 DEBUG ReqId 139902543808320 Debug log enabled
2024-01-13 21:12:50 INFO ReqId 139902543808320 Testing web browser installation...
2024-01-13 21:12:50 INFO ReqId 139902543808320 Platform: Linux-6.1.0-17-amd64-x86_64-with-glibc2.31
2024-01-13 21:12:50 INFO ReqId 139902543808320 Chrome / Chromium path: /usr/bin/chromium
2024-01-13 21:12:50 INFO ReqId 139902543808320 Chrome / Chromium major version: 120
2024-01-13 21:12:50 INFO ReqId 139902543808320 Launching web browser...
2024-01-13 21:12:50 DEBUG ReqId 139902543808320 Launching web browser...
version_main cannot be converted to an integer
2024-01-13 21:12:50 DEBUG ReqId 139902543808320 Started executable: `/app/chromedriver` in a child process with pid: 31
2024-01-13 21:12:51 INFO ReqId 139902543808320 FlareSolverr User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
2024-01-13 21:12:51 INFO ReqId 139902543808320 Test successful!
2024-01-13 21:12:51 INFO ReqId 139902543808320 Serving on http://0.0.0.0:8191
2024-01-13 21:13:01 INFO ReqId 139902511249152 Incoming request => POST /v1 body: {'cmd': 'request.get', 'url': 'https://xxx.yyy', 'maxTimeout': 60000}
2024-01-13 21:13:01 DEBUG ReqId 139902511249152 Launching web browser...
version_main cannot be converted to an integer
2024-01-13 21:13:02 DEBUG ReqId 139902511249152 Started executable: `/app/chromedriver` in a child process with pid: 163
2024-01-13 21:13:02 DEBUG ReqId 139902511249152 New instance of webdriver has been created to perform the request
2024-01-13 21:13:02 DEBUG ReqId 139902477678336 Navigating to... https://xxx.yyy
2024-01-13 21:14:02 DEBUG ReqId 139902511249152 A used instance of webdriver has been destroyed
2024-01-13 21:14:02 ERROR ReqId 139902511249152 Error: Error solving the challenge. Timeout after 60.0 seconds.
2024-01-13 21:14:02 DEBUG ReqId 139902511249152 Response => POST /v1 body: {'status': 'error', 'message': 'Error: Error solving the challenge. Timeout after 60.0 seconds.', 'startTimestamp': 1705191181995, 'endTimestamp': 1705191242739, 'version': '3.3.13'}
2024-01-13 21:14:02 INFO ReqId 139902511249152 Response in 60.744 s
2024-01-13 21:14:02 INFO ReqId 139902511249152 172.19.0.1 POST http://docker.lan:8191/v1 500 Internal Server Error
Facing this as well
I think the issue might be related to using sessions. I previously was using them and in general it worked, but for some sites it would fail after a few requests in a timeout. Switching to a standard cache of cookies and returning them in the get request solved it for me. This probably is very situational and does add more processing time I would imagine since I am spinning up more headless instances, but it worked for me.
Switching to a standard cache of cookies and returning them in the get request solved it for me.
How do you do this? Sorry if that is a dumb question.
Cache the cookies from FlareSolverr and then send them back in your new requests.
Hi @DHuckaby, would you mind sharing an example on how to do this?
Hi @ilike2burnthing, what I just found out is that it can take a few seconds for the "Verify you are human" box to load, even in a regular browser. I guess it's checking the validity of the IP address before showing the challenge. In my case, it only finds the challenge very occasionally. Would it be possible to add a (configurable) timeout that waits for the challenge to appear on the page? Or maybe let it check multiple times whether the button is displayed on the page (referring to this line in the code)?
FlareSolverr already does this. Enable debug logging and you'll see it cycling through the check multiple times.
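For readers unfamiliar with the pattern, repeated checking of this kind can be sketched generically as below. This is not FlareSolverr's actual code; `wait_for`, its parameters, and the injected `sleep` are hypothetical names used only for illustration.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool],
             attempts: int = 10,
             delay: float = 1.0,
             sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `condition` up to `attempts` times, pausing `delay` seconds between tries.

    Returns True as soon as the condition holds, False if it never does.
    """
    for attempt in range(1, attempts + 1):
        if condition():
            return True
        # Do not sleep after the final failed attempt.
        if attempt < attempts:
            sleep(delay)
    return False
```

In a browser automation context, `condition` would be a function that looks for the challenge element and returns whether it was found.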
# Based on the existing Python example in the README
import requests

url = "http://localhost:8191/v1"
headers = {"Content-Type": "application/json"}
data = {
    "cmd": "request.get",
    "url": "http://www.google.com/",
    "maxTimeout": 60000
}
response = requests.post(url, headers=headers, json=data)
print(response.text)

# Extract cookies from the solution if the request succeeded
cookies = []
json_response = response.json()
if json_response["status"] == "ok":
    cookies = json_response["solution"]["cookies"]

# New request: pass the cached cookies to FlareSolverr in the JSON body
# (the "cookies" field of request.get), not as cookies on the POST itself
data_with_cookies = {**data, "cookies": cookies}
response2 = requests.post(url, headers=headers, json=data_with_cookies)
print(response2.text)
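If the cached cookies are to be reused outside FlareSolverr, for example with `requests` directly against the target site, the cookie list needs to be flattened first. The following is a minimal sketch, assuming each entry in the solution's `cookies` list is a dict with at least `name` and `value` keys and that the solution also carries a `userAgent` field. `to_requests_cookies` and `build_request_kwargs` are hypothetical helper names, not part of FlareSolverr.

```python
from typing import Any


def to_requests_cookies(cookies: list[dict[str, Any]]) -> dict[str, str]:
    """Flatten a list of cookie dicts into the name -> value mapping requests accepts."""
    return {c["name"]: c["value"] for c in cookies if "name" in c and "value" in c}


def build_request_kwargs(solution: dict[str, Any]) -> dict[str, Any]:
    """Build keyword arguments for requests.get() from a solution dict (assumed shape)."""
    kwargs: dict[str, Any] = {
        "cookies": to_requests_cookies(solution.get("cookies", []))
    }
    user_agent = solution.get("userAgent")
    if user_agent:
        # Send the same User-Agent that the browser used to obtain the cookies.
        kwargs["headers"] = {"User-Agent": user_agent}
    return kwargs
```

Usage would look like `requests.get(target_url, **build_request_kwargs(json_response["solution"]))`. Cloudflare clearance cookies are typically tied to the User-Agent and IP address that solved the challenge, which is why the sketch forwards the User-Agent as well.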
That doesn't seem to be working for me -
2024-01-30 22:41:12 INFO Incoming request => POST /v1 body: {'cmd': 'request.get', 'url': 'https://www.fanfiction.net/s/14316251/2/Xia', 'maxTimeout': 65000, 'cookies': [], 'postData': None}
version_main cannot be converted to an integer
2024-01-30 22:41:15 INFO Challenge detected. Title found: Just a moment...
2024-01-30 22:42:12 INFO Incoming request => POST /v1 body: {'cmd': 'request.get', 'url': 'https://www.fanfiction.net/s/14316251/2/Xia', 'maxTimeout': 65000, 'cookies': [], 'postData': None}
version_main cannot be converted to an integer
2024-01-30 22:42:17 INFO Challenge detected. Title found: Just a moment...
2024-01-30 22:42:18 ERROR Error: Error solving the challenge. Timeout after 65.0 seconds.
2024-01-30 22:42:18 INFO Response in 65.662 s
2024-01-30 22:42:18 INFO xxx.xxx.x.xxx POST http://xxx.xxx.x.xxx/v1 500 Internal Server Error
At least for fanfiction.net, when I'm just trying to do the initial request to get a cookie.
Agreed, I have the same problem.
I'm not sure if this issue is related, but I have faced similar issues elsewhere...
In Tachiyomi, the Android app for reading manga (development has stopped, but there are many forks, including Mihon, SY, J2K, etc.), I often hit Cloudflare issues with the source I'm reading and have to open the built-in browser to solve the challenge manually.
Some sources can be solved manually with the built-in browser. However, sources like Happymh have very strict Cloudflare settings, and we have to change the user-agent in the app to avoid the challenge loop. Perhaps trying a different user-agent could help? Currently I've set mine to:
Mozilla/5.0 (Linux; Android 13; SM-G530BT) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Mobile Safari/537.36
Also, Tachidesk, another project, runs as a server and so has no "native browser", which means the Cloudflare challenge can't be solved manually. They recently added FlareSolverr, but folks over there said FlareSolverr doesn't have a way to change its user-agent (I'm not sure...), so the challenge loop occurs there too.
I wish I could code well enough to help (I only understand very basic coding). I hope this helps the community if it does solve the issue everyone is facing here.
user-agent header isn't supported, hasn't been since v2, over 2yrs ago.
So could the user-agent be the issue for the cloudflare challenge loop?
Possibly, but I can't check.
Mozilla/5.0 (Linux; Android 13; SM-G530BT) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Mobile Safari/537.36
I confirm, I added the user-agent and it worked again
Just add this line alongside the other Chrome options:
options.add_argument('--user-agent=Mozilla/5.0 (Linux; Android 13; SM-G530BT) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Mobile Safari/537.36')
Sweet... So shall we add this feature back? and also allow a var in the config to change the user-agent?
While the ability to use an ENV to achieve this could be added, previously it was part of both FlareSolverr and FlareSolverrSharp, and could be used by indexers which required cookie and UA login. I'll have a look later, but I doubt I'll be able to recreate this. PRs welcome.
I'm currently using the Unraid version. Is it possible to just add an ENV VAR and set the value on it? What should the VAR be?
Edited comment above to clarify. No, an ENV cannot currently be used.
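For anyone considering a PR, reading the user-agent from an environment variable could look roughly like the sketch below. This is only an illustration: the variable name `USER_AGENT` and the helper `user_agent_argument` are hypothetical and are not existing FlareSolverr settings or functions.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name, chosen for illustration only.
DEFAULT_ENV_NAME = "USER_AGENT"


def user_agent_argument(environ: Optional[Mapping[str, str]] = None,
                        env_name: str = DEFAULT_ENV_NAME) -> Optional[str]:
    """Return a '--user-agent=...' Chrome argument, or None if the variable is unset or blank."""
    environ = os.environ if environ is None else environ
    value = environ.get(env_name, "").strip()
    if not value:
        return None
    return f"--user-agent={value}"
```

The result would then be passed on only when present, e.g. `arg = user_agent_argument()` followed by `if arg: options.add_argument(arg)`, so the default browser user-agent is kept when the variable is not set.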
Thanks. It seems I wasn't clear though: the 'PRs welcome' part was for the tie-in with FlareSolverrSharp. I'll take a look at this tomorrow just to make sure there are no issues.
Just changed the user-agent and unfortunately it does not help with the challenge loop
Can confirm that it's not related to the UA. Changing it probably fixed my case in headful mode, but I'm still facing the same issue in headless mode.
What did you change to get it working? I'm testing with nowsecure.nl and a flagged IP, and it is not working in my case. (I'm running it in a container with built-in xvfb.)
I think it's important to say which source was tested and whether it works or not. Some of the Cloudflare loops are due to Cloudflare being overly strict; for other sources there could be other reasons.
@Apo-S can you provide an example UA and URL that are working for you with your PR? I'm just getting Error 1010 pages or the same challenge loops as before, even if I only change a single character of the UA.
I was running into the same issue and I figured out why it happens. Working on a fix but it won't be easy
Possible to give a brief layman explanation? (To cure my itch?)
Recently, on the Mihon app, where I was previously able to just change the user-agent, everything now seems to fail to bypass the challenge, even after using the app's browser to solve the captcha manually. The downloader can't seem to download the chapters. It's been about 3 days. It could be Cloudflare being on high alert for that particular website. Perhaps these are all different issues.
Yes, Cloudflare is able to detect undetected_chromedriver now. Unfortunately, I'm far too busy to reverse engineer their JS code at the moment. Sorry guys.
Funnily enough, it works when you have the dev console open, because it tests for the debugger instruction.
It seems to use a heuristic approach to determine the browser. Not sure if it will help you...
The only package I know of that bypasses Cloudflare right now is selenium-driverless (https://github.com/kaliiiiiiiiii/Selenium-Driverless). Maybe you can take a look at how it works and fix FlareSolverr.
Released a fix, please check if it works for you. For me it works on all sites I tested: https://github.com/FlareSolverr/FlareSolverr/pull/1065
Won't work with proxies
won't or doesn't? Did you test it?
Yes, I did. The proxy extension causes devtools to close, so the CF solving fails.
got it, I see the issue as well, will see what can be done
Can you check now? Added a small fix, works for me with proxies with authentication. Don't have a proxy without authentication to test with right now, but it seems it should work too
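For anyone wanting to test both proxy cases, a request body can be built as in the sketch below. It assumes the `proxy` object of `request.get` takes a `url` key plus optional `username` and `password` keys, as the FlareSolverr README describes; `build_payload` is a hypothetical helper name and the proxy address in the usage note is a placeholder.

```python
from typing import Any, Optional


def build_payload(url: str,
                  max_timeout_ms: int = 60000,
                  proxy_url: Optional[str] = None,
                  username: Optional[str] = None,
                  password: Optional[str] = None) -> dict[str, Any]:
    """Build a request.get body, optionally with a proxy (assumed field names)."""
    payload: dict[str, Any] = {
        "cmd": "request.get",
        "url": url,
        "maxTimeout": max_timeout_ms,
    }
    if proxy_url:
        proxy: dict[str, str] = {"url": proxy_url}
        # Credentials are only attached when both parts are given.
        if username and password:
            proxy["username"] = username
            proxy["password"] = password
        payload["proxy"] = proxy
    return payload
```

The result can be posted with `requests.post("http://localhost:8191/v1", json=build_payload("https://xxx.yyy", proxy_url="http://127.0.0.1:8888"))`, once with and once without credentials.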
Glad that my finding helped us to move forward. I am sorry that I couldn't tackle it.
@tadasgedgaudas Your PR looks excellent, I am going to check it out soon!!
Okay, I've checked again. The problem with proxies seems to be resolved, but when you add cookies to a GET request, you need to change the driver's window again.
All PR discussion should take place on the PR itself, not here. Thanks.
@howwwdi can you leave a review with any changes you think are needed?
still no solution to the problem that hotfix 2 didn't solve?
Hello guys, funnily enough I had this set up and working fine 12-14 hours ago. Now I see this error too, and it's breaking everything. :)
Do you know/estimate when the community will be able to get the latest docker image with the fix for this issue?
Thank you!
I was afraid that I was the only one in this case and that the problem had been abandoned after the hotfix2 version.
No, the issue is still present:
{"status": "error", "message": "Error: Error solving the challenge. Timeout after 60.0 seconds.", "startTimestamp": 1708359293265, "endTimestamp": 1708359354054, "version": "3.3.14-hotfix2"}
and the same logs looping through the challenge, with the checkbox occasionally found and clicked.
FlareSolverr 3.3.14-hotfix2
2024-02-19 14:54:48 DEBUG ReqId 45624 Try to find the Cloudflare verify checkbox...
2024-02-19 14:54:48 DEBUG ReqId 45624 Cloudflare verify checkbox not found on the page.
2024-02-19 14:54:48 DEBUG ReqId 45624 Try to find the Cloudflare 'Verify you are human' button...
2024-02-19 14:54:48 DEBUG ReqId 45624 The Cloudflare 'Verify you are human' button not found on the page.
2024-02-19 14:54:50 DEBUG ReqId 45624 Waiting for title (attempt 16): Just a moment...
2024-02-19 14:54:51 DEBUG ReqId 45624 Timeout waiting for selector
This repeats every 3 s. It was working on an RSS feed. The log file shows it started failing at about 17:25 GMT on 2/19/24.
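When collecting reports like this, the error body can be summarised on the client side. The sketch below assumes the response has the shape shown in this thread (`status`, `message`, `version`, and millisecond `startTimestamp`/`endTimestamp` fields); `summarize_response` is a hypothetical helper name.

```python
import json
from typing import Any


def summarize_response(raw: str) -> dict[str, Any]:
    """Summarise a JSON response body of the shape quoted in this thread."""
    body = json.loads(raw)
    # Timestamps are in milliseconds since the epoch.
    elapsed_ms = body["endTimestamp"] - body["startTimestamp"]
    message = body.get("message", "")
    return {
        "ok": body.get("status") == "ok",
        "timed_out": "Timeout after" in message,
        "elapsed_s": elapsed_ms / 1000,
        "version": body.get("version"),
    }
```

Applied to the error body quoted above, this reports a timeout after roughly 60.8 seconds on version 3.3.14-hotfix2, which makes it easy to tell a challenge-loop timeout apart from other failures before deciding whether to retry.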
v3.3.14, and the subsequent hotfixes, have nothing to do with this issue, just waiting on https://github.com/FlareSolverr/FlareSolverr/pull/1065, which I'll look into more shortly.
Thanks @tadasgedgaudas
Resolves looping challenge issue with most sites. Known exceptions:
Further PRs are welcome.
Description
Using FanFicFare to scrape from fanfiction.net. Nothing's changed with my config, but it stopped working this week.