Yes, if you want to add headers, you can do this:

from aqua import CF_Solver

cf = CF_Solver(
    'https://www.example.com',
    headers={...}  # your custom request headers
)
Thanks. Any idea why this still isn't working? (I decided to move the discussion to the Reddit thread: https://www.reddit.com/r/webscraping/comments/1g40qy2/comment/lzjam02/)
On some websites cf_clearance works. Seeing your problem, you can try using curl_cffi to bypass it:

from aqua import CF_Solver
from curl_cffi import requests

# solve the challenge and extract the cf_clearance cookie value
cf = CF_Solver('https://www.example.com')
cf_clearance = cf.cookie()

# reuse the cookie in a session that impersonates a real Chrome build
session = requests.Session(impersonate='chrome124')
session.cookies['cf_clearance'] = cf_clearance
session.headers = {...}  # your browser headers
resp = session.get('URL')
I will try to improve the cf_clearance extraction in this library.
Thanks for your reply. Unfortunately this does not resolve the 403 error; I also tried with httpx, which failed too.
Is the cf_clearance cookie corrupted, or do you expect the problem to be elsewhere? What's puzzling me is that if I extract the cf_clearance cookie from my browser's developer tools and paste it in manually, I get a 200 response...
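For reference, the manual check I'm doing looks roughly like this (the cookie value and headers below are placeholders for what I copy out of the browser):

from curl_cffi import requests

# paste the cf_clearance value copied from the browser's devtools
session = requests.Session(impersonate='chrome124')
session.cookies['cf_clearance'] = '<value copied from browser devtools>'
session.headers = {
    'User-Agent': '<same User-Agent as the browser the cookie came from>',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
}
resp = session.get('https://www.example.com')
print(resp.status_code)  # 200 with the browser cookie, 403 with the cookie from cf.cookie()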
The cookie is corrupted; I will try to correct that error.
I will use the website you are working on to do the tests.
Thanks for the confirmation. Looking forward to hearing from you with updates.
I improved the cf bypass and tested it with the website, and only now did I realize that the website is protected with Turnstile.
This cf_bypass works with websites that don't have Turnstile, only the cf_clearance challenge.
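A rough way to check which case a site falls into (just a heuristic, not something this library does) is to fetch the page and look for the Turnstile widget script in the challenge HTML:

from curl_cffi import requests

# the Turnstile widget is served from challenges.cloudflare.com/turnstile
resp = requests.get('https://www.example.com', impersonate='chrome124')
if 'challenges.cloudflare.com/turnstile' in resp.text:
    print('Turnstile challenge - not covered by this bypass')
else:
    print('Plain cf_clearance challenge - the bypass should apply')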
Is it possible to specify headers for the original request? In order to work with the received cookie in follow-up requests, User-Agent and Accept should be identical in each follow-up request.
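Something like this is what I mean (header values are placeholders; I'm assuming the headers argument shown above is passed through to the solver's initial request):

from aqua import CF_Solver
from curl_cffi import requests

# define the browser headers once, so the solver's request and every
# follow-up request present the same User-Agent and Accept values
browser_headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...',  # placeholder
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
}

cf = CF_Solver('https://www.example.com', headers=browser_headers)
cf_clearance = cf.cookie()

session = requests.Session(impersonate='chrome124')
session.cookies['cf_clearance'] = cf_clearance
session.headers = browser_headers  # identical headers for each follow-up request
resp = session.get('https://www.example.com')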