Closed. pkoloveas closed this issue 2 years ago.
Trying to answer some of the questions:
user_agent.string and cookies are the only configs required for using cookies. That said, the cookies feature did not work on 100% of the sites we tested. Apparently, some sites detect that something differs from a real browser request and refuse to authorize it. Unfortunately, such cases are hard to debug: you need to find out what is different between the browser's request and the crawler's request to understand why it is not working.
Thank you very much for your answer.
I have one more question regarding the matter: Has the cookies feature been tested with logins accompanied by CAPTCHA?
I'm not sure; we may have tested that. But I think it should work as well.
I have the following ache.yml configuration file to crawl a deep web site with authentication:
While the crawler successfully extracts all the links on the login page, it doesn't actually authenticate using the cookie, so I am not able to crawl within the website. Am I supposed to add something to the particular URL in the seeds file to indicate the cookie? Do I need to add something to the yml file to complete the configuration? Does the cookie feature work on deep web sites?