Open srinivasv2 opened 10 years ago
Supporting Form authentication will probably need a different approach. If you cannot bypass the authentication, one option is to put an authenticating reverse proxy in front of the site, such as HP IceWall SSO (it's not an OSS product...). The reverse proxy logs in to the Form-authenticated site automatically and then passes the content to the crawler.
Eventually I will toss this in a public repository, but if you're still looking for a solution, I've made a gist with a small Python script I wrote that uses mitmproxy to establish a login session and add the appropriate cookies to all requests going through it. Right now I'm just using it to crawl our internal Confluence server, but eventually I plan to expand it to work with multiple hostnames and rotating session IDs: https://gist.github.com/Fapiko/d3ecfbd58ab156541da9
You'll need to add the mitmproxy CA cert to your Java cacerts keystore if the site you're crawling is served over SSL.
Hi geeks,
I have a requirement to index secured pages that use Forms authentication with Elasticsearch. I tried the BASIC authentication feature provided in this plugin, but it didn't work for me. Please provide any suggestions.
Thanks, Srinivas V