Norconex / crawlers

Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.
https://opensource.norconex.com/crawlers
Apache License 2.0

Form Auth failing with handshake failure #648

Closed by jacksonp2008 5 years ago

jacksonp2008 commented 5 years ago

This is the latest stable release afaik.

My config, not sure if I need these as well...

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xml>
<httpcollector id="FS-Wiki-Collector">
  <logsDir>./forescout/wiki-output/logs</logsDir>

  <crawlers>
    <!-- you can have multiple crawlers -->
    <crawler id="FS-Wiki-Crawler">
      <userAgent>"FS HTTP Client"</userAgent>
      <startURLs stayOnDomain="true" stayOnPort="true" stayOnProtocol="true">
        <url>https://dsdfsdf.forescoutuniversity.com/</url>
      </startURLs>

      <httpClientFactory class="com.norconex.collector.http.client.impl.GenericHttpClientFactory">
        <authUsername>ussadfsfdsf@sadfsadfsdf</authUsername>
        <authPassword>rsadfasdfsadfsadf</authPassword>
        <authMethod>form</authMethod>
        <authUsernameField>email</authUsernameField>
        <authPasswordField>password</authPasswordField>
        <authURL>https://asdfscoutuniversity.com/login</authURL>
        <trustAllSSLCertificates>true</trustAllSSLCertificates>
        <!-- Extra form parameters required to authenticate (since 2.8.0) -->
        <authFormParams>
          <param name="remember">"disabled"</param>
          <param name="email">"asdfsdfa@sadfsdf"</param>
          <param name="password">"sadfasdf"</param>
          <!-- You can repeat this param tag as needed. -->
        </authFormParams>
      </httpClientFactory>
INFO  [AbstractCollector] Version: Norconex HTTP Collector 2.8.1 (Norconex Inc.)
INFO  [AbstractCollector] Version: Norconex Collector Core 1.9.1 (Norconex Inc.)
INFO  [AbstractCollector] Version: Norconex Importer 2.9.0 (Norconex Inc.)
INFO  [AbstractCollector] Version: Norconex JEF 4.1.0 (Norconex Inc.)
INFO  [AbstractCollector] Version: Norconex Committer Core 2.1.2 (Norconex Inc.)
INFO  [AbstractCollector] Version: Norconex Committer Elasticsearch 4.1.1-SNAPSHOT (Norconex Inc.)
INFO  [JobSuite] Running FS-Wiki-Crawler: BEGIN (Mon Nov 11 11:47:59 PST 2019)
INFO  [HttpCrawler] FS-Wiki-Crawler: RobotsTxt support: false
INFO  [HttpCrawler] FS-Wiki-Crawler: RobotsMeta support: true
INFO  [HttpCrawler] FS-Wiki-Crawler: Sitemap support: false
INFO  [HttpCrawler] FS-Wiki-Crawler: Canonical links support: true
INFO  [HttpCrawler] FS-Wiki-Crawler: User-Agent: "FS HTTP Client"
INFO  [GenericHttpClientFactory] SSL: Trusting all certificates.
INFO  [GenericHttpClientFactory] Performing FORM authentication at "https://dev-asdforescoutunsfdiversity.com/login" (username=asdf@sadf; password=*****)
INFO  [AbstractCrawler] FS-Wiki-Crawler: Crawler executed in 0 second.
ERROR [JobSuite] Execution failed for job: FS-Wiki-Crawler
com.norconex.collector.core.CollectorException: javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure
    at com.norconex.collector.http.client.impl.GenericHttpClientFactory.authenticateUsingForm(GenericHttpClientFactory.java:424)
    at com.norconex.collector.http.client.impl.GenericHttpClientFactory.createHTTPClient(GenericHttpClientFactory.java:351)
    at com.norconex.collector.http.crawler.HttpCrawler.initializeHTTPClient(HttpCrawler.java:474)
    at com.norconex.collector.http.crawler.HttpCrawler.prepareExecution(HttpCrawler.java:119)
    at com.norconex.collector.core.crawler.AbstractCrawler.doExecute(AbstractCrawler.java:216)
    at com.norconex.collector.core.crawler.AbstractCrawler.startExecution(AbstractCrawler.java:184)
    at com.norconex.jef4.job.AbstractResumableJob.execute(AbstractResumableJob.java:49)
    at com.norconex.jef4.suite.JobSuite.runJob(JobSuite.java:355)
    at com.norconex.jef4.suite.JobSuite.doExecute(JobSuite.java:296)
    at com.norconex.jef4.suite.JobSuite.execute(JobSuite.java:168)
    at com.norconex.collector.core.AbstractCollector.start(AbstractCollector.java:131)
    at com.norconex.collector.core.AbstractCollectorLauncher.launch(AbstractCollectorLauncher.java:95)
    at com.norconex.collector.http.HttpCollector.main(HttpCollector.java:74)
Caused by: javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure
    at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
    at sun.security.ssl.Alerts.getSSLException(Alerts.java:154)
    at sun.security.ssl.SSLSocketImpl.recvAlert(SSLSocketImpl.java:2020)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1127)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1367)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1395)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1379)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:404)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:364)
    at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:374)
    at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
    at com.norconex.collector.http.client.impl.GenericHttpClientFactory.authenticateUsingForm(GenericHttpClientFactory.java:415)

When I inspect the login form I see:

<form action="https://asdforescoutunivesadfrsity.com/login" method="POST" id="login-form" class="mt-l" _lpchecked="1">
    <input type="hidden" name="_token" value="buStNgjiuhAmtqqn4lDgrlVrTFqmJUjHf88M86Vj">

    <div class="stretch-inputs">
        <div class="form-group">
            <label for="email">Email</label>
            <input type="text" id="email" name="email" tabindex="1" style="background-image: url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR4nGP6zwAAAgcBApocMXEAAAAASUVORK5CYII=&quot;); cursor: pointer;">
        </div>

        <div class="form-group">
            <label for="password">Password</label>
            <input type="password" id="password" name="password" tabindex="1" style="background-image: url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR4nGP6zwAAAgcBApocMXEAAAAASUVORK5CYII=&quot;); cursor: auto;">
            <span class="block small mt-s">
                <a href="https://asdforescoutuasdfniversity.com/password/email">Forgot Password?</a>
            </span>
        </div>
    </div>

    <div class="grid half collapse-xs gap-xl v-center">
        <div class="text-left ml-xxs">
            <label class="toggle-switch ">
                <input type="checkbox" name="remember" value="on">
                <span class="custom-checkbox text-primary"><svg class="svg-icon" data-icon="check" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M18.86 4.118l-9.733 9.609-3.951-3.995-2.98 2.966 6.93 7.184L21.805 7.217z"></path></svg></span>
                <span class="label">Remember Me</span>
            </label>
        </div>

        <div class="text-right">
            <button class="button primary" tabindex="1">Log In</button>
        </div>
    </div>
</form>

I recall something about this token:

<input type="hidden" name="_token" value="buStNgjiuhAmtqqn4lDgrlVrTFqmJUjHf88M86Vj">

It seems to be different every time there is a login, and I presume it needs to be passed back somehow. Any thoughts?
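
For anyone wondering what "passing it back" would involve, here is a minimal sketch in plain Python (standard library only). It is not Norconex code and does not address the TLS handshake error above; it only illustrates the usual pattern for a per-request hidden token: fetch the login page, read the hidden fields, and POST them back with the credentials in the same cookie session. The function names are hypothetical, and it assumes the form posts to the same URL that serves the login page, as the form's action attribute above suggests.

```python
from html.parser import HTMLParser
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, build_opener


class HiddenInputParser(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.hidden[attributes["name"]] = attributes.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of all hidden form fields found in the page."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.hidden


def build_login_body(html, email, password):
    """URL-encode the hidden fields (e.g. _token) plus the credentials."""
    fields = extract_hidden_fields(html)
    fields["email"] = email        # field names taken from the form above
    fields["password"] = password
    return urlencode(fields)


def login(login_url, email, password):
    """Hypothetical helper: GET the form, then POST it back with the token."""
    # One opener with a cookie jar, so the GET and the POST share a session.
    opener = build_opener(HTTPCookieProcessor(CookieJar()))
    with opener.open(login_url) as response:
        html = response.read().decode("utf-8", errors="replace")
    body = build_login_body(html, email, password).encode("ascii")
    with opener.open(login_url, data=body) as response:  # data => POST
        return response.status
```

The token has to be read fresh on each run and sent with the same session cookie, which is why a static value in the config would not work if it really changes on every login.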

It can also do Okta, but I don't think we were ever able to solve that one.

jacksonp2008 commented 5 years ago

It appears the only way to make this work is to disable the token. It turns out this is called a CSRF token (https://en.wikipedia.org/wiki/Cross-site_request_forgery), and it can be disabled on this particular site. Hopefully this will help someone else down the road. Closing.