bejean / crawl-anywhere

Crawl-Anywhere - Web Crawler and document processing pipeline with Solr integration.
www.crawl-anywhere.com
Apache License 2.0

HttpLoader does not fully support cookies #86

Open grimsa opened 9 years ago

grimsa commented 9 years ago

In the fr.eolya.utils.http.HttpLoader.getAuthCookies method, cookies are stored in one CookieStore but read back from another:

        // A CookieStore object is created
        CookieStore cookieStore = new BasicCookieStore();
        HttpContext localContext = new BasicHttpContext();

        // ... and set
        localContext.setAttribute(ClientContext.COOKIE_STORE, cookieStore);

        /* ... */

        // ... and populated
        HttpResponse response = httpClient.execute(httpPost, localContext);

        /* ... */

        // However, this reads a different (empty) CookieStore object, so no cookies are returned!
        // Reading cookieStore.getCookies() instead would return the store populated above.
        List<Cookie> cookies = httpClient.getCookieStore().getCookies();

grimsa commented 9 years ago

In the fr.eolya.utils.http.HttpLoader.getHttpClient(String) method, when a CookieStore is populated, the domain of each cookie is not set, so the cookies are not used later when making HTTP requests. The current code should be fixed:

    BasicClientCookie cookie = new BasicClientCookie(pairs.getKey(), pairs.getValue());
    //cookie.setDomain("your domain");                     // <-- this should be implemented properly
    cookie.setPath("/");
    cookieStore.addCookie(cookie);
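
One possible way to fill in the missing domain is to derive it from the URL the cookies are meant for. The sketch below is only an illustration: the class `CookieDomains` and the method `cookieDomainFor` are hypothetical names that do not exist in Crawl-Anywhere. It assumes a suitable URL is available at the point where the CookieStore is populated, and it uses only the JDK.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper, not part of Crawl-Anywhere.
public class CookieDomains {

    /**
     * Returns the lower-cased host of the given URL, or null when the
     * string is not a valid URI or has no host component.
     */
    public static String cookieDomainFor(String url) {
        try {
            String host = new URI(url).getHost();
            return host == null ? null : host.toLowerCase(Locale.ROOT);
        } catch (URISyntaxException e) {
            // Malformed input: let the caller decide how to handle it.
            return null;
        }
    }
}
```

With such a helper, the commented-out line could become something like `cookie.setDomain(CookieDomains.cookieDomainFor(url));`, where `url` is assumed to be the address being crawled, with the call skipped when the result is null. Whether a cookie set this way is also sent to subdomains depends on the cookie policy in use.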