code4craft / webmagic

A scalable web crawler framework for Java.
http://webmagic.io/
Apache License 2.0
11.42k stars, 4.18k forks

HTTP 432 error when crawling through a proxy IP; the problem is fixed, please change the source code #1079

Closed. song51930 closed this 2 years ago

song51930 commented 2 years ago

When requests go through a proxy, every requested URL comes back with status code 432. The error message is:

us.codecraft.webmagic.Spider:422 - page status code error, page https://www.baidu.com/s?ie=utf-8&medium=0&bsst=1&rsv_dl=news_b_pn&cl=2&wd=%E6%88%98%E7%95%A5%2BIntel&tn=news&rsv_bp=1&rsv_n=2&oq=&rsv_sug3=10&rsv_sug1=3&rsv_sug7=101&rsv_sug2=0&rsv_btype=t&f=8&x_bfe_rqs=03E80&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&pn=0 , code: 432

The code I used:

Proxy proxy = new Proxy("111.76.66.105", 40023, "hys_813101", "12345678");
HttpClientDownloader httpClientDownloader = new HttpClientDownloader();
httpClientDownloader.setProxyProvider(SimpleProxyProvider.from(proxy));
Spider.create(new BaiduPageProcessor()).thread(1)
        .setDownloader(httpClientDownloader)
        .addUrl(url1)
        .runAsync();

Solution: modify the convertHttpClientContext method of the us.codecraft.webmagic.downloader.HttpUriRequestConverter class. Original method:

HttpClientContext httpContext = new HttpClientContext();
if (proxy != null && proxy.getUsername() != null) {
    AuthState authState = new AuthState();
    authState.update(new BasicScheme(ChallengeState.PROXY), new UsernamePasswordCredentials(proxy.getUsername(), proxy.getPassword()));
    httpContext.setAttribute("http.auth.proxy-scope", authState);
}

Modified method:

HttpClientContext httpContext = HttpClientContext.create();
if (proxy != null && proxy.getUsername() != null) {
    CredentialsProvider credsProvider = new BasicCredentialsProvider();
    credsProvider.setCredentials(
            new AuthScope(proxy.getHost(), proxy.getPort()),
            new UsernamePasswordCredentials(proxy.getUsername(), proxy.getPassword()));
    httpContext.setCredentialsProvider(credsProvider);
}
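
For anyone comparing the two variants: both are ways of supplying Basic proxy credentials, and with Basic authentication those credentials end up in a `Proxy-Authorization` header whose value is the Base64 encoding of `username:password`. Below is a minimal sketch of that header value using only the JDK. It is not WebMagic or HttpClient code; the class name `ProxyBasicAuthSketch`, the method name, and the `user`/`pass` credentials are hypothetical placeholders chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of WebMagic or Apache HttpClient.
public class ProxyBasicAuthSketch {

    // Builds the value of a Basic "Proxy-Authorization" header:
    // the literal "Basic " followed by Base64("username:password").
    public static String proxyAuthorizationValue(String username, String password) {
        String pair = username + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Placeholder credentials, not the ones from this issue.
        // prints "Proxy-Authorization: Basic dXNlcjpwYXNz"
        System.out.println("Proxy-Authorization: " + proxyAuthorizationValue("user", "pass"));
    }
}
```

Comparing this value with what a request capture shows can help check whether the proxy actually receives the credentials. Why the proxy in this issue answers 432 is not something the sketch explains.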

I hope the source code can be updated accordingly.

song51930 commented 2 years ago

Solved it myself. Self-reliance pays off.

sutra commented 2 years ago

If you need the code changed, please submit a PR with sufficient unit test coverage.