Closed. song51930 closed this issue 2 years ago.
When requests are sent through a proxy, every requested URL fails with status code 432. The error message is: us.codecraft.webmagic.Spider:422 - page status code error, page https://www.baidu.com/s?ie=utf-8&medium=0&bsst=1&rsv_dl=news_b_pn&cl=2&wd=%E6%88%98%E7%95%A5%2BIntel&tn=news&rsv_bp=1&rsv_n=2&oq=&rsv_sug3=10&rsv_sug1=3&rsv_sug7=101&rsv_sug2=0&rsv_btype=t&f=8&x_bfe_rqs=03E80&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&pn=0 , code: 432

The code I am using:
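    import us.codecraft.webmagic.Spider;
    import us.codecraft.webmagic.downloader.HttpClientDownloader;
    import us.codecraft.webmagic.proxy.Proxy;
    import us.codecraft.webmagic.proxy.SimpleProxyProvider;

    // Proxy that requires username/password authentication
    Proxy proxy = new Proxy("111.76.66.105", 40023, "hys_813101", "12345678");

    // Route all downloads through the proxy
    HttpClientDownloader httpClientDownloader = new HttpClientDownloader();
    httpClientDownloader.setProxyProvider(SimpleProxyProvider.from(proxy));

    Spider.create(new BaiduPageProcessor())
            .thread(1)
            .setDownloader(httpClientDownloader)
            .addUrl(url1)
            .runAsync();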
Solution: change the convertHttpClientContext method in the us.codecraft.webmagic.downloader.HttpUriRequestConverter class.

Original method:
    HttpClientContext httpContext = new HttpClientContext();
    if (proxy != null && proxy.getUsername() != null) {
        // Pre-populates the proxy auth state with a Basic scheme and the proxy credentials
        AuthState authState = new AuthState();
        authState.update(new BasicScheme(ChallengeState.PROXY),
                new UsernamePasswordCredentials(proxy.getUsername(), proxy.getPassword()));
        httpContext.setAttribute("http.auth.proxy-scope", authState);
    }
Modified method:
    HttpClientContext httpContext = HttpClientContext.create();
    if (proxy != null && proxy.getUsername() != null) {
        // Registers the proxy credentials, scoped to the proxy host and port,
        // so HttpClient can answer the proxy's authentication challenge
        CredentialsProvider credsProvider = new BasicCredentialsProvider();
        credsProvider.setCredentials(
                new AuthScope(proxy.getHost(), proxy.getPort()),
                new UsernamePasswordCredentials(proxy.getUsername(), proxy.getPassword()));
        httpContext.setCredentialsProvider(credsProvider);
    }
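
For anyone debugging a similar proxy failure, it can help to know what the credentials look like on the wire. Assuming the proxy uses the HTTP Basic scheme (which the BasicScheme in the original method suggests, though the thread does not confirm what this proxy actually requests), the client sends a Proxy-Authorization header built from the username and password. The sketch below uses only the JDK; the class name ProxyBasicAuthSketch and the method headerValue are made up for illustration and are not part of WebMagic or HttpClient.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of WebMagic: shows how a Basic
// Proxy-Authorization header value is derived from proxy credentials.
public class ProxyBasicAuthSketch {

    // Joins "username:password" and Base64-encodes it, as the Basic scheme requires.
    static String headerValue(String username, String password) {
        String pair = username + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Placeholder credentials; substitute the proxy's real username and password.
        System.out.println("Proxy-Authorization: " + headerValue("user", "pass"));
    }
}
```

Comparing this value with the header seen in a traffic capture is one way to check whether the credentials reach the proxy at all.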
I hope the source code can be updated accordingly.
Fix it yourself and you will want for nothing.
If you need the code changed, please submit a PR with sufficient unit test coverage.