code4craft / webmagic

A scalable web crawler framework for Java.
http://webmagic.io/
Apache License 2.0
11.37k stars 4.18k forks source link

HttpClientDownloader download error #1124

Closed mayn7z closed 1 year ago

mayn7z commented 1 year ago

Closing non transactional SqlSession [org.apache.ibatis.session.defaults.DefaultSqlSession@ab70735] 2023-08-15 11:00:07.932 INFO 23432 --- [ task-1] c.b.b.w.service.WeiXinApiService : 开始爬取数据! 2023-08-15 11:00:10.035 INFO 23432 --- [ task-1] us.codecraft.webmagic.Spider : Spider www.data258.com started! 2023-08-15 11:00:13.206 WARN 23432 --- [pool-1-thread-1] u.c.w.downloader.HttpClientDownloader : download page https://xxx/xxxxx/xxxxxx?keyWords=可降解&page=1 error

java.net.SocketTimeoutException: Read timed out at java.base/java.net.SocketInputStream.socketRead0(Native Method) ~[na:na] at java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:115) ~[na:na] at java.base/java.net.SocketInputStream.read(SocketInputStream.java:168) ~[na:na] at java.base/java.net.SocketInputStream.read(SocketInputStream.java:140) ~[na:na] at java.base/sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:478) ~[na:na] at java.base/sun.security.ssl.SSLSocketInputRecord.readHeader(SSLSocketInputRecord.java:472) ~[na:na] at java.base/sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:70) ~[na:na] at java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1318) ~[na:na] at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:959) ~[na:na] at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137) ~[httpcore-4.4.14.jar:4.4.14] at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153) ~[httpcore-4.4.14.jar:4.4.14] at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280) ~[httpcore-4.4.14.jar:4.4.14] at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138) ~[httpclient-4.5.13.jar:4.5.13] at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) ~[httpclient-4.5.13.jar:4.5.13] at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) ~[httpcore-4.4.14.jar:4.4.14] at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) ~[httpcore-4.4.14.jar:4.4.14] at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157) ~[httpclient-4.5.13.jar:4.5.13] at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) ~[httpcore-4.4.14.jar:4.4.14] at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) ~[httpcore-4.4.14.jar:4.4.14] at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) ~[httpclient-4.5.13.jar:4.5.13] at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186) ~[httpclient-4.5.13.jar:4.5.13] at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) ~[httpclient-4.5.13.jar:4.5.13] at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[httpclient-4.5.13.jar:4.5.13] at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[httpclient-4.5.13.jar:4.5.13] at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[httpclient-4.5.13.jar:4.5.13] at us.codecraft.webmagic.downloader.HttpClientDownloader.download(HttpClientDownloader.java:83) ~[webmagic-core-0.7.5.jar:na] at us.codecraft.webmagic.Spider.processRequest(Spider.java:419) ~[webmagic-core-0.7.5.jar:na] at us.codecraft.webmagic.Spider.access$000(Spider.java:61) ~[webmagic-core-0.7.5.jar:na] at us.codecraft.webmagic.Spider$1.run(Spider.java:322) ~[webmagic-core-0.7.5.jar:na] at us.codecraft.webmagic.thread.CountableThreadPool$1.run(CountableThreadPool.java:74) ~[webmagic-core-0.7.5.jar:na] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[na:na] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[na:na] at java.base/java.lang.Thread.run(Thread.java:834) ~[na:na]

开始爬取报超时,偶现的现象是因为什么呢

LyuXiaobu commented 1 year ago

您的邮件我已收到,祝你天天开心!