Open rica-v3 opened 6 years ago
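Collecting and saving 10 pages takes a little over 1 second (17:46:02,809 ~ 17:46:04,027). As the thread names show, the pages are processed in parallel.
**2018-02-04 17:46:02,809 (start)** [http-nio-8080-exec-1] INFO c.d.k.CrawlerApplicationController - Crawling items from Naver...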
2018-02-04 17:46:02,810 [http-nio-8080-exec-1] INFO c.d.k.service.NaverCrawlingService - Crawling naver sale items...
2018-02-04 17:46:02,889 [http-nio-8080-exec-1] INFO c.d.k.CrawlerApplicationController - Crawling time: 0 seconds.
2018-02-04 17:46:03,847 [reactor-http-nio-6] INFO c.d.k.service.NaverCrawlingService - Saved 10 items from 코멧 125
2018-02-04 17:46:03,847 [reactor-http-nio-2] INFO c.d.k.service.NaverCrawlingService - Saved 10 items from 코멧 125
2018-02-04 17:46:03,847 [reactor-http-nio-8] INFO c.d.k.service.NaverCrawlingService - Saved 10 items from 코멧 125
2018-02-04 17:46:03,847 [reactor-http-nio-4] INFO c.d.k.service.NaverCrawlingService - Saved 10 items from 코멧 125
2018-02-04 17:46:03,953 [reactor-http-nio-2] INFO c.d.k.service.NaverCrawlingService - Saved 10 items from 코멧 125
2018-02-04 17:46:03,953 [reactor-http-nio-6] INFO c.d.k.service.NaverCrawlingService - Saved 10 items from 코멧 125
2018-02-04 17:46:03,953 [reactor-http-nio-4] INFO c.d.k.service.NaverCrawlingService - Saved 10 items from 코멧 125
2018-02-04 17:46:03,953 [reactor-http-nio-8] INFO c.d.k.service.NaverCrawlingService - Saved 10 items from 코멧 125
2018-02-04 17:46:04,027 [reactor-http-nio-4] INFO c.d.k.service.NaverCrawlingService - Saved 10 items from 코멧 125
**2018-02-04 17:46:04,027 (end)** [reactor-http-nio-6] INFO c.d.k.service.NaverCrawlingService - Saved 10 items from 코멧 125
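Note that the controller logs `Crawling time: 0 seconds` at 17:46:02,889, while the `Saved 10 items` lines keep arriving until 17:46:04,027. That pattern is what you would expect if the timer only wraps a call that returns before the asynchronous work finishes. A minimal sketch of that effect, using plain `CompletableFuture` rather than the project's actual code (`fetchAndSave`, `startCrawl`, and `measure` are hypothetical names, and the sleep only simulates network latency):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

class AsyncTimingSketch {

    // Hypothetical stand-in for "fetch one page and save it"; the sleep simulates latency.
    static String fetchAndSave(int page, long latencyMs) {
        try {
            TimeUnit.MILLISECONDS.sleep(latencyMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "page " + page;
    }

    // Starts the work on another thread and returns immediately.
    static CompletableFuture<String> startCrawl(int page, long latencyMs) {
        return CompletableFuture.supplyAsync(() -> fetchAndSave(page, latencyMs));
    }

    // Returns {ms until the call returned, ms until the work actually completed}.
    static long[] measure(long latencyMs) {
        long start = System.nanoTime();
        CompletableFuture<String> pending = startCrawl(0, latencyMs);
        long returnedMs = (System.nanoTime() - start) / 1_000_000;
        pending.join(); // wait for the asynchronous work to finish
        long completedMs = (System.nanoTime() - start) / 1_000_000;
        return new long[] {returnedMs, completedMs};
    }

    public static void main(String[] args) {
        long[] t = measure(300);
        System.out.println("call returned after " + t[0] + " ms, work completed after " + t[1] + " ms");
    }
}
```

To compare against the old version fairly, the elapsed time has to be taken when the last save completes, which is why the start and end markers in the log are more useful than the controller's own figure.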
Running the previous version in the same environment and under the same conditions took about 7 seconds (17:55:45,043 ~ 17:55:52,143). The thread names show that a single thread processes the pages sequentially.
**2018-02-04 17:55:45,043 (start)** [http-nio-8080-exec-1] INFO c.d.k.CrawlerApplicationController - Crawling items from Naver...
2018-02-04 17:55:45,044 [http-nio-8080-exec-1] INFO c.d.k.service.NaverCrawlingService - Crawling naver sale items...
2018-02-04 17:55:45,044 [http-nio-8080-exec-1] INFO c.d.k.c.n.NaverCafeSearchCrawler - query: 코멧 125
pageLimit: 10
startPageNumber: 0
2018-02-04 17:55:45,045 [http-nio-8080-exec-1] DEBUG c.d.k.c.n.NaverCafeSearchCrawler - Crawling page 0
2018-02-04 17:55:48,616 [http-nio-8080-exec-1] DEBUG c.d.k.c.n.NaverCafeSearchCrawler - Crawling page 1
2018-02-04 17:55:50,453 [http-nio-8080-exec-1] DEBUG c.d.k.c.n.NaverCafeSearchCrawler - Crawling page 2
2018-02-04 17:55:50,513 [http-nio-8080-exec-1] DEBUG c.d.k.c.n.NaverCafeSearchCrawler - Crawling page 3
2018-02-04 17:55:50,644 [http-nio-8080-exec-1] DEBUG c.d.k.c.n.NaverCafeSearchCrawler - Crawling page 4
2018-02-04 17:55:51,278 [http-nio-8080-exec-1] DEBUG c.d.k.c.n.NaverCafeSearchCrawler - Crawling page 5
2018-02-04 17:55:51,551 [http-nio-8080-exec-1] DEBUG c.d.k.c.n.NaverCafeSearchCrawler - Crawling page 6
2018-02-04 17:55:51,619 [http-nio-8080-exec-1] DEBUG c.d.k.c.n.NaverCafeSearchCrawler - Crawling page 7
2018-02-04 17:55:51,689 [http-nio-8080-exec-1] DEBUG c.d.k.c.n.NaverCafeSearchCrawler - Crawling page 8
2018-02-04 17:55:51,831 [http-nio-8080-exec-1] DEBUG c.d.k.c.n.NaverCafeSearchCrawler - Crawling page 9
2018-02-04 17:55:51,964 [http-nio-8080-exec-1] INFO c.d.k.service.NaverCrawlingService - Saving searched 100 items...
**2018-02-04 17:55:52,143 (end)** [http-nio-8080-exec-1] INFO c.d.k.service.NaverCrawlingService - Naver sale item update completed.
2018-02-04 17:55:52,144 [http-nio-8080-exec-1] INFO c.d.k.CrawlerApplicationController - Crawling time: 7 seconds.
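Removing the blocking on network I/O and processing the pages in parallel instead of sequentially cut the job time noticeably, from about 7 seconds to about 1 second. (Nothing remarkable; it is the expected result...)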
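For reference, the difference between the two runs can be reproduced in miniature. This is only a sketch under stated assumptions: it uses a thread pool and `CompletableFuture` with simulated latency instead of Reactor's non-blocking I/O, so it shows the waits overlapping rather than the actual implementation. `fetchPage`, `crawlSequentially`, and `crawlConcurrently` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class CrawlComparisonSketch {

    // Hypothetical page fetch; the sleep stands in for network latency.
    static String fetchPage(int page, long latencyMs) {
        try {
            TimeUnit.MILLISECONDS.sleep(latencyMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "page " + page;
    }

    // Old behaviour: one thread fetches page after page, so the latencies add up.
    static List<String> crawlSequentially(int pageLimit, long latencyMs) {
        List<String> results = new ArrayList<>();
        for (int page = 0; page < pageLimit; page++) {
            results.add(fetchPage(page, latencyMs));
        }
        return results;
    }

    // New behaviour (approximated): all fetches are started first, so the waits overlap.
    static List<String> crawlConcurrently(int pageLimit, long latencyMs) {
        ExecutorService pool = Executors.newFixedThreadPool(pageLimit);
        try {
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (int page = 0; page < pageLimit; page++) {
                final int p = page;
                pending.add(CompletableFuture.supplyAsync(() -> fetchPage(p, latencyMs), pool));
            }
            List<String> results = new ArrayList<>();
            for (CompletableFuture<String> f : pending) {
                results.add(f.join()); // collect in page order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        crawlSequentially(10, 200);
        long t1 = System.nanoTime();
        crawlConcurrently(10, 200);
        long t2 = System.nanoTime();
        System.out.println("sequential: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("concurrent: " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

With 10 pages the sequential version waits roughly ten latencies in a row, while the concurrent one waits roughly one, which matches the shape of the two logs above.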
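One job was created per query (3468 in total), but the run stalled partway through: jobs completed only up to job 3388, and after that nothing progressed.
(Still investigating the cause ...)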
`2018-02-11 17:00:56,660 [http-nio-8080-exec-1] INFO c.d.k.CrawlerApplicationController - Crawling items from Naver...
2018-02-11 17:00:57,089 [http-nio-8080-exec-1] INFO c.d.k.service.NaverCrawlingService - 3468 queries are generated
.....
2018-02-11 17:59:06,294 [http-nio-8080-exec-1] DEBUG c.d.k.c.n.AsyncNaverCafeSearchCrawler - Get doc: https://section.cafe.naver.com/ArticleSearch.nhn?query=후크 125&page=1#%7B"query":"후크 125"%7D
2018-02-11 17:59:06,294 [http-nio-8080-exec-1] INFO c.d.k.service.NaverCrawlingService - Dealaying...
2018-02-11 17:59:06,454 [reactor-http-nio-8] INFO c.d.k.service.NaverCrawlingService - job 3387 : saved 10 items from 후크 125
2018-02-11 17:59:07,296 [http-nio-8080-exec-1] DEBUG c.d.k.c.n.AsyncNaverCafeSearchCrawler - Get doc: https://section.cafe.naver.com/ArticleSearch.nhn?query=힙스터 250&page=1#%7B"query":"힙스터 250"%7D
2018-02-11 17:59:07,296 [http-nio-8080-exec-1] INFO c.d.k.service.NaverCrawlingService - Dealaying...
2018-02-11 17:59:07,467 [reactor-http-nio-8] INFO c.d.k.service.NaverCrawlingService - job 3388 : saved 10 items from 힙스터 250 `
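The cause is not known yet, so the following is only a diagnostic aid, not a fix. One way to narrow it down would be to record each job id when it starts and remove it when its save completes, then dump whatever is left once the run stops making progress. `JobTrackerSketch` and its methods are hypothetical and not part of the project:

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

class JobTrackerSketch {

    // Thread-safe set, since jobs start on one thread and finish on others.
    private final Set<Integer> pending = ConcurrentHashMap.newKeySet();

    // Call when a job is submitted.
    public void started(int jobId) {
        pending.add(jobId);
    }

    // Call from the completion callback, after the items are saved.
    public void finished(int jobId) {
        pending.remove(jobId);
    }

    // Sorted snapshot of job ids that started but never finished.
    public TreeSet<Integer> unfinished() {
        return new TreeSet<>(pending);
    }

    public static void main(String[] args) {
        JobTrackerSketch tracker = new JobTrackerSketch();
        for (int id = 1; id <= 5; id++) {
            tracker.started(id);
        }
        for (int id : new int[] {1, 2, 3, 5}) {
            tracker.finished(id);
        }
        System.out.println("unfinished jobs: " + tracker.unfinished()); // prints: unfinished jobs: [4]
    }
}
```

Knowing whether the missing ids were never started or started but never finished would at least tell whether the stall is on the submitting side or on the response side.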
Implemented with Spring Reactor (related PR #30)