code4craft / webmagic

A scalable web crawler framework for Java.
http://webmagic.io/
Apache License 2.0
11.37k stars 4.18k forks

Refactored Code to increase maintainability #1152

Closed ayushi250317 closed 5 months ago

ayushi250317 commented 5 months ago

Refactored the code using the following techniques:

  1. Extract Class: The Spider class had multiple responsibilities, which made it less cohesive. To improve cohesion, the scheduler-related code was moved into a new class called SpiderScheduler. This separates concerns and makes the code easier to maintain.

  2. Push Down Method: The smartContent method was not a supported operation for PlainText, but the hierarchy forced PlainText to implement it, so the implementation threw UnsupportedOperationException. Because smartContent is only used in HtmlNode, it was moved out of Selectable and down to HtmlNode. Classes implementing Selectable no longer have to implement this method when the operation is unsupported.

  3. Replace Conditional with Polymorphism: BasicTypeFormatter used multiple conditionals to determine the basic class type. These were replaced by a BasicClassDetector interface that every basic type implements. New basic types can now be added without modifying the existing conditionals, so BasicTypeFormatter follows the Open/Closed Principle.
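
For reviewers, items 2 and 3 can be sketched roughly as below. This is a simplified illustration, not the code in this PR: the types reuse the names from the description but are minimal stand-ins for the webmagic classes, and the `supports`/`parse` methods, the `IntegerDetector`/`BooleanDetector` classes, and the tag-stripping body of `smartContent` are assumptions made up for the example.

```java
import java.util.Arrays;
import java.util.List;

// Item 2 (push down): the general interface no longer declares smartContent().
// Simplified stand-in, not the real webmagic Selectable.
interface Selectable {
    String get();
}

final class PlainText implements Selectable {
    private final String text;
    PlainText(String text) { this.text = text; }
    @Override public String get() { return text; }
    // No smartContent() here, so nothing has to throw UnsupportedOperationException.
}

final class HtmlNode implements Selectable {
    private final String html;
    HtmlNode(String html) { this.html = html; }
    @Override public String get() { return html; }
    // Only the type that supports the operation declares it.
    // Placeholder logic (assumption): strip tags and trim.
    String smartContent() { return html.replaceAll("<[^>]*>", "").trim(); }
}

// Item 3 (replace conditional with polymorphism): one detector per basic type.
// The method names below are hypothetical.
interface BasicClassDetector {
    boolean supports(Class<?> type);
    Object parse(String raw);
}

final class IntegerDetector implements BasicClassDetector {
    @Override public boolean supports(Class<?> type) {
        return type == Integer.class || type == int.class;
    }
    @Override public Object parse(String raw) { return Integer.valueOf(raw.trim()); }
}

final class BooleanDetector implements BasicClassDetector {
    @Override public boolean supports(Class<?> type) {
        return type == Boolean.class || type == boolean.class;
    }
    @Override public Object parse(String raw) { return Boolean.valueOf(raw.trim()); }
}

// The formatter iterates over detectors instead of branching on the type itself.
final class BasicTypeFormatter {
    private final List<BasicClassDetector> detectors;
    BasicTypeFormatter(List<BasicClassDetector> detectors) { this.detectors = detectors; }

    Object format(String raw, Class<?> type) {
        for (BasicClassDetector detector : detectors) {
            if (detector.supports(type)) {
                return detector.parse(raw);
            }
        }
        throw new IllegalArgumentException("Unsupported type: " + type.getName());
    }
}

final class RefactoringSketch {
    public static void main(String[] args) {
        BasicTypeFormatter formatter = new BasicTypeFormatter(
                Arrays.asList(new IntegerDetector(), new BooleanDetector()));
        System.out.println(formatter.format(" 42 ", int.class));
        System.out.println(new HtmlNode("<p>Hello</p>").smartContent());
    }
}
```

In this sketch, supporting another basic type means adding one more `BasicClassDetector` implementation to the list, while `BasicTypeFormatter.format` stays unchanged.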