eloidieme / DD2480_Coverage

Scrapy, a fast high-level web crawling & scraping framework for Python.
https://scrapy.org
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

Refactoring for _link_allowed function #13

Open lovisastrange opened 8 months ago

lovisastrange commented 8 months ago

Is the high complexity you identified really necessary? Is it possible to split up the code (in the five complex functions you have identified) into smaller units to reduce complexity? If so, how would you go about this? Document your plan.

lovisastrange commented 8 months ago

Looking at the code, we can see that it is structured as a number of if-clauses with conditions a link should follow. If it does not fulfill one of the criteria, the function returns false. One way to split this into blocks would be to divide these if-clauses into categories depending on the condition they test. For example, one could create the categories

deny or allow url depending on its content deny or allow domain for the link

lovisastrange commented 8 months ago

Now, we can use this division of the code to create two helper methods, one for each of these cases

lovisastrange commented 8 months ago

We then get the three of the if-clauses from before (checking if the url is valid, if there is any extensions and if the text is restricted), as well as two if statements checking the result of the helper functions.

Then, the complexity is (counted by hand)

M = \pi - s + 2 = 7 - 6 + 2 = 3

which is lower.

lovisastrange commented 8 months ago

Tests of the new helper functions should be added, to make sure they give a correct result. Also, the whole program should be tested, to see that the changes haven't introduced any new bugs.

Lastly, the new methods need to be documented.

lovisastrange commented 8 months ago

In this case, the refactoring does decrease the complexity of the function. However, it might also make it less clear what is happening, since the code for the helper methods will be separate from the function, and reading the code then requires more work. The function is, despite its somewhat high complexity, pretty readable to start with, and therefore this refactoring might not be worth it.