Closed marcelm closed 4 years ago
I suggest to go through the individual commits instead of looking at the full diff at once.
I want to comment on e78c4a03af93898bbe21c08754344d4a0303278e "Make parse_and_filter_pairs have only one responsibility and rename to paired_reads()".
A function should have only one responsibility (single-responsibility principle, one of the SOLID principles). So a function name parse_and_filter_pairs
is a code smell to me because it has the word "and" in it, which is a sign that the function may be doing too much. In this case, it was
I changed the function to only do the first of these things, which made the function better because
The second responsibility of the function could be turned into another function that just filters read pairs, but for the moment, I chose to just inline the code into the calling function.
The changes looks good! Thank you for fixing some our code smells and improving readability.
I took the liberty to refactor part of the logic in find_barcode_duplicated
as this was a bit confusing. This also fixed the TODO about not changing the state of the PositionTracker
object.
I completely understand that the script is quite hard to follow. It does a lot of different things and most importantly it is quite lacking in readability. Having spent quite some time on it I have a good grasp on most parts, so it there is anything in particular to clarify I could probably help.
I’ll try to find some time (after my vacation) to do a code review of the script.
Looks good to me.
I will merge this to be able to continue on other parts.
I’ve tried to understand what
clusterrmdup
does by reading the code. While I failed to do so, I refactored it a little bit along the way.