Closed nsheff closed 4 years ago
This logical flow shows up repeatedly:
if paired_end:
# do stuff 1
else:
if args.complexity and args.umi_len > 0:
# do stuff 2
else:
# do stuff 3
the process should be divided into functions to minimize the over-indentation problems that made the code too long and unreadable. some function ideas:
_deduplicate_reads()
if args.umi:
block takes up almost 450 lines of code.So it would look like:
...
if not paired_end and not args.umi_len <= 0:
dedup_cmd = _build_dedup_cmd(...)
else:
dedup_cmd = _build_dedup_cmd(...)
if args.umi:
if adapter == "fastp":
_process_umi_fastp(...)
else:
_process_umi_seqtk(...)
else:
if args.trimmer == "seqtk":
_process_no_umi_seqtk(...)
elif args.trimmer == "fastx":
_process_no_umi_fastx(...)
else:
use seq tk anyway?? <--- here the code looks to duplicate again?
...
got the function below 800 LOC... getting there...
Reworked all these functions into separate components.
the
_process_fastq
function is more than 1000 lines of code. This should be split out into separate functions or simplified to be understandable. There's no strict line limit for functions but as a rule if your function is getting over a hundred or two lines it's probably time to split it up...