databio / peppro

A modular, containerized pipeline for PRO-seq data processing
http://peppro.databio.org/
BSD 2-Clause "Simplified" License

duplicated fastq processing code #36

Closed nsheff closed 4 years ago

nsheff commented 5 years ago

Related to #29 and #25. The process_fastq procedure has this logic:

    if args.trimmer == "seqtk":
        _process_no_umi_seqtk(...)
    elif args.trimmer == "fastx":
        _process_no_umi_fastx(...)
    else:
        _process_no_umi_seqtk(...)
...

which needlessly duplicates code.

I can confirm that these two blocks of code are identical:

        if args.trimmer == "seqtk":
            if paired_end:
                trim_cmd_chunks_R2 = [
                    tools.seqtk,
                    "trimfq",
                    ("-e", str(args.umi_len))
                ]
                trim_cmd_chunks_R2.extend(["-"])
                if args.protocol.lower() in RUNON_SOURCE_GRO:
                    trim_cmd_chunks_R2.extend([
                        (">", trimmed_fastq_R2)
                    ])
                else:
                    trim_cmd_chunks_R2.extend([
                        "|",
                        (tools.seqtk, "seq"),
                        ("-r", "-"),
                        (">", trimmed_fastq_R2)
                    ])
            else:
                trim_cmd_chunks = [
                    tools.seqtk,
                    "trimfq",
                    ("-b", str(args.umi_len))
                ]
                if args.max_len != -1:
                    trim_cmd_chunks.extend([
                        ("-L", str(args.max_len))
                    ])
                if args.complexity and args.umi_len > 0:
                    #trim_cmd_chunks_nodedup = trim_cmd_chunks.copy()  #python3
                    trim_cmd_chunks_nodedup = list(trim_cmd_chunks)
                    trim_cmd_chunks_nodedup.extend([noadap_fastq])
                    if args.protocol.lower() in RUNON_SOURCE_GRO:
                        trim_cmd_chunks_nodedup.extend([
                            (">", trimmed_fastq)
                        ])
                    else:
                        trim_cmd_chunks_nodedup.extend([
                            "|",
                            (tools.seqtk, "seq"),
                            ("-r", "-"),
                            (">", trimmed_fastq)
                        ])
                    trim_cmd_chunks.extend([dedup_fastq])
                    if args.protocol.lower() in RUNON_SOURCE_GRO:
                        trim_cmd_chunks.extend([
                            (">", processed_fastq)
                        ])
                    else:
                        trim_cmd_chunks.extend([
                            "|",
                            (tools.seqtk, "seq"),
                            ("-r", "-"),
                            (">", processed_fastq)
                        ])
                else:
                    trim_cmd_chunks.extend(["-"])
                    if args.protocol.lower() in RUNON_SOURCE_GRO:
                        trim_cmd_chunks.extend([
                            (">", processed_fastq)
                        ])
                    else:
                        trim_cmd_chunks.extend([
                            "|",
                            (tools.seqtk, "seq"),
                            ("-r", "-"),
                            (">", processed_fastq)
                        ])
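
Within the block above, every branch ends with the same pattern: write straight to the output file for GRO protocols, otherwise pipe through `seqtk seq -r -` first. One possible way to remove that repetition is a small helper. This is only a sketch: `output_chunks` is a hypothetical name, and the value of `RUNON_SOURCE_GRO` below is a placeholder, since its real contents are not shown in this issue.

```python
# Placeholder value (assumption): the real RUNON_SOURCE_GRO is defined
# elsewhere in the pipeline and is not shown in this issue.
RUNON_SOURCE_GRO = ["gro"]


def output_chunks(protocol, seqtk, out_fastq, gro_protocols=RUNON_SOURCE_GRO):
    """Return the trailing command chunks that write `out_fastq`.

    GRO-type protocols redirect directly to the file; all others are
    first piped through `seqtk seq -r -` (reverse complement), as in
    the repeated if/else blocks of the original code.
    """
    if protocol.lower() in gro_protocols:
        return [(">", out_fastq)]
    return ["|", (seqtk, "seq"), ("-r", "-"), (">", out_fastq)]
```

Each repeated if/else could then shrink to one line, for example `trim_cmd_chunks.extend(output_chunks(args.protocol, tools.seqtk, processed_fastq))`.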

nsheff commented 5 years ago

Is the code deletion in there acceptable?

jpsmith5 commented 4 years ago

Reworked fastq_processing wholesale.