epi2me-labs / pychopper

cDNA read preprocessing
Other
61 stars 9 forks source link

ONT_logo


Pychopper

Pychopper v2 is a tool to identify, orient and trim full-length Nanopore cDNA reads. The tool is also able to rescue fused reads.

Background

The general approach of Pychopper v2 is the following:

dp_segmentation

Getting Started

Installation

Install using conda :

conda install -c nanoporetech -c conda-forge -c bioconda "nanoporetech::pychopper"

Usage

usage: pychopper [-h] [-b primers] [-g phmm_file] [-c config_file]
                          [-k kit] [-q cutoff] [-Q min_qual] [-z min_len]
                          [-r report_pdf] [-u unclass_output]
                          [-l len_fail_output] [-w rescue_output]
                          [-S stats_output] [-K qc_fail_output]
                          [-Y autotune_nr] [-L autotune_samples]
                          [-A scores_output] [-m method] [-x rescue] [-p]
                          [-t threads] [-B batch_size] [-D read stats]
                          input_fastx output_fastx

Tool to identify, orient and rescue full-length cDNA reads.

positional arguments:
  input_fastx          Input file.
  output_fastx         Output file.

optional arguments:
  -h, --help           show this help message and exit
  -b primers           Primers fasta.
  -g phmm_file         File with custom profile HMMs (None).
  -c config_file       File to specify primer configurations for each
                       direction (None).
  -k kit{PCS109,PCS110,PCS111,LSK114}
                       Use primer sequences from this kit (PCS109).
  -q cutoff            Cutoff parameter (autotuned).
  -Q min_qual          Minimum mean base quality (7.0).
  -z min_len           Minimum segment length (50).
  -r report_pdf        Report PDF (pychopper_report.pdf).
  -u unclass_output    Write unclassified reads to this file.
  -l len_fail_output   Write fragments failing the length filter in this file.
  -w rescue_output     Write rescued reads to this file.
  -S stats_output      Write statistics to this file.
  -K qc_fail_output    Write reads failing mean quality filter to this file.
  -Y autotune_nr       Approximate number of reads used for tuning the cutoff
                       parameter (10000).
  -L autotune_samples  Number of samples taken when tuning cutoff parameter
                       (30).
  -A scores_output     Write alignment scores to this BED file.
  -m method            Detection method: phmm or edlib (phmm).
  -x rescue            Protocol-specific read rescue: DCS109 (None).
  -p                   Keep primers, but trim the rest.
  -t threads           Number of threads to use (8).
  -B batch_size        Maximum number of reads processed in each batch
                       (1000000).
  -y fastq_comments    Use with minimap2 -y to pass UMI and additional info into BAM file (false).
  -U umi               Detect umis. 

WARNING: Do not turn on trimming during basecalling as it will remove the primers needed for classifying the reads!

Basic usage

Example usage with default PCS109/DCS109 primers using the default pHMM backend:

pychopper -r report.pdf -u unclassified.fq -w rescued.fq input.fq full_length_output.fq

Example usage with default PCS109/DCS109 primers using the edlib/parasail backend:

pychopper -m edlib -r report.pdf -u unclassified.fq -w rescued.fq input.fq full_length_output.fq

Example usage with default PCS109/DCS109 primers using the default pHMM backend:

pychopper -r report.pdf -A aln_hits.bed -S statistics.tsv -u unclassified.fq -w rescued.fq input.fq full_length_output.fq

Advanced usage

The fasta files with custom primers used by the edlib/parasail backend can be specified through -b, while the valid primer configurations are specified through -c:

pychopper -m edlib -b custom_pimers.fas -c primer_config.txt input.fq full_length_output.fq

Where the contents of primer_config.txt looks like +:MySSP,-MyVNP|-:MyVNP,-MySSP.

The pHMM alignment backend takes a "compressed" profile HMM trained from a multiple sequence alignment using the hmmer package. Custom profile HMMs can be trained from a fastq of reads and a fasta file with the primer sequences using the hammerpede package. The path to the custom profile HMM can be specified using -g:

pychopper -m phmm -g MySSP_MyVNP.hmm -c primer_config.txt input.fq full_length_output.fq

UMI detection

Detect umis in input reads using -U

FASTQ output example:

pychopper -U -k PCS111 -m edlib pychopper/tests/data/PCS111_umi_test_reads.fastq -

will add:

umi=TTTGCCATTGAAATTAGCGTTCGCCTT

to the FASTQ header in the output file.

BAM output example:

pychopper -U -y -k PCS111 -m edlib pychopper/tests/data/PCS111_umi_test_reads.fastq - | minimap2 -y ... 

will create a BAM file with the following tags containing the UMI:

RX:Z:TTTGCCATTGAAATTAGCGTTCGCCTT

Help

FAQs and tips

References and Supporting Information

See the post announcing the tools at the Oxford Nanopore Technologies Community here.

Research Release

Research releases are provided as technology demonstrators to provide early access to features or stimulate Community development of tools. Support for this software will be minimal and is only provided directly by the developers. Feature requests, improvements, and discussions are welcome and can be implemented by forking and pull requests. However much as we would like to rectify every issue and piece of feedback users may have, the developers may have limited resource for support of this software. Research releases may be unstable and subject to rapid iteration by Oxford Nanopore Technologies.