cbrueffer / tophat-recondition

Post-processor for TopHat unmapped.bam files making them usable by downstream software.
BSD 2-Clause "Simplified" License

TopHat-Recondition

tophat-recondition is a post-processor for TopHat unmapped reads (contained in unmapped.bam) that makes them compatible with downstream tools such as the Picard suite, samtools, and GATK (see TopHat issue #17). It also works around bugs in TopHat.

This software was developed as part of a PhD research project in the laboratory of Lao H. Saal, Translational Oncogenomics Unit, Department of Oncology and Pathology, Lund University, Sweden.

A detailed description of the software can be found in Brueffer and Saal (2016).

Requirements

TopHat-Recondition is available for installation with the conda package manager via the bioconda channel:

    conda install -c bioconda tophat-recondition

Usage

usage: tophat-recondition.py [-h] [-l LOGFILE] [-m MAPPED_FILE] [-q]
                             [-r RESULT_DIR] [-u UNMAPPED_FILE] [-v]
                             tophat_result_dir

Post-process TopHat unmapped reads. For detailed information on the issues
this software corrects, please consult the software homepage:
https://github.com/cbrueffer/tophat-recondition

positional arguments:
  tophat_result_dir     directory containing TopHat mapped and unmapped read
                        files.

optional arguments:
  -h, --help            show this help message and exit
  -l LOGFILE, --logfile LOGFILE
                        log file (optional, (default: result_dir/tophat-
                        recondition.log)
  -m MAPPED_FILE, --mapped-file MAPPED_FILE
                        Name of the file containing mapped reads (default:
                        accepted_hits.bam)
  -q, --quiet           quiet mode, no console output
  -r RESULT_DIR, --result_dir RESULT_DIR
                        directory to write unmapped_fixup.bam to (default:
                        tophat_output_dir)
  -u UNMAPPED_FILE, --unmapped-file UNMAPPED_FILE
                        Name of the file containing unmapped reads (default:
                        unmapped.bam)
  -v, --version         show program's version number and exit

Please make sure tophat_result_dir contains both the mapped file (default: accepted_hits.bam) and the unmapped file (default: unmapped.bam). The fixed reads are written to result_dir, in a file named after the unmapped file's stem with the suffix _fixup appended, e.g. unmapped_fixup.bam.
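
The naming rule and the input requirement described above can be sketched in Python. This is only an illustration, not code taken from tophat-recondition: the helper names fixup_path and missing_inputs are hypothetical, and the assumption that the original file extension is carried over is inferred from the unmapped_fixup.bam example.

```python
from pathlib import PurePosixPath


def fixup_path(unmapped_file="unmapped.bam", result_dir="."):
    """Return the expected output path: <result_dir>/<stem>_fixup<extension>.

    Hypothetical helper for illustration; not part of tophat-recondition.
    Assumes the extension of the unmapped file is kept unchanged.
    """
    name = PurePosixPath(unmapped_file)
    return str(PurePosixPath(result_dir) / (name.stem + "_fixup" + name.suffix))


def missing_inputs(existing_names, mapped_file="accepted_hits.bam",
                   unmapped_file="unmapped.bam"):
    """Return the required input file names that are absent from a listing.

    existing_names is any collection of file names, e.g. os.listdir(...)
    of the TopHat result directory.
    """
    return [f for f in (mapped_file, unmapped_file) if f not in existing_names]
```

Passing the output of os.listdir() for the TopHat result directory to missing_inputs gives a quick pre-flight check before starting a long run.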

Note: The unmapped file is read into memory, so make sure your computer has enough RAM to fit it.
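
For scripted runs over many samples, the options listed in the usage text can be assembled programmatically. The following is a minimal sketch: the helper names build_command and run are hypothetical, and it assumes the script is reachable on the PATH as tophat-recondition.py, as shown in the usage line. Only flags documented in the usage text are used.

```python
import subprocess


def build_command(tophat_result_dir, result_dir=None, unmapped_file=None,
                  mapped_file=None, quiet=False,
                  executable="tophat-recondition.py"):
    """Build an argument list for tophat-recondition (hypothetical helper).

    The executable name is an assumption; adjust it to match your installation.
    """
    cmd = [executable]
    if quiet:
        cmd.append("-q")                 # no console output
    if result_dir is not None:
        cmd += ["-r", result_dir]        # where the *_fixup file is written
    if unmapped_file is not None:
        cmd += ["-u", unmapped_file]     # non-default unmapped file name
    if mapped_file is not None:
        cmd += ["-m", mapped_file]       # non-default mapped file name
    cmd.append(tophat_result_dir)        # positional argument goes last
    return cmd


def run(cmd):
    """Run the command; check=True raises CalledProcessError on failure."""
    return subprocess.run(cmd, check=True)
```

For example, run(build_command("tophat_out", result_dir="fixed", quiet=True)) would process the files in tophat_out quietly and write the result to the fixed directory.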

Details

Specifically, the script does the following (see SAM format specification for details on the fields in capital letters):

Examples of error messages emitted by downstream tools when trying to process unmapped reads without some or all of these modifications can be found in this thread in the SEQanswers forum, which led to the development of this software.

Citation

If you use this software in your research and would like to cite it, please use the citation information in the CITATION file.