barricklab / breseq

breseq is a computational pipeline for finding mutations relative to a reference sequence in short-read DNA resequencing data. It is intended for haploid microbial genomes (<20 Mb). breseq is a command line tool implemented in C++ and R.
http://barricklab.org/breseq
GNU General Public License v2.0
137 stars 21 forks source link

No new junction prediction when using --aligned-sam #334

Closed vrohnie closed 1 year ago

vrohnie commented 1 year ago

Is there any way to enable the prediction of new junctions (JC) while also using the --aligned-sam input?

To elaborate a little. I am trying to test breseq on different read coverages and wanted to downsample the reference.bam file (produced by breseq) and use that as an input in order to save time.

But I realized that when the --aligned-sam option is set, the new junction prediction is skipped in principle, but I don't fully understand why.

Best, Veronika

jeffreybarrick commented 1 year ago

This option is intended for users who want to align reads on their own.

To predict junctions breseq needs to compare split read alignments to normal alignments, which can only be done by going through its steps of creating candidate junctions and aligning twice. (In other words, it's not possible to reconstruct the junctions from just one aligned SAM file.)

For what you're trying to test, look at the -l = --limit-fold-coverage option. We often use this when we have coverage in excess of 40-60x for clonal samples.

You can see the description of this option by running this:

breseq -h

If this option won't work, try converting the BAM/SAM back to FASTQ or downsampling the original FASTQ.

vrohnie commented 1 year ago

Thank you very much for the prompt response.

Yes, the initial idea was to use already mapped bam files (uisng bwa mem) and compare different SV callers on the same bam files. But since the breseq documentation strongly advises against that we decided to go with the standard workflow for breseq.

I just tried to use the --aligned-sam option to speed up things for the "downsampled" settings, but I will probably fall back to just downsampling the fastq files.

Since the comparison is focused on complex SVs, the called junctions are the key output of breseq I'm interested in. Is there possibly any other way I could speed up the breseq runtime? Like steps I could skip when only looking for SVs?

Best regards, Veronika

jeffreybarrick commented 1 year ago

It depends on whether you want breseq to predict deletions from missing coverage evidence (MC). If so, then there are no shortcuts. If not, you can skip just that or that plus predicting point mutations from read alignment evidence (RA). There's no way to skip just the read alignment type mutation prediction.

Relevant options:

--skip-RA-MC-prediction
--skip-MC-prediction
vrohnie commented 1 year ago

Thanks again for your feedback.

I will check those options out :)