Stikus closed this issue 5 years ago.
If you already have the individual, smaller BAM files, I would `samtools cat` those of the first ends, then those of the second ends, and call OptiType on these two concatenated BAM files. Alternatively, you can call Yara with `<(zcat lane1_1.fq.gz lane2_1.fq.gz lane3_1.fq.gz lane4_1.fq.gz)` as input instead of creating an actual big file on your disk. I'd probably do this for future samples.
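A minimal, self-contained sketch of the streaming idea, using made-up demo files (the `laneN_1.fq.gz` naming is hypothetical). Process substitution hands the consumer a `/dev/fd/...` path that yields all end-1 lanes back to back, so no merged FASTQ is written to disk. Here `awk` stands in for `yara_mapper` and only counts FASTQ records; `gzip -dc` is used as a portable equivalent of `zcat`.

```bash
#!/usr/bin/env bash
# Sketch: stream several per-lane gzipped FASTQs as ONE input, without writing
# a merged file. Demo data and the laneN_1.fq.gz names are hypothetical;
# awk stands in for yara_mapper and only counts FASTQ records.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Four tiny single-record "lanes" for read end 1.
for lane in 1 2 3 4; do
  printf '@read%s/1\nACGT\n+\nIIII\n' "$lane" | gzip -c > "$tmp/lane${lane}_1.fq.gz"
done

# A FASTQ record is 4 lines, so records = lines / 4.
count_records() { awk 'END { print NR / 4 }' "$1"; }

# <(...) gives the consumer a /dev/fd path that yields the decompressed
# lanes back to back, exactly as a concatenated end-1 FASTQ would.
n_records=$(count_records <(gzip -dc "$tmp"/lane{1..4}_1.fq.gz))
echo "end-1 records streamed: $n_records"
```

In a real run, the same `<(zcat ...)` would be the reads argument of `yara_mapper`, once for the end-1 lanes and once for the end-2 lanes.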
@andras86 Thank you for the fast answer. If I understand you correctly, we should align the first and second ends separately with Yara and start OptiType with the two BAMs?
Do you propose Yara instead of RazerS3 as the aligner because of our problems with RazerS3? Or is Yara simply better (according to the developer page, yes, it is better, but my tests of the aligners are still in progress)?
> Check out our newer, faster read aligner Yara

And here:

> we recommend to use the yara read mapper instead.
And if Yara is better, is there any possibility of replacing RazerS3 with Yara in OptiType itself? I found your workaround described here; is it still the best way to use Yara with OptiType, or has something new been proposed?
Yes, exactly: align end1 and end2 separately, and call OptiType on the two resulting BAM files.
Yara used to have a few niggly issues with full sensitivity, which have been rectified recently, so the two mappers should produce equivalent results under all circumstances now, and there's no reason to stick with RazerS3 anymore. You're right that Yara would now be more suitable as the default read mapper rather than as the workaround. I'll keep you posted.
@andras86 I'm trying to implement Yara in my script, but I have a problem with the command you suggested:

```
yara_mapper -e 3 -f bam -u -os /path/to/ref path/to/reads_1.fastq.gz | samtools view -h -F 4 -b1
```

My Yara help doesn't have these options (`-u` and `-os`):
```
yara_mapper - Yara Mapper
=========================
SYNOPSIS
yara_mapper [OPTIONS]
```
Can you tell me what they mean, or what to use instead?
What is your Yara version? (`yara_mapper --version`)
Bottom of the help:
```
VERSION
Last update: 2018-10-18_17:13:22_+0200
yara_mapper version: 0.9.11 [55b8b1f]
SeqAn version: 2.4.0
```
Should I use a more recent version? This is the latest from master. Here is my installation from the Dockerfile:
```dockerfile
RUN cd "$SOFT" \
&& git clone https://github.com/seqan/seqan.git \
&& mkdir -p "$SOFT/yara-build" \
&& cd "$SOFT/yara-build" \
&& cmake "$SOFT/seqan" -DSEQAN_BUILD_SYSTEM=APP:yara -DCMAKE_CXX_COMPILER=/usr/bin/g++-4.9 \
&& make -j"$(($(nproc)+1))" all \
&& mkdir -p "$SOFT/yara-0.9.11/bin" \
&& mv -t "$SOFT/yara-0.9.11/bin" "$SOFT/yara-build/bin/yara_indexer" "$SOFT/yara-build/bin/yara_mapper" \
&& cd "$SOFT" \
&& rm -rf "$SOFT/seqan" \
&& rm -rf "$SOFT/yara-build"
```
In that case, try

```
yara_mapper -e 3 -y full -t 4 /path/to/ref /path/to/reads_1.fq.gz | samtools view -h -F 4 -b1 -
```

instead. I added `-t 4` because by default Yara uses as many threads as you have cores, which you may not want, so it's best to limit the number of threads. Runtime doesn't decrease proportionally with the number of threads anyway.
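A sketch that puts the per-end workflow into one script, assuming the Yara 0.9.x flags above and that `$OPTITYPE` points at OptiType's `OptiTypePipeline.py`. `REF`, `OUT`, the FASTQ names, `map_end` and the `DRY_RUN` switch are hypothetical placeholders. By default it only prints the commands; set `DRY_RUN=0` to run them. The thread cap follows the reasoning above: at most 4 threads, fewer on smaller machines.

```bash
#!/usr/bin/env bash
# Sketch: map read end 1 and end 2 separately with Yara, keep mapped reads only,
# then hand both BAMs to OptiType. REF, OUT, OPTITYPE and the FASTQ names are
# hypothetical placeholders. DRY_RUN=1 (default) only prints the commands.
set -euo pipefail

REF=${REF:-/path/to/ref}                  # Yara index prefix
OPTITYPE=${OPTITYPE:-OptiTypePipeline.py}
OUT=${OUT:-optitype_out}
DRY_RUN=${DRY_RUN:-1}

# Cap threads at 4: Yara otherwise uses every core, and runtime does not
# shrink proportionally with more threads anyway.
nproc_count=$(nproc 2>/dev/null || echo 1)
THREADS=$(( nproc_count < 4 ? nproc_count : 4 ))

map_end() {  # usage: map_end OUT_BAM READS_FQ_GZ...  (hypothetical helper)
  local out=$1; shift
  if [ "$DRY_RUN" = 1 ]; then
    echo "yara_mapper -e 3 -y full -t $THREADS $REF <(zcat $*) | samtools view -h -F 4 -b1 -o $out -"
    return
  fi
  # -F 4 drops unmapped reads; -b1 writes BAM with fast (level 1) compression.
  yara_mapper -e 3 -y full -t "$THREADS" "$REF" <(zcat "$@") \
    | samtools view -h -F 4 -b1 -o "$out" -
}

map_end end1.bam reads_1.fq.gz
map_end end2.bam reads_2.fq.gz

# OptiType gets the two per-end BAMs.
if [ "$DRY_RUN" = 1 ]; then
  echo "python3 $OPTITYPE -i end1.bam end2.bam --dna -o $OUT"
else
  python3 "$OPTITYPE" -i end1.bam end2.bam --dna -o "$OUT"
fi
```

Keeping each end in its own Yara run matches the advice above: OptiType receives one BAM per read end.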
New question: what does the `-b1` flag mean for `samtools view`? I can't find it in the help either :)
```
Usage: samtools view [options]
samtools 1.9
Using htslib 1.9
Copyright (C) 2018 Genome Research Ltd.
```
Also, `-F 0x4` versus your `-F 4`: are they the same? Why `-h`, even though our output is BAM? And should I use the `-@` flag of samtools for multithreading? Big thanks for your answers and help :)
Upd: found it. `-b1` is simply `-b` and `-1` combined (`-1` means fast BAM compression).
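On the `-F 0x4` versus `-F 4` question: `samtools view -F MASK` drops every record whose FLAG has any bit of MASK set, and bit `0x4` marks an unmapped read, so both spellings are the same filter (decimal 4 equals hex 0x4). The sketch below reproduces that rule with bash arithmetic on a few example FLAG values; `filter_decision` is a hypothetical helper, not a samtools function.

```bash
#!/usr/bin/env bash
# Sketch of the samtools view -F rule: drop a record if (FLAG & MASK) != 0.
set -euo pipefail

filter_decision() {  # usage: filter_decision FLAG MASK  (hypothetical helper)
  local flag=$1 mask=$2
  # Bash arithmetic accepts both decimal (4) and hex (0x4) masks.
  if (( (flag & mask) == 0 )); then echo kept; else echo dropped; fi
}

echo "4 == 0x4: $(( 4 == 0x4 ))"   # prints 1 (true)

# 0: mapped, forward; 16: mapped, reverse strand; 4: unmapped;
# 73 = 0x1+0x8+0x40: paired, first in pair, MATE unmapped (read itself mapped);
# 133 = 0x1+0x4+0x80: paired, second in pair, read itself unmapped.
for flag in 0 16 4 73 133; do
  echo "FLAG $flag with -F 0x4: $(filter_decision "$flag" 0x4)"
done
```

Note that `-F 4` keeps reads whose mate is unmapped (like FLAG 73); only records that are themselves unmapped are removed.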
Thank you for the help, it looks like everything is working fine.
These are my commands:
```
$YARA_MAPPER -e 3 -y full -t $YARACPUS $YARA_DNA_INDEX <(unpigz -c ${yaraNorm1Input[*]}) | $SAMTOOLS view -@ $SAMVIEWCPUS -F 4 -b1 -o $yaraMappedNorm1BAM
python3 $OPTITYPE -i $yaraMappedNorm1BAM $yaraMappedNorm2BAM --dna -p ${sampleName}_normal -o $optitypeOutDir
```
Looking forward to your Yara implementation as the default mapper for OptiType.
As said here, OptiType cares about read pairs. But what about read groups? Here is our situation: we have WGS data from several lanes, i.e. not a single pair of fastq.gz files but 8 files, paired two by two. Can you propose a solution for running OptiType on them? Do we need to align them first (with RazerS3, Yara, or even BWA-MEM; we want to try different aligners because RazerS3 crashes on big genome files with unpredictable results) and merge the BAMs afterwards? Or can we simply concatenate the FASTQs into 2 big paired FASTQ files and feed them to OptiType? What is the best practice?
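For the 8-file situation described here, a sketch of how the files could be split into one end-1 list and one end-2 list, with a check that every end-1 file has an end-2 mate. The `laneN_1.fq.gz` / `laneN_2.fq.gz` naming and the demo directory are hypothetical; adjust the glob patterns to the real file names.

```bash
#!/usr/bin/env bash
# Sketch: build one end-1 list and one end-2 list from multi-lane paired FASTQs
# and check that every end-1 file has an end-2 mate. The laneN_1/_2.fq.gz naming
# and the demo directory are hypothetical.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Demo input: 4 lanes x 2 ends = 8 (empty) files.
for lane in 1 2 3 4; do
  touch "$tmp/lane${lane}_1.fq.gz" "$tmp/lane${lane}_2.fq.gz"
done

# Glob expansion is sorted, so both lists hold the lanes in the same order.
end1=("$tmp"/*_1.fq.gz)
end2=("$tmp"/*_2.fq.gz)

if [ "${#end1[@]}" -ne "${#end2[@]}" ]; then
  echo "different number of end-1 and end-2 files" >&2; exit 1
fi
for i in "${!end1[@]}"; do
  # Strip the end suffix; mates must share the same lane prefix.
  if [ "${end1[i]%_1.fq.gz}" != "${end2[i]%_2.fq.gz}" ]; then
    echo "no mate for ${end1[i]}" >&2; exit 1
  fi
done
echo "${#end1[@]} lanes per end"
```

Each list can then be streamed into its own Yara run, e.g. `yara_mapper -e 3 -y full -t 4 /path/to/ref <(zcat "${end1[@]}")`, which yields the two per-end BAM files that OptiType takes as input.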