I had a few errors subsetting bams, because my RNA bams reference has non-chr contigs (i.e. '6' , not 'chr6') and only the main chr contigs. My DNA bam has other contigs so I still want to run the pipeline with contig_reads=T. I am sharing the edits to bin/subset_bam_opt.sh that solved it for me.
1. non 'chr' chromosome 6
From line 144:
# first, check the chromosome name in hla_coords is present in the bam file
chr_name=$(echo ${hla_coords} | cut -d ":" -f 1)
if ! samtools view -H ${bam} | grep -q ${chr_name}; then
echo "Chromosome name ${chr_name} not found in bam file ${bam}"
# if not, try without "chr" prefix
# if not in the bam either, exit with error
chr_name=$(echo $chr_name | sed s/^chr//g)
echo "Trying with ${chr_name} instead"
if ! samtools view -H ${bam} | grep -q ${chr_name}; then
echo "Error: chromosome name ${chr_name} not found in bam file ${bam} either - quitting!!"
exit 1
fi
fi
echo Extracting reads aligning to ${chr_name}
samtools view -@ ${task_cpus} -bh -o ${tmp_dir}/region.bam ${bam} ${chr_name}
and also the print statement at the end to have $chr_name instead of $hla_coords
printf " %-55s %10s\n" "Number of reads extracted from ${chr_name}:" "${region_read_count}"
2. manage empty grep statement when there are no secondary contigs and skip the read extraction
I added a true or statement because the empty grep made it crash
# Filter out standard chromosomes from the contig list, keeping only non-standard contigs in the output file
true || grep -v -E '^chr[1-9]{1}$|chr1[0-9]{1}$|chr2[0-2]{1}$|^[0-9]{1}$|^1[0-9]{1}$|^2[0-2]{1}$|^chr[XYMT]{1,2}$|^[XYMT]{1,2}$' ${tmp_dir}/contigs.txt > ${contig_file}
and moved the section between comments "# Iterate through the non-standard contigs in the ${contig_file} file" and "# Extract reads that contain HLA reference kmers if params == true" to inside an if $contig_file exists statement:
# only if contig file exists
if [ -s "$contig_file" ]; then
# Iterate through the non-standard contigs in the ${contig_file} file
cat ${contig_file} | while read line || [[ -n $line ]];
do
# If the contig is a standard chromosome, skip extraction
if [[ "${chroms[*]}\n" =~ "${line}" ]]; then
echo ${line} will not be extracted
else
# Extract reads mapping to the non-standard contig and append them to the contig.bam file
echo Extracting reads mapping to ${line}
samtools view -@ ${task_cpus} -bh -o ${tmp_dir}/${line}_tmp_contig.bam ${bam} ${line}
# Count the number of reads in the temporary contig.bam file
contig_read_count=$(samtools view -@ ${task_cpus} -c ${tmp_dir}/${line}_tmp_contig.bam)
# Append the contig name and read count to the contig_read_counts.csv file
echo ${line},${contig_read_count} >> "${out_prefix}.read_counts.csv"
fi
done
tmp_contig_bams=$(ls ${tmp_dir}/*tmp_contig.bam)
# Merge input.bam and tmp_contig.bams
samtools merge -c -p -@ ${task_cpus} ${tmp_dir}/contigs.bam ${tmp_contig_bams}
# remove temporary contig.bam files
rm ${tmp_contig_bams}
# Count the number of reads in the contigs.bam file
contig_read_count=$(samtools view -@ ${task_cpus} -c ${tmp_dir}/contigs.bam)
# Merge input.bam and contigs.bam
samtools merge -c -p -@ ${task_cpus} ${tmp_dir}/merged_input.bam ${tmp_dir}/input.bam ${tmp_dir}/contigs.bam
# remove input.bam and contigs.bam
rm ${tmp_dir}/contigs.bam ${tmp_dir}/input.bam
# rename merged_input.bam to input.bam
mv ${tmp_dir}/merged_input.bam ${tmp_dir}/input.bam
fi
fi
Hi,
I had a few errors subsetting bams, because my RNA bams reference has non-chr contigs (i.e. '6' , not 'chr6') and only the main chr contigs. My DNA bam has other contigs so I still want to run the pipeline with contig_reads=T. I am sharing the edits to
bin/subset_bam_opt.sh
that solved it for me.1. non 'chr' chromosome 6
From line 144:
and also the print statement at the end to have $chr_name instead of $hla_coords
2. manage empty grep statement when there are no secondary contigs and skip the read extraction
I added a true or statement because the empty grep made it crash
and moved the section between comments "# Iterate through the non-standard contigs in the ${contig_file} file" and "# Extract reads that contain HLA reference kmers if params == true" to inside an if $contig_file exists statement:
Best,
Irene