Ruitulyu / KAS-Analyzer

New computational framework to process and analyze KAS-seq and spKAS-seq data.
MIT License
10 stars 4 forks source link

read mapping summary values always 0 #9

Open ecroot opened 1 year ago

ecroot commented 1 year ago

Describe the bug The mapping summary files always seem to give the number of reads mapped as 0, and a mapping ratio of 0.00%. However, mapping appears to run successfully, so I suspect these values are incorrect.

For example, running samtools flagstat on one of the bam files resulting from the mapping command gives the following output:

16933826 + 0 in total (QC-passed reads + QC-failed reads)
16928870 + 0 primary
0 + 0 secondary
4956 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
16825665 + 0 mapped (99.36% : N/A)
16820709 + 0 primary mapped (99.36% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

The kas-seq summary file for the same sample:

Number of KAS-seq reads: 16928870

Number of mapped reads. bwa: 0

Mapping ratios: 0.00%

Number of deduplicated mapped reads: 6976790

Duplication ratios. samtools: 0%
0%
0%
55.75%

The relevant line from the statistics file:

Samples Clean_reads Mapped_reads    Deduplicated_reads  Mapping_ratios  Duplication_ratios
Y3_S3_R1_001_trimmed.fq.gz_SE_KAS-seq_mapping_summary   16928870    0   6976790 0.00%   0%

Is this a bug, or am I doing something wrong?

I also do not understand what the final 4 percentages represent in the summary file. Please can you either point me towards the relevant documentation, or explain this?

To Reproduce Steps to reproduce the behavior: Run a kas-seq command with bwa to generate a mapping summary.

Expected behavior Correct values in the file, or more informative documentation for the summary file, or an explanation for why the mapping summary differs from the terminal output and samtools flagstats.

Desktop (please complete the following information):

Additional context I need the correct ratios to be able to perform the normalisation steps for various other features of the tool.

ecroot commented 1 year ago

Perhaps the bug is to do with line 326 in map_KAS-seq.sh:

mapped_reads_num=$( grep chr HeLa-S3_spKAS-seq.rep1.sam | grep ^@ -v | awk '{print $1}' | sort -u | wc -l )

the grep command seems to require a hardcoded filename, rather than using any of the variables in the rest of the script.

ecroot commented 11 months ago

Thanks for updating map_KAS-seq.sh to be compatible with t2t. The updated script still contains the bug on line 326: the command is run for a specific file, rather than ${prefix}.sam