Strand conversion mismatch rate?

aleighbrown commented 2 years ago

I've noticed that in the samples I've processed through HISAT-3N so far, once I've made the conversion table, there are about twice as many conversions on the + strand as on the - strand.

This is paired-end SLAM-seq data (strandedness: "FR"). HISAT-3N build command:

hisat-3n-build --base-change T,C /SAN/vyplab/vyplab_reference_genomes/sequence/human/GRCh38/GRCh38.primary_assembly.genome.fa /SAN/vyplab/vyplab_reference_genomes/hisat-3n/human/raw

Alignment call:

    /SAN/vyplab/alb_projects/tools/hisat-3n/hisat-3n \
    -x  /SAN/vyplab/vyplab_reference_genomes/hisat-3n/human/raw \
    -1 {input.one} \
    -2 {input.two} \
    -q \
    -S {params.outputPrefix} \
    --base-change T,C \
    --rna-strandness FR

I've added a few downstream scripts to turn the conversion table into bigWigs; the first step is to split the table into + strand and - strand files:
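The split step can be sketched in Python as below. This is a hypothetical helper, not the poster's actual script. It assumes the conversion table is tab-separated, with a header row containing a column named `strand` whose values are `+` or `-` (the layout hisat-3n-table writes; check your own file's header before relying on it). The names `split_by_strand` and `out_prefix` are illustrative only.

```python
from collections import Counter


def split_by_strand(table_path: str, out_prefix: str) -> Counter:
    """Split a conversion table into <out_prefix>.+.txt and <out_prefix>.-.txt.

    Assumption: tab-separated input whose first line is a header containing
    a column named "strand" with values "+" or "-".
    The header is copied to both outputs; returns data rows written per strand.
    """
    counts = Counter()
    with open(table_path) as src:
        header = src.readline()
        # Locate the strand column by name instead of hard-coding its index.
        strand_col = header.rstrip("\n").split("\t").index("strand")
        with open(f"{out_prefix}.+.txt", "w") as plus, \
             open(f"{out_prefix}.-.txt", "w") as minus:
            outputs = {"+": plus, "-": minus}
            for handle in outputs.values():
                handle.write(header)
            for line in src:
                if not line.strip():
                    continue  # skip blank lines
                strand = line.rstrip("\n").split("\t")[strand_col]
                # Raises KeyError if a row has a strand value other than + or -.
                outputs[strand].write(line)
                counts[strand] += 1
    return counts
```

For example, `counts = split_by_strand("sample_3_8h.tsv", "sample_3_8h")` (input filename hypothetical) writes `sample_3_8h.+.txt` and `sample_3_8h.-.txt`, and `counts["+"] / counts["-"]` gives the strand ratio directly. Note that each row of the table is a reference position rather than a single conversion event.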

114896436 sample_3_8h.+.txt
41236957 sample_3_8h.-.txt

In both samples, the + strand has far more conversions than the - strand (about 2.5 to 3 times as many).

Is this kind of thing to be expected?

imzhangyun commented 2 years ago

Hello @aleighbrown,

This is normal for SLAM-seq data. When a read could be assigned to either the + or the - strand (for example, when it aligns perfectly to the reference and carries no conversions), HISAT-3N assigns + to its YZ tag. Since the conversion rate in SLAM-seq is low and many reads carry no conversions, we should expect more reads with the YZ:A:+ tag.

Leo
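A toy calculation can make this argument concrete. The sketch below is not HISAT-3N code, and its model is a simplifying assumption: reads come from the two strands in equal numbers, each read covers `n` convertible bases, each base converts independently with probability `p`, and every read without a conversion is labelled `+` (the tie-break described in the reply). With $q = (1-p)^n$ as the chance that a read shows no conversion, the expected ratio of `+` to `-` reads is $(1+q)/(1-q)$. The function name and parameters are hypothetical.

```python
def expected_strand_ratio(p: float, n: int) -> float:
    """Expected (+ labelled reads) / (- labelled reads) under a toy model.

    Toy assumptions (not taken from HISAT-3N itself):
      * reads come from the + and - strands in equal numbers;
      * each read covers n convertible bases, each converted with probability p;
      * a read keeps its true strand only if it shows >= 1 conversion,
        otherwise it is ambiguous and labelled '+'.
    """
    if not (0.0 <= p <= 1.0) or n < 0:
        raise ValueError("p must be in [0, 1] and n must be non-negative")
    q = (1.0 - p) ** n       # probability that a read shows no conversion
    plus = 0.5 + 0.5 * q     # all true + reads, plus ambiguous true - reads
    minus = 0.5 * (1.0 - q)  # only true - reads with at least one conversion
    if minus == 0.0:
        return float("inf")  # no read can ever be labelled '-'
    return plus / minus
```

For instance, if half of all reads carry no conversion (`q = 0.5`, e.g. `expected_strand_ratio(0.5, 1)`), the model gives `3.0`, and the ratio falls toward `1.0` as the conversion probability rises. The model counts reads, not rows of the conversion table, so it only indicates the direction of the bias, not its exact size in a given dataset.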

aleighbrown commented 2 years ago

Thanks!

DaehwanKimLab/hisat2, issue #333