Open Chanspace opened 5 days ago
To be perfectly honest, I don't know for certain whether Bowtie2 treats soft-masked genomes differently from unmasked ones, but I don't think it does (a Google search for "how does Bowtie2 treat soft-masked index" didn't yield any great insights either).
What would you like to achieve by soft-masking repeats?
I'm sorry, I may not have expressed myself clearly. What I actually want to know is how to get consistent detection rates in Bismark whether I use the unmasked or the soft-masked genome. We have used soft-masked genomes in our other omics analyses, so we would like to stay consistent. However, when we compared the unmasked and soft-masked genomes in WGBS data analysis with Bismark, the subsequent methylation detection rates still differed, even though the generated indexes are the same.
I am currently conducting Whole Genome Bisulfite Sequencing (WGBS) data analysis using Bismark and plan to utilize a soft-masked genome, where all repetitive and low-complexity regions are marked with lowercase letters.
During the index generation step, I observed that the index created from the soft-masked genome is the same as the one created from the unmasked genome. However, I noticed a significant difference in the results of the alignment step, specifically in the number of uniquely aligned reads. It appears that tools like Bowtie2 ignore the soft-masking and treat the lowercase letters as uppercase during alignment.
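To rule out any difference between the two references other than letter case, a check along the following lines could be run on both FASTA files. This is only a minimal sketch and not part of Bismark or Bowtie2: the function names `sequence_digests` and `differs_only_in_case` are made up for illustration, and the in-memory records stand in for the real genome files.

```python
import hashlib
import io


def sequence_digests(handle, ignore_case=True):
    """Return {sequence_name: md5 hex digest} for an open FASTA handle.

    Hypothetical helper: each record is hashed line by line, so the
    result does not depend on how the sequence is wrapped in the file.
    """
    digests = {}
    name, md5 = None, None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Close the previous record before starting a new one.
            if name is not None:
                digests[name] = md5.hexdigest()
            name = line[1:].split()[0]  # first word of the header only
            md5 = hashlib.md5()
        elif name is not None:
            seq = line.upper() if ignore_case else line
            md5.update(seq.encode("ascii"))
    if name is not None:
        digests[name] = md5.hexdigest()
    return digests


def differs_only_in_case(masked_handle, unmasked_handle):
    """True if both FASTA inputs hold the same sequences once uppercased."""
    return sequence_digests(masked_handle) == sequence_digests(unmasked_handle)


# Tiny in-memory example; real use would pass open() handles for the
# soft-masked and unmasked genome FASTA files instead.
masked = io.StringIO(">chr1 soft-masked\nACGTacgtNN\nacgt\n")
unmasked = io.StringIO(">chr1\nACGTACGTNN\nACGT\n")
print(differs_only_in_case(masked, unmasked))
```

If a check like this reports a mismatch for any chromosome, the two references differ in more than masking (for example in assembly version or in the set of contigs), which would be one possible source of different alignment counts.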
Is there a specific parameter or approach in Bismark that would allow me to achieve alignment results with the soft-masked genome that are comparable to those obtained with the unmasked genome? Any guidance or advice would be greatly appreciated!
Thank you!