FelixKrueger / Bismark

A tool to map bisulfite converted sequence reads and determine cytosine methylation states

http://felixkrueger.github.io/Bismark/

GNU General Public License v3.0

Low unique alignment % #676

Closed gevro closed 5 months ago

gevro commented 5 months ago

Hi, 2 out of 23 samples, all prepared in the same batch, have a low unique alignment % (38% and 45%) relative to the other samples (~75% on average).

See attached bismark alignment reports for the 2 problematic samples (M3 and M5) relative to a good sample (M6).

M3.pdf M5.pdf M6.pdf

Representative FastQC reports for the samples do not indicate any abnormality in terms of excessive adapter content or overrepresented sequences. I don't see any other reason why these two samples would have a low unique alignment %. Do you have a suggestion for how to troubleshoot this? M3.fastqc.pdf M4.fastqc.pdf M5.fastqc.pdf

Thanks

gevro commented 5 months ago

Note: One possibility I will investigate is that these two samples had a higher than expected spike-in genome %, which would account for the unmapped reads.

FelixKrueger commented 5 months ago

(As a general comment, if you run deduplicate_bismark and bismark_methylation_extractor afterwards, the reports produced by bismark2report are a lot richer. Even better, running MultiQC (https://multiqc.info/) will aggregate everything into a single report. Also, all HTML files produced by Bismark, FastQC or MultiQC should be shareable, and are much nicer to look at than .pdf.)

Now for the problem at hand, I agree that all QC profiles you shared look very similar, and they also look good. Some standard trimming should get rid of the unwanted adapter, so it is not obvious why the samples would behave very differently. I have compiled a few FAQs regarding low mapping efficiency here: https://felixkrueger.github.io/Bismark/faq/low_mapping/

Maybe they can set you on the right path?

gevro commented 5 months ago

Thank you. I had another idea from your FAQ page: these are NEB EM-seq libraries, and I forgot to set the maximum insert size to 1000. So perhaps those two samples have larger insert sizes and lost more reads because of that.
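
A minimal sketch of what raising the maximum insert size might look like for a paired-end Bismark run. The -X/--maxins option sets the maximum insert size for valid paired-end alignments (default 500). The genome folder and FASTQ file names below are hypothetical placeholders, and the command is only printed so that it can be inspected before it is run.

```shell
# Hypothetical paths and file names; replace with your own.
GENOME_DIR="/path/to/genome_folder"
READ1="M3_R1_trimmed.fq.gz"
READ2="M3_R2_trimmed.fq.gz"

# EM-seq fragments can be long, so raise the maximum insert size
# from Bismark's default of 500 to 1000.
MAXINS=1000

BISMARK_CMD="bismark --genome $GENOME_DIR --maxins $MAXINS -1 $READ1 -2 $READ2"

# Print the command for inspection; run it with: eval "$BISMARK_CMD"
echo "$BISMARK_CMD"
```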

I also see now that there is an nf-core pipeline for Bismark with an EM-seq preset, so I will just switch to that.

FelixKrueger commented 5 months ago

Good point. Here are some trimming recommendations for EM-seq (https://felixkrueger.github.io/Bismark/bismark/library_types/#em-seq-neb), and there is a preset for the nf-core/methylseq workflow, too (be sure to use the dev revision as 2.6.0 is a little broken...)

gevro commented 5 months ago

Thanks. How do I use the dev version exactly?

FelixKrueger commented 5 months ago

On the command line it is -r dev (I believe).
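
For illustration, a sketch of how the revision flag fits into a full launch command. The -r option of "nextflow run" selects a branch, tag or commit of the pipeline. Everything else here is an assumption rather than something confirmed in this thread: the samplesheet name, output directory, profile and reference genome are hypothetical, and --em_seq is assumed to be the name of the EM-seq preset in nf-core/methylseq. The command is printed, not executed.

```shell
# -r selects the pipeline revision (branch, tag or commit).
REVISION="dev"

# Hypothetical inputs; adjust to your own project.
SAMPLESHEET="samplesheet.csv"
OUTDIR="results"

# --em_seq is assumed to be the EM-seq preset parameter;
# check the pipeline's parameter documentation before relying on it.
NF_CMD="nextflow run nf-core/methylseq -r $REVISION -profile docker --input $SAMPLESHEET --outdir $OUTDIR --genome GRCh38 --em_seq"

# Print the command for inspection; run it with: eval "$NF_CMD"
echo "$NF_CMD"
```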

gevro commented 5 months ago

Hi, it looks like the dev version is still broken, with at least two major bugs: https://github.com/nf-core/methylseq/issues/406

Any suggestions?

Thanks!

gevro commented 4 months ago

Hi, it seems to be working now. I had to add this to the config: process.stageInMode = 'copy'
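
One way this setting could be supplied, sketched under the assumption that it lives in a separate config file passed to Nextflow with -c. The file name custom.config is hypothetical. With stageInMode set to 'copy', Nextflow copies input files into each task's work directory instead of symlinking them.

```shell
# Write the setting to a small custom config file (file name is hypothetical).
cat > custom.config <<'EOF'
// Copy input files into each task's work directory instead of symlinking them.
process.stageInMode = 'copy'
EOF

# Pass it to the run with Nextflow's -c option, for example:
#   nextflow run nf-core/methylseq -r dev -c custom.config ...
cat custom.config
```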