Some trouble with the FastQC report

FelixKrueger / Bismark

A tool to map bisulfite converted sequence reads and determine cytosine methylation states

http://felixkrueger.github.io/Bismark/

GNU General Public License v3.0


Some trouble with the FastQC report #667

Open Citrusyh opened 7 months ago

Citrusyh commented 7 months ago

Hi Felix, sorry to trouble you. I'm having some trouble with the FastQC report and would like to ask you a few questions.

1. According to the "Per base sequence content" plot, should I clip the first 6 bp for good results? The curve also doesn't look smooth.
2. According to the "Sequence Duplication Levels" plot, why is there only one line here? Is something wrong with my code?
3. There are so many overrepresented sequences. Is that normal when dealing with RRBS data?

I ran the following commands:

trim_galore:
    trim_galore -q 20 --phred33 --stringency 3 --length 20 -e 0.1 --paired A61.1.fq.gz A61.2.fq.gz -o /export/home/***

fastqc:
    fastqc -o /export/home/limiao29/RRBS/Lung/fastqc -t 12 /export/home/limiao29/RRBS/Lung/*.fq.gz

[Screenshot 2024-04-29 222253] [Screenshot 2024-04-29 222310] [Screenshot 2024-04-29 222333]

FelixKrueger commented 7 months ago

RRBS data is weird, as by definition you are only sequencing a very small subset of the genome (hence: reduced representation). Depending on the specific protocol and genome there are only a few hundred thousand possible fragments you expect to sequence, and you've got > 30 million reads. So naturally, you will sequence the same fragments several times, and evidently some of them are highly over-represented.
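To put rough numbers on that, here is a back-of-the-envelope sketch. The fragment count of 300,000 is an assumed stand-in for "a few hundred thousand", the read count is the lower bound quoted above, and the function names are made up for illustration. It also treats every fragment as a single distinct read sequence, which ignores strand and methylation differences, so read it as an order-of-magnitude estimate only.

```python
# Rough duplication estimate for an RRBS library.
# Both numbers below are illustrative assumptions, not values measured from this dataset.
POSSIBLE_FRAGMENTS = 300_000    # assumed: "a few hundred thousand" possible fragments
TOTAL_READS = 30_000_000        # assumed: lower bound of "> 30 million reads"


def mean_reads_per_fragment(reads, fragments):
    """Average number of reads landing on each possible fragment."""
    return reads / fragments


def min_duplicate_fraction(reads, fragments):
    """Lower bound on the fraction of reads that must repeat an earlier read.

    At most `fragments` reads can be distinct, so everything beyond that
    is necessarily a duplicate (simplification: one sequence per fragment).
    """
    distinct = min(reads, fragments)
    return 1 - distinct / reads


if __name__ == "__main__":
    mean = mean_reads_per_fragment(TOTAL_READS, POSSIBLE_FRAGMENTS)
    dup = min_duplicate_fraction(TOTAL_READS, POSSIBLE_FRAGMENTS)
    print(f"mean reads per fragment: {mean:.0f}")
    print(f"minimum duplicate fraction: {dup:.0%}")
```

With these assumed inputs each fragment is covered about 100 times on average and at least 99% of the reads are duplicates of another read, which is consistent with a duplication plot that collapses onto the highest duplication bins.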

This isn't really something you can do much about (maybe with the exception of deduplicating using UMIs); it just comes with the method. The same goes for the base composition: it is expected. The only things that need (hard-)trimming are the filled-in bases from the end-repair reaction. Is this by any chance the Diagenode v2 kit?
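On the base composition point, a toy example may help show why the first few positions look so biased. The sketch below assumes the library was digested with MspI (recognition site CCGG, cut after the first C), which is the usual RRBS enzyme but has not been confirmed for this dataset. The sequence and the function name are made up for illustration.

```python
import re

# Assumption: MspI digestion, which cuts C^CGG (after the first C of every CCGG site).
MSPI_SITE = "CCGG"


def mspi_digest(seq):
    """Split `seq` into fragments by cutting after the first C of each CCGG site."""
    cut_positions = [m.start() + 1 for m in re.finditer(MSPI_SITE, seq)]
    bounds = [0] + cut_positions + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


# Made-up toy sequence containing three CCGG sites.
toy_sequence = "ATTACCGGTTAGCATCCGGAAGTCTCCGGTA"
fragments = mspi_digest(toy_sequence)

if __name__ == "__main__":
    for fragment in fragments:
        print(fragment)
    # Every fragment created by a cut on its left side begins with the same 3 bases.
    print({fragment[:3] for fragment in fragments[1:]})
```

Every fragment that begins at a cut site starts with CGG, so before any bisulfite treatment the first three read positions are already nearly fixed. After bisulfite conversion the leading C stays C when methylated and reads as T when unmethylated, so reads are expected to start with CGG or TGG, and the per-base content plot cannot be flat at those positions.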

Citrusyh commented 7 months ago

I'm sorry to say that I know little about this, because I paid a company to do the experiment. I will ask the company for more details. Thank you for your kind reply!

FelixKrueger commented 7 months ago

If it happens to be the Diagenode v2 RRBS kit, there was recently a discussion as well as some processing tips here: https://github.com/FelixKrueger/TrimGalore/issues/177#issuecomment-2012626262