Open Citrusyh opened 7 months ago
RRBS data is unusual: by definition you are sequencing only a very small subset of the genome (hence "reduced representation"). Depending on the specific protocol and genome, there are only a few hundred thousand possible fragments you can expect to sequence, and you have more than 30 million reads. So you will naturally sequence the same fragments many times, and evidently some of them are highly over-represented.
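To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The figure of 300,000 fragments is only an assumed stand-in for "a few hundred thousand", the function names are made up for illustration, and the model is deliberately crude: it treats every fragment as producing a single distinct read sequence, which real bisulfite data does not strictly do (strand and methylation state change the sequence).

```python
def mean_reads_per_fragment(total_reads: int, possible_fragments: int) -> float:
    """Average number of reads landing on each possible fragment."""
    return total_reads / possible_fragments


def min_duplicate_fraction(total_reads: int, possible_fragments: int) -> float:
    """Lower bound on the fraction of reads that repeat an already-seen fragment.

    At most `possible_fragments` reads can be the first read of their
    fragment, so every read beyond that count must be a repeat.
    """
    if total_reads <= possible_fragments:
        return 0.0
    return 1.0 - possible_fragments / total_reads


if __name__ == "__main__":
    total_reads = 30_000_000      # "> 30 million reads" from this thread
    possible_fragments = 300_000  # ASSUMED value for "a few hundred thousand"

    print(mean_reads_per_fragment(total_reads, possible_fragments))
    print(min_duplicate_fraction(total_reads, possible_fragments))
```

Under these assumptions each fragment is covered about 100 times on average, and roughly 99% of reads repeat a fragment that has already been seen, so a very high duplication level in FastQC is what you would expect.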
There isn't much you can do about this (with the possible exception of deduplicating using UMIs); it simply comes with the method. The same goes for the base composition: it is expected. The only thing that needs (hard-)trimming is the filled-in bases from the end-repair reaction. Is this by any chance the Diagenode v2 kit?
I am sorry to say that I know little about this, because I paid a company to do this experiment. I will ask the company for more details. Thank you for your kind reply!
If it happens to be the Diagenode v2 RRBS kit, there was recently a discussion as well as some processing tips here: https://github.com/FelixKrueger/TrimGalore/issues/177#issuecomment-2012626262
Hi Felix, I am sorry to trouble you. I'm having some trouble with the FastQC report and would like to ask you about it.