RobertsLab / resources

https://robertslab.github.io/resources/

Quality Trim EPI geoduck data #408

Closed: sr320 closed this issue 4 years ago

sr320 commented 6 years ago

Maybe you already did this too 😄, but what is your recommendation for trimming?

Maybe trim 10 bp from the 5' end, quality-trim by Phred score, and remove adaptors?
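For reference, Trim Galore's Phred-based trimming (default cutoff 20) is carried out by cutadapt, which trims the 3' end with a BWA-style running-sum rule. The sketch below reimplements that rule for illustration only; it assumes Phred+33 quality encoding and 3'-only trimming, and it is not cutadapt's actual code.

```python
def quality_trim_index(qualities: str, cutoff: int = 20, base: int = 33) -> int:
    """Return the index at which to cut low-quality bases from the 3' end.

    Walk from the 3' end adding (cutoff - quality); cut where this running
    sum peaks, and stop as soon as it drops below zero.
    Assumes Phred+33 encoding (base=33).
    """
    running = 0
    best = 0
    cut = len(qualities)  # default: keep the whole read
    for i in reversed(range(len(qualities))):
        running += cutoff - (ord(qualities[i]) - base)
        if running < 0:
            break
        if running > best:
            best = running
            cut = i
    return cut


seq, qual = "ACGTACGTAC", "IIIIIIII##"  # toy read: two Q2 bases at the 3' end
print(seq[:quality_trim_index(qual)])  # ACGTACGT
```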

kubu4 commented 6 years ago

Indeed, I did!

http://onsnetwork.org/kubu4/2018/05/16/trimgalorefastqcmultiqc-trimgalore-rrbs-geoduck-bs-seq-fastq-data-directional/

kubu4 commented 6 years ago

Here's the original discussion: Issue #260.

sr320 commented 5 years ago

There are adaptor issues with ~10 bp at the 5' end and ~5 bp at the 3' end.

E.g., https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/0807/EPI-167_S10_L002_R1_001_val_1_bismark_bt2_PE_report.html

Can you give it another trim (with hard clips) from the raw data to see if we can resolve this?
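A "hard clip" here means removing a fixed number of bases from each read end regardless of quality. A minimal sketch, assuming plain uncompressed 4-line FASTQ and the ~10 bp (5') / ~5 bp (3') values mentioned above; `hard_clip` and `clip_fastq` are hypothetical helper names, not part of any tool used in this thread.

```python
import io
from typing import TextIO, Tuple


def hard_clip(seq: str, qual: str, five_prime: int = 10, three_prime: int = 5) -> Tuple[str, str]:
    """Remove a fixed number of bases from both ends of one read."""
    end = len(seq) - three_prime
    if end <= five_prime:  # read too short: nothing left after clipping
        return "", ""
    return seq[five_prime:end], qual[five_prime:end]


def clip_fastq(handle: TextIO, out: TextIO, five_prime: int = 10, three_prime: int = 5) -> None:
    """Hard-clip every record of a 4-line FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            break
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        seq, qual = hard_clip(seq, qual, five_prime, three_prime)
        out.write(f"{header}{seq}\n{plus}{qual}\n")


record = "@r1\nAAAAAAAAAACCCCCGGGGG\n+\nIIIIIIIIIIJJJJJKKKKK\n"
result = io.StringIO()
clip_fastq(io.StringIO(record), result)
print(result.getvalue(), end="")  # @r1 / CCCCC / + / JJJJJ
```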

kubu4 commented 5 years ago

I don't see any adaptor contamination in the FastQC/MultiQC reports:

http://owl.fish.washington.edu/Athaliana/20180516_geoduck_trimgalore_rrbs/20180516_geoduck_trimmed_fastqc/multiqc_data/multiqc_report.html

But, I can easily perform additional hard trimming. Is the concern this wonky stuff at the 5' end?

[Screenshot: 20190923_001]
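FastQC's "Adapter Content" module reports, per read position, the cumulative percentage of reads in which an adapter k-mer has been seen. The sketch below mimics that idea in spirit only (it is not FastQC's implementation). It assumes a 12 bp probe taken from the Illumina adapter prefix AGATCGGAAGAGC, which Trim Galore also looks for, and an in-memory list of reads.

```python
from typing import List, Sequence

# First 12 bp of the Illumina adapter prefix AGATCGGAAGAGC.
ILLUMINA_PROBE = "AGATCGGAAGAG"


def adapter_content(reads: Sequence[str], probe: str = ILLUMINA_PROBE) -> List[float]:
    """Cumulative % of reads whose first probe hit starts at or before each position."""
    if not reads:
        return []
    max_len = max(len(r) for r in reads)
    first_hit_counts = [0] * max_len
    for read in reads:
        i = read.find(probe)
        if i != -1:
            first_hit_counts[i] += 1
    cumulative: List[float] = []
    running = 0
    for count in first_hit_counts:
        running += count
        cumulative.append(100.0 * running / len(reads))
    return cumulative


toy_reads = ["ACGT" * 5, "TT" + "AGATCGGAAGAGC" + "AA"]  # made-up reads
print(adapter_content(toy_reads)[-1])  # 50.0
```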

sr320 commented 5 years ago

See the M-bias plots in the link I provided.
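An M-bias plot shows the percentage of methylated calls at each position along the read; a flat line is expected, so rises or dips at the read ends flag positions to trim or ignore. A minimal sketch for the CpG context, assuming Bismark-style call strings (`Z` = methylated CpG, `z` = unmethylated CpG, other characters ignored) that are already oriented 5'→3' along the read. Bismark stores its XM call strings in reference orientation, so reverse-strand reads would need reversing first.

```python
from typing import Iterable, List, Optional


def m_bias(call_strings: Iterable[str], methylated: str = "Z",
           unmethylated: str = "z") -> List[Optional[float]]:
    """Percent methylation at each read position (None where there are no calls)."""
    meth: List[int] = []
    unmeth: List[int] = []
    for calls in call_strings:
        if len(calls) > len(meth):  # grow counters to the longest read seen
            extra = len(calls) - len(meth)
            meth.extend([0] * extra)
            unmeth.extend([0] * extra)
        for pos, call in enumerate(calls):
            if call == methylated:
                meth[pos] += 1
            elif call == unmethylated:
                unmeth[pos] += 1
    return [100.0 * m / (m + u) if (m + u) else None for m, u in zip(meth, unmeth)]


# Toy call strings: position 0 is ~66.7% methylated, position 3 is 50%.
print(m_bias(["Z..z", "Z..Z", "z..."]))
```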

kubu4 commented 5 years ago

I think it might be better to process the data in Bismark using:

bismark_methylation_extractor --ignore <int> --ignore_r2 <int> <Bismark alignment file>

That way, we're not "permanently" truncating the sequencing files.
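Conceptually, `--ignore` and `--ignore_r2` skip the first N bases from the 5' end of Read 1 and Read 2 when methylation calls are extracted, while the FASTQ and BAM files stay untouched. A minimal sketch of that effect on CpG counts, assuming call strings already oriented 5'→3' along each read; `cpg_counts` and `pair_cpg_counts` are hypothetical illustrations, not Bismark code.

```python
from typing import Tuple


def cpg_counts(calls: str, ignore: int = 0) -> Tuple[int, int]:
    """(methylated, unmethylated) CpG calls, skipping the first `ignore` read positions."""
    kept = calls[ignore:]
    return kept.count("Z"), kept.count("z")


def pair_cpg_counts(r1_calls: str, r2_calls: str,
                    ignore: int = 0, ignore_r2: int = 0) -> Tuple[int, int]:
    """Combine counts for a read pair, ignoring separate 5' offsets for R1 and R2."""
    m1, u1 = cpg_counts(r1_calls, ignore)
    m2, u2 = cpg_counts(r2_calls, ignore_r2)
    return m1 + m2, u1 + u2


print(pair_cpg_counts("ZZ..z.Z", "z.Z", ignore=2, ignore_r2=1))  # (2, 1)
```

Because only the counting step changes, different offsets can be tried against the same alignments without re-trimming and re-aligning.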

Also, here's the relevant Bismark documentation:

https://github.com/FelixKrueger/Bismark/tree/master/Docs#usage-bismark_methylation_extractor-options-filenames

And here's an additional note on adjusting for M-bias (linked from the Bismark documentation on M-bias):

https://sequencing.qcfail.com/articles/library-end-repair-reaction-introduces-methylation-biases-in-paired-end-pe-bisulfite-seq-applications/

Anyway, per our conversation, I'll just get these hard trimmed.

kubu4 commented 5 years ago

I trimmed 20 bp from the 5' end of each read. Data is here:

https://gannet.fish.washington.edu/Atumefaciens/20190923_pgen_fastp_EPI_trimming/
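The exact fastp command isn't shown in this thread. The sketch below builds one plausible paired-end invocation that clips 20 bp from the 5' end of both reads using fastp's `--trim_front1`/`--trim_front2` options; the file names are hypothetical. Note that fastp also applies its default adapter trimming and quality filtering unless these are disabled (e.g., `--disable_adapter_trimming`, `--disable_quality_filtering`).

```python
import shlex
from typing import List


def fastp_trim_front_cmd(r1: str, r2: str, out1: str, out2: str, trim_front: int = 20,
                         html: str = "fastp.html", json_report: str = "fastp.json") -> List[str]:
    """Build a fastp argument list that hard-clips `trim_front` bases from each read's 5' end."""
    return ["fastp",
            "--in1", r1, "--in2", r2,
            "--out1", out1, "--out2", out2,
            "--trim_front1", str(trim_front),  # bases clipped from the 5' end of R1
            "--trim_front2", str(trim_front),  # bases clipped from the 5' end of R2
            "--html", html, "--json", json_report]


# Hypothetical file names, for illustration only.
cmd = fastp_trim_front_cmd("sample_R1.fastq.gz", "sample_R2.fastq.gz",
                           "sample_R1.trim.fastq.gz", "sample_R2.trim.fastq.gz")
print(shlex.join(cmd))
# To run it (requires fastp on PATH): subprocess.run(cmd, check=True)
```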

MultiQC report is here:

https://gannet.fish.washington.edu/Atumefaciens/20190923_pgen_fastp_EPI_trimming/multiqc_report.html

Interestingly, MultiQC can interpret the output from the trimming program I used (fastp).

AND fastp actually generates its own HTML report (including before/after info!):

https://gannet.fish.washington.edu/Atumefaciens/20190923_pgen_fastp_EPI_trimming/fastp.html
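The same before/after numbers can also be pulled from the JSON report fastp writes alongside the HTML. A minimal sketch, assuming the report's `summary` → `before_filtering` / `after_filtering` layout with `total_reads` and `total_bases` fields; check these keys against your fastp version's actual output.

```python
import json
import sys
from typing import Any, Dict


def summarize_fastp(report: Dict[str, Any]) -> Dict[str, Any]:
    """Extract before/after read and base counts from a parsed fastp JSON report."""
    before = report["summary"]["before_filtering"]
    after = report["summary"]["after_filtering"]
    return {
        "reads_before": before["total_reads"],
        "reads_after": after["total_reads"],
        "bases_before": before["total_bases"],
        "bases_after": after["total_bases"],
        "fraction_bases_retained": after["total_bases"] / before["total_bases"],
    }


# Usage: python summarize_fastp.py fastp.json
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        for key, value in summarize_fastp(json.load(fh)).items():
            print(f"{key}\t{value}")
```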

Anyway, here's an example of how the data looks (before/after):

[Screenshot: 20190924_003]

[Screenshot: 20190924_004]

github-actions[bot] commented 5 years ago

Stale issue message