galaxyproject / training-material

A collection of Galaxy-related training material
https://training.galaxyproject.org
MIT License

“Remove PCR duplicates” is done twice in ATAC-Seq tutorial #1660

Open hrhotz opened 5 years ago

hrhotz commented 5 years ago

I am struggling to understand why "Filter Option: Remove PCR duplicates" in the Genrich tool is set to "Yes" after the MarkDuplicates tool has already been run with "If true do not write duplicates to the output file instead of writing them with appropriate flags set" set to "Yes". The list of PCR duplicates produced by Genrich is empty.

When I run Genrich with "Filter Option: Remove PCR duplicates" set to "No", I get the same results for the bedgraph pileup and the ENCODE peak files.

Also, when I skip the MarkDuplicates step and run Genrich with "Filter Option: Remove PCR duplicates" set to "Yes", I get the same result, this time with a list of 3549 PCR duplicates.
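
These observations would be consistent with duplicate removal being idempotent: once one tool has dropped the duplicates, a second pass has nothing left to remove. Below is a minimal Python sketch of that idea. It is a toy model only: the `Fragment` type, the `remove_duplicates` function, and the example coordinates are all hypothetical, and neither MarkDuplicates nor Genrich is claimed to identify duplicates exactly this way.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    """Hypothetical stand-in for an aligned fragment (not a real BAM record)."""
    chrom: str
    start: int
    end: int
    strand: str


def remove_duplicates(fragments):
    """Toy duplicate filter: keep the first fragment seen at each position.

    Returns (kept, removed). This is an assumption-laden simplification,
    not the algorithm used by MarkDuplicates or Genrich.
    """
    seen = set()
    kept, removed = [], []
    for frag in fragments:
        if frag in seen:
            removed.append(frag)
        else:
            seen.add(frag)
            kept.append(frag)
    return kept, removed


# Made-up example data containing two duplicated fragments.
fragments = [
    Fragment("chr1", 100, 250, "+"),
    Fragment("chr1", 100, 250, "+"),  # duplicate of the line above
    Fragment("chr1", 300, 420, "-"),
    Fragment("chr2", 50, 180, "+"),
    Fragment("chr2", 50, 180, "+"),  # duplicate of the line above
]

# First pass (analogous to removing duplicates upstream).
first_kept, first_removed = remove_duplicates(fragments)

# Second pass on already-filtered input (analogous to filtering again downstream).
second_kept, second_removed = remove_duplicates(first_kept)

print(len(first_removed), "removed in first pass")
print(len(second_removed), "removed in second pass")
```

In this toy model the second pass removes nothing and leaves the kept fragments unchanged, which mirrors the empty duplicate list reported above; running a single pass on the raw input yields the same kept set either way.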

hexylena commented 5 years ago

@heylf do you have time for this?

lldelisle commented 5 years ago

Hi. Indeed, we don't need to remove duplicates in Genrich. In fact, Genrich can do a lot of filtering by itself that we did not use in this tutorial. However, if you want to use another peak caller such as MACS2, you will need to keep these filtering steps (including the MarkDuplicates step). I would like to modify this tutorial to propose both approaches:

hexylena commented 5 years ago

That's great to hear @lldelisle