JetBrains-Research / washu

Reproducible and scalable technical pipelines for ChIP-Seq and RNA-Seq processing
https://artyomovlab.wustl.edu/aging
MIT License

Fix FRiP calculation according to ENCODE def #41

Closed iromeo closed 6 years ago

iromeo commented 6 years ago

Fix FRiP calculation according to ENCODE def:

Fraction of reads in peaks (FRiP) - Fraction of all mapped reads that fall into the called peak regions, i.e. usable reads in significantly enriched peaks divided by all usable reads. In general, FRiP scores correlate positively with the number of regions. (Landt et al, Genome Research Sept. 2012, 22(9): 1813–1831)

Usable reads – A fragment is considered “usable” if it uniquely maps to the genome and remains after removing PCR duplicates (defined as two fragments that map to the same genomic position and have the same unique molecular identifier). Used to evaluate eCLIP data. 

https://www.encodeproject.org/data-standards/terms/

We currently count all mapped reads instead of deduplicated, uniquely mapped reads ("usable reads"). I'm not sure how much this affects the final results, but it would be better either to use the original definition or to explicitly define our own, so that other labs can reproduce our results.
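To make the difference concrete, here is a minimal sketch in plain Python that computes FRiP both ways on toy data. It is not the pipeline's code: the `Read` record, the `frip` function and its `usable_only` switch are hypothetical names for illustration, and the `unique` and `duplicate` flags are assumed to come from an upstream alignment and duplicate-marking step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical minimal read record (half-open coordinates)."""
    chrom: str
    start: int
    end: int
    unique: bool      # assumed: read maps to a single genomic location
    duplicate: bool   # assumed: read was flagged as a PCR duplicate upstream


def overlaps_peak(read, peaks_by_chrom):
    """True if the read overlaps any peak on its chromosome."""
    return any(
        p_start < read.end and read.start < p_end
        for p_start, p_end in peaks_by_chrom.get(read.chrom, ())
    )


def frip(reads, peaks, usable_only):
    """Fraction of reads in peaks.

    usable_only=False: denominator is all mapped reads.
    usable_only=True:  only uniquely mapped, non-duplicate reads are counted
                       in both numerator and denominator.
    """
    peaks_by_chrom = {}
    for chrom, start, end in peaks:
        peaks_by_chrom.setdefault(chrom, []).append((start, end))

    selected = [
        r for r in reads
        if not usable_only or (r.unique and not r.duplicate)
    ]
    if not selected:
        return 0.0
    in_peaks = sum(overlaps_peak(r, peaks_by_chrom) for r in selected)
    return in_peaks / len(selected)


# Toy data: one peak, three copies of the same read inside it (two flagged
# as duplicates), one unique read outside, one multi-mapped read elsewhere.
peaks = [("chr1", 90, 200)]
reads = [
    Read("chr1", 100, 150, unique=True, duplicate=False),
    Read("chr1", 100, 150, unique=True, duplicate=True),
    Read("chr1", 100, 150, unique=True, duplicate=True),
    Read("chr1", 500, 550, unique=True, duplicate=False),
    Read("chr2", 10, 60, unique=False, duplicate=False),
]

frip_all_mapped = frip(reads, peaks, usable_only=False)  # 3 of 5 reads
frip_usable = frip(reads, peaks, usable_only=True)       # 1 of 2 reads
```

In this toy case the duplicates pile up inside the peak, so counting all mapped reads gives a higher FRiP than counting usable reads only. On real data the direction and size of the difference would depend on where duplicates and multi-mapped reads fall.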

iromeo commented 6 years ago

In our pipeline we can simply calculate FRiP from the "unique" BAMs rather than from the original ones.

PetrTsurinov commented 6 years ago

Done in peak calling tuning.