CostaLab / reg-gen

Regulatory Genomics Toolbox: Python library and set of tools for the integrative analysis of high throughput regulatory genomics data.
https://reg-gen.readthedocs.io/

Extension sizes estimated with THOR #266

Closed: dmitrymyl closed this issue 1 month ago

dmitrymyl commented 5 months ago

Hi!

I am analyzing ChIP-seq data from two conditions, with one ChIP sample and one input per condition.

I have some concerns about the estimated extension sizes reported in *-setup.info. I estimated fragment sizes with MACS2 (using predictd), and they are 201 and 236 for my two ChIP samples (read length is 75). THOR, however, reports extension sizes of 162 and 116. Some basic arithmetic shows that 162 + 75 = 237, which is close to the fragment size of my second sample, while 116 + 75 = 191, which is close to that of my first sample. It looks as if the numbers are swapped, though it might also be that my samples are noisy.

  1. Are these reported numbers extension sizes or fragment sizes?
  2. Are they indeed in the correct order for the two ChIP samples?
  3. Can I use the fragment sizes estimated by MACS2 instead? How should I supply them: as fragment size minus read length, or in some other way?

Thanks in advance!

dmitrymyl commented 1 month ago
  1. After looking at the code: the reported numbers are extension sizes, so read length + extension size = fragment size.
  2. The ordering seems correct, based on an assessment of other samples.
  3. MACS2 estimates can be used.
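
For anyone landing here with the same question, the relation in point 1 can be sketched as a short calculation. This is only an illustration using the numbers from this thread: the helper `extension_size` and the variable names are made up for the example and are not part of THOR or MACS2. Clamping at zero is also an assumption, made so that a read longer than the fragment never yields a negative extension.

```python
def extension_size(fragment_size: int, read_length: int) -> int:
    """Hypothetical helper: extension size = fragment size - read length.

    Clamped at 0 (an assumption) so that reads longer than the
    estimated fragment do not produce a negative extension.
    """
    if fragment_size <= 0 or read_length <= 0:
        raise ValueError("fragment_size and read_length must be positive")
    return max(fragment_size - read_length, 0)


READ_LENGTH = 75

# Fragment sizes estimated by MACS2 predictd, in sample order (from this thread).
macs2_fragment_sizes = [201, 236]

# Extension sizes reported by THOR in *-setup.info, in sample order.
thor_extension_sizes = [162, 116]

# MACS2 fragment sizes converted to extension sizes.
macs2_exts = [extension_size(f, READ_LENGTH) for f in macs2_fragment_sizes]

# THOR extension sizes converted back to implied fragment sizes.
thor_fragments = [e + READ_LENGTH for e in thor_extension_sizes]

# Comma-separated form, one value per ChIP sample.
exts_arg = ",".join(str(e) for e in macs2_exts)

print(macs2_exts)      # [126, 161]
print(thor_fragments)  # [237, 191]
print(exts_arg)        # 126,161
```

As far as I understand, THOR accepts user-supplied extension sizes as a comma-separated list (via `--exts`), but treat that as an assumption and confirm the option name and expected order with `rgt-THOR --help` for your installed version.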