biocore-ntnu / epic2

Ultraperformant reimplementation of SICER
https://doi.org/10.1093/bioinformatics/btz232
MIT License

median fragment size #49

Open tan5251 opened 3 years ago

tan5251 commented 3 years ago

My data is 150PE, but the median fragment size in some samples is larger than 100000. This really confuses me. Is it normal, and why does it happen?

These are the results for a normal sample:

Parsing ChIP file(s): ./bedpe/d331Ip.bedpe

Total eligible paired end reads: 12015516

Valid ChIP reads: 11661132 (12015516 before out of bounds removal)

Score threshold: 18.686

Number of tags in a window: 3

Number of islands found: 33710

Parsing Input file(s): ./bedpe/d331In.bedpe

Total eligible paired end reads: 12467232

Valid Background reads: 12438786 (12467232 before out of bounds removal)

Found a median fragment size of 321.5

Using chromosome sizes found in ../Genomes/genome.fa/chrsome_len_1.txt.

Using an effective genome length of ~2432 * 1e6

These are the results for an "abnormal" sample:

Parsing ChIP file(s): ./bedpe/d332Ip.bedpe

Total eligible paired end reads: 15613622

Valid ChIP reads: 15115491 (15613622 before out of bounds removal)

Score threshold: 25.125

Number of tags in a window: 3

Number of islands found: 32601

Parsing Input file(s): ./bedpe/d332In.bedpe

Total eligible paired end reads: 14749377

Valid Background reads: 14727361 (14749377 before out of bounds removal)

Found a median fragment size of 1212731.0

Using chromosome sizes found in ../Genomes/genome.fa/chrsome_len_1.txt.

Using an effective genome length of ~2432 * 1e6

endrebak commented 3 years ago

I actually do not know. I did not implement the paired end processing :/

endrebak commented 3 years ago

IIRC, the fragment size is estimated from the first 100 lines of your files, before any potentially bad fragments are removed.
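If the estimate really comes from only the first lines of the file, a few widely separated mate pairs near the top could inflate the median. One way to check is to compare the median over the first 100 lines with the median over the whole file. The sketch below is not epic2's code; the helper names `fragment_sizes` and `median_fragment_size` are hypothetical, and it assumes a tab-separated BEDPE layout whose first six columns are chrom1, start1, end1, chrom2, start2, end2.

```python
import itertools
import statistics
import sys


def fragment_sizes(lines):
    """Yield fragment lengths from BEDPE lines.

    Assumed columns (not taken from epic2): chrom1, start1, end1,
    chrom2, start2, end2, followed by any optional fields.
    """
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip malformed or empty lines
        chrom1, start1, end1, chrom2, start2, end2 = fields[:6]
        if chrom1 != chrom2:
            continue  # mates on different chromosomes have no fragment length
        # Fragment spans from the leftmost start to the rightmost end.
        yield max(int(end1), int(end2)) - min(int(start1), int(start2))


def median_fragment_size(lines, n_lines=None):
    """Median fragment length over the first n_lines lines (all if None)."""
    if n_lines is not None:
        lines = itertools.islice(lines, n_lines)
    sizes = list(fragment_sizes(lines))
    return statistics.median(sizes) if sizes else None


if __name__ == "__main__":
    path = sys.argv[1]  # e.g. a BEDPE file such as the Input file in the log
    with open(path) as handle:
        head = median_fragment_size(handle, n_lines=100)
    with open(path) as handle:
        full = median_fragment_size(handle)
    print(f"median over first 100 lines: {head}")
    print(f"median over whole file:      {full}")
```

If the two medians differ greatly, the large reported value most likely reflects a handful of unusual pairs at the start of the file rather than the library as a whole.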