deeptools / pyBigWig

A python extension for quick access to bigWig and bigBed files
MIT License

Can there be overlapping intervals in bigWig files? #93

Closed burcakotlu closed 5 years ago

burcakotlu commented 5 years ago

Assume that there are overlapping intervals in a bigWig file. When we run bw.values(chrName, start, end), does your package take the overlapping intervals into account?

For example, suppose we have data like this (shown here in BED format, but assume we have its wig version at hand):
chr1 100 110 2
chr1 105 115 3

Does bw.values('chr1', 100, 115) return [2,2,2,2,2,5,5,5,5,5,3,3,3,3,3]?

Thanks

dpryan79 commented 5 years ago

The bigWig format does not support overlapping intervals.
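Since overlaps cannot be stored, they would have to be resolved before the bigWig file is written. A minimal sketch of one way to do that, assuming overlapping values should be summed (the result the question expects); `flatten_overlaps` is a hypothetical helper, not part of pyBigWig:

```python
def flatten_overlaps(intervals):
    """Convert possibly overlapping (start, end, value) intervals on one
    chromosome into sorted, non-overlapping intervals, summing the values
    wherever the inputs overlap (summing is an assumption, not a bigWig rule).
    """
    # Every interval start and end is a potential boundary between segments.
    points = sorted({p for s, e, _ in intervals for p in (s, e)})
    flat = []
    for left, right in zip(points, points[1:]):
        # Values of all input intervals that fully cover this segment.
        covering = [v for s, e, v in intervals if s <= left and right <= e]
        if not covering:
            continue  # gap between intervals: nothing to emit
        total = sum(covering)
        # Extend the previous segment if it is adjacent and has the same value.
        if flat and flat[-1][1] == left and flat[-1][2] == total:
            flat[-1] = (flat[-1][0], right, total)
        else:
            flat.append((left, right, total))
    return flat


merged = flatten_overlaps([(100, 110, 2), (105, 115, 3)])
print(merged)  # [(100, 105, 2), (105, 110, 5), (110, 115, 3)]
```

The scan is quadratic in the number of intervals, which is fine for illustration but would need a sweep-line approach for large inputs.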


burcakotlu commented 5 years ago

Thanks.

What about bigBed files? If we have a bigBed file with overlapping intervals as above, can we get values such as [2,2,2,2,2,5,5,5,5,5,3,3,3,3,3]?

Or, since no values() is provided for bigBed files, do we need to go over each interval in the list returned by entries() ourselves?

dpryan79 commented 5 years ago

For bigBed files you get the list of overlapping entries and need to then pull out the value (if there is one). Overlapping intervals are very much allowed in that case.
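A rough sketch of that manual step, building a per-base profile from a list shaped like the output of entries(), that is, (start, end, string) tuples. It assumes the string is tab-separated with the score in the second field (standard BED name/score order) and that overlapping scores should be summed; `per_base_scores` and the sample entries are hypothetical:

```python
def per_base_scores(entries, start, end, score_field=1):
    """Sum the scores of bigBed-style entries at each base in [start, end).

    `entries` is a list of (start, end, rest) tuples, where `rest` is a
    tab-separated string. `score_field` is the index of the score within
    `rest` (an assumption: index 1, after the name).
    """
    values = [0.0] * (end - start)
    for e_start, e_end, rest in entries or []:  # tolerate None or empty input
        score = float(rest.split("\t")[score_field])
        # Clip each entry to the requested window before accumulating.
        for pos in range(max(e_start, start), min(e_end, end)):
            values[pos - start] += score
    return values


# Hypothetical entries mirroring the example in the question.
entries = [(100, 110, "a\t2"), (105, 115, "b\t3")]
profile = per_base_scores(entries, 100, 115)
print(profile)
```

Bases not covered by any entry come out as 0.0 here; whether that or a missing-value marker is appropriate depends on the analysis.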

burcakotlu commented 5 years ago

Thank you very much for your previous answers.

I have one more question: I downloaded ENCFF396NIV.bigBed from https://www.encodeproject.org/experiments/ENCSR000DWB/

>>> bb = pyBigWig.open("path/to/ENCFF396NIV.bigBed")
>>> bb.header()
{'version': 4, 'nLevels': 10, 'nBasesCovered': 38456315, 'minVal': 1, 'maxVal': 1, 'sumData': 38456315, 'sumSquared': 38456315}
>>> bb.entries('chr1', 20000000, 20121100)
[(20090555, 20090686, 'Peak_168419\t30\t.\t2.59056\t2.72397\t0.71189\t92'), (20099751, 20099871, 'Peak_157332\t42\t.\t2.98679\t3.17766\t1.05510\t91')]

I did not check all the scores, but these two intervals alone have scores of 30 and 42, so how can maxVal be 1? Which field is the value taken from? Thanks

dpryan79 commented 5 years ago

The max and min values are stored in the headers and are probably meaningless for bigBed files.

burcakotlu commented 5 years ago

So I need to find the max and min from the score field of the bigBed entries myself. For bigWig files, the min and max returned by header() looked reasonable, although I did not verify them, so I guess they are correct there.
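A minimal sketch of that calculation over a list shaped like the output of entries(), assuming the score is the second tab-separated field of each entry string; `score_range` is a hypothetical helper, and the sample entries are the two shown above:

```python
def score_range(entries, score_field=1):
    """Return (min, max) of the score field across bigBed-style entries,
    or None when there are no entries.

    `score_field` is the index of the score in the tab-separated string
    (an assumption: index 1, after the peak name).
    """
    scores = [
        float(rest.split("\t")[score_field])
        for _, _, rest in entries or []  # tolerate None or empty input
    ]
    if not scores:
        return None
    return min(scores), max(scores)


entries = [
    (20090555, 20090686, "Peak_168419\t30\t.\t2.59056\t2.72397\t0.71189\t92"),
    (20099751, 20099871, "Peak_157332\t42\t.\t2.98679\t3.17766\t1.05510\t91"),
]
lowest, highest = score_range(entries)
print(lowest, highest)  # 30.0 42.0
```

To cover a whole file, the same function could be applied per chromosome, for example by looping over the names and lengths in bb.chroms() and calling bb.entries(chrom, 0, length) for each.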

Thanks.

dpryan79 commented 5 years ago

Yeah, for bigWig files the values really should be correct.