deeptools / pyBigWig

A python extension for quick access to bigWig and bigBed files
MIT License

BigBed interval problem #122

Open jtd032 opened 2 years ago

jtd032 commented 2 years ago

I am creating a list of histograms, one for each file, using the code below:

# Imports
import numpy as np
import pyBigWig as bw
import matplotlib.pyplot as plt
import os

# For loop
directory = 'listed file path'
for filename in os.listdir(directory):
    f = os.path.join(directory, filename)
    if os.path.isfile(f) and filename.endswith('.bb'):
        fp = bw.open(f, 'r')
        chr = filename.replace('.bb', '')
        max = fp.header()['maxVal']
        print(fp.header())
        a = np.array(fp.entries(chr, 1, max), dtype=np.int64)
        plt.hist(a[:, 2], bins='auto')  # arguments are passed to np.histogram
        plt.title("Histogram with 'auto' bins")
        #Text(0.5, 1.0, "Histogram with 'auto' bins")
        print(chr)
        plt.show()

The problem I am running into is retrieving maxVal from the header. It works for the first few graphs but raises an error on later files: (int() argument must be a string, a bytes-like object or a number, not 'NoneType'). Am I right in understanding that maxVal is the top end of the range of values for that file?

dpryan79 commented 2 years ago

The maxVal is stored in the bigBed header. Could it be that it simply wasn't set for one of the files?
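
A minimal sketch of a more defensive lookup, under stated assumptions: the helper name `chrom_entries` is hypothetical and not part of pyBigWig. It assumes that `chroms()` returns a dict mapping chromosome names to lengths, and that `entries()` may return `None` when nothing overlaps the queried range, which would produce the same `NoneType` error once the result is passed to `np.array`. It takes an already opened file object and uses the chromosome length, not `maxVal`, as the upper bound.

```python
def chrom_entries(bb, chrom):
    """Return the entries spanning a whole chromosome, or [] if there are none.

    Hypothetical helper, not part of pyBigWig. `bb` is assumed to be an
    object returned by pyBigWig.open() on a bigBed file.
    """
    lengths = bb.chroms()  # assumed: {chromosome name: length}
    if chrom not in lengths:
        return []
    # Assumption: entries() may return None when nothing overlaps the range.
    found = bb.entries(chrom, 0, lengths[chrom])
    return found if found is not None else []
```

In the loop from the original post, the histogram step could then be skipped whenever the returned list is empty.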

jtd032 commented 2 years ago

All files return a maxVal when tested. chr10 was successful but chr11 was not: [image]. Error message: [image]

dpryan79 commented 2 years ago

Can you make the file available to me? I can have a look then.

YunfengLUMC commented 2 years ago

Hi,

I'd like to know how to save all the entries to a file. Here is my code:

bb = pyBigWig.open('./PBMCs_HistoneMarks_Blueprint/Males_UMCG00025_H3K4me1.peak_calls.bigBed')
bb.entries('chrX', 16426, 156000962, withString=False)

How can I write out the result of bb.entries()? Also, for a bigBed file, how can I output the intervals of all chromosomes at once? I found that I need to specify start and end positions for each chromosome. Finally, if I use a bigWig file, are the intervals I extract the same as for bigBed? I ask because, based on your description, start and end positions are not required for bigWig files. Many thanks!

dpryan79 commented 2 years ago

I don't think I ever put logic into the .entries() function to fill in the chromosome bounds if none are supplied. I suppose that could be done, though the Python function is really just a thin wrapper over a C function, and C is less flexible about such things.

For outputting the results of bb.entries(), it's just a list of tuples, so something like the following would work:

# "entries.bed" is a placeholder output file name
with open("entries.bed", "w") as o:
    for res in bb.entries('chr1', 10000000, 10020000):
        o.write("chr1\t{}\t{}\t{}\n".format(res[0], res[1], res[2]))
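
The same idea can be extended to every chromosome by taking the bounds from `chroms()`. This is only a sketch: `write_all_entries` is a hypothetical helper, not a pyBigWig function, and it assumes that `chroms()` returns a dict of chromosome lengths and that `entries()` may return `None` for a chromosome without entries.

```python
def write_all_entries(bb, out, with_string=True):
    """Write every entry of every chromosome to `out` as tab-separated lines.

    Hypothetical helper, not part of pyBigWig. `bb` is assumed to be an open
    bigBed file and `out` a writable text file object.
    Returns the number of lines written.
    """
    written = 0
    for chrom, length in bb.chroms().items():
        # Assumption: entries() may return None when a chromosome has no entries.
        for res in bb.entries(chrom, 0, length, withString=with_string) or []:
            # res is (start, end, string), or (start, end) without strings.
            fields = [chrom] + [str(x) for x in res]
            out.write("\t".join(fields) + "\n")
            written += 1
    return written
```

It could be called as `write_all_entries(bb, o)` with a file opened for writing, or with `with_string=False` to keep only the start and end columns.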

YunfengLUMC commented 2 years ago

Thanks for your detailed reply. Could I try this? I don't need the strings:

for res in bb.entries('chr1', 10000000, 10020000, withString=False):
    o.write("chr1".format(res[0], res[1], res[2]))

Best wishes!

dpryan79 commented 2 years ago

In that case, as an example:

o.write("chr1\t{}\t{}\n".format(res[0], res[1]))