Closed by igordot 8 years ago
Why don't you simply turn your BED file into a bigWig file? You can convert it to bedGraph with cut or awk (or simply in Excel) and then run bedGraphToBigWig.
BED files don't lend themselves to random querying, which computeMatrix needs (there are ways around this, but getting everyone to tabix index their files is probably a non-starter). If, for some reason, you don't want to use bedGraphToBigWig, I can show you a few lines of Python that will perform the conversion from the BED file (you already have the prerequisite Python modules installed, since deepTools uses them as well).
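A minimal sketch of the BED-to-bedGraph step mentioned above, in plain Python rather than cut or awk. It assumes a tab-separated BED file with a numeric score in column 5. The function name bed_to_bedgraph and the score_column parameter are made up for illustration, and this is not necessarily the snippet the maintainer had in mind. The sorted output is meant as input for bedGraphToBigWig, which, as far as I know, also expects non-overlapping intervals, so overlapping peaks would need to be merged first.

```python
def bed_to_bedgraph(bed_lines, score_column=4):
    """Convert BED lines to sorted bedGraph lines (chrom, start, end, score).

    score_column is the 0-based index of the score field (assumed: column 5).
    """
    records = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and the optional header lines of a BED file.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) <= score_column:
            raise ValueError("no score column in line: %r" % line)
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        records.append((chrom, start, end, float(fields[score_column])))
    # Sort by chromosome name, then numerically by start position.
    records.sort()
    return ["%s\t%d\t%d\t%g" % record for record in records]


if __name__ == "__main__":
    example = [
        "track name=peaks",
        "chr2\t100\t200\tpeak1\t5",
        "chr1\t50\t80\tpeak2\t3.5",
    ]
    print("\n".join(bed_to_bedgraph(example)))
```

The resulting lines can be written to a file and passed to bedGraphToBigWig together with a chromosome sizes file.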
I could convert to bedGraph and then bigWig, but that's two extra steps. Yes, they are simple, but I was hoping there would be a more elegant solution.
Regarding random querying of BED files, is it that big of an issue? Usually BED files are relatively small and can be easily loaded into memory.
Essentially all parts of deepTools rely on random querying of files to work, so getting around that would require a fair bit of effort (and the accompanying maintenance overhead). Having said that, we could presumably use the deeptoolsintervals module to read the whole file in and allow random querying (I think I'm storing the score already). I already added a special "remote wig/bedGraph files on deepBlue" method in version 2.4, so I suppose BED would be doable too. I'll think about this more tomorrow.
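To make the "read the whole file in and allow random querying" idea concrete, here is a rough sketch using only the standard library. It is not how deeptoolsintervals works; the BedIndex class and its query method are hypothetical names, and it assumes a tab-separated BED file with a numeric score in column 5 and half-open coordinates.

```python
from bisect import bisect_left
from collections import defaultdict


class BedIndex:
    """In-memory BED intervals, queryable by region (hypothetical helper)."""

    def __init__(self, bed_lines):
        per_chrom = defaultdict(list)
        for line in bed_lines:
            fields = line.strip().split("\t")
            # Skip headers, comments and lines without a score column.
            if len(fields) < 5 or fields[0].startswith("#"):
                continue
            start, end = int(fields[1]), int(fields[2])
            per_chrom[fields[0]].append((start, end, float(fields[4])))
        # Keep intervals sorted by start so that bisect can be used.
        self._intervals = {c: sorted(v) for c, v in per_chrom.items()}
        self._starts = {
            c: [iv[0] for iv in v] for c, v in self._intervals.items()
        }

    def query(self, chrom, start, end):
        """Return (start, end, score) tuples overlapping [start, end)."""
        intervals = self._intervals.get(chrom, [])
        starts = self._starts.get(chrom, [])
        # Only intervals that begin before the query end can overlap it.
        upper = bisect_left(starts, end)
        return [iv for iv in intervals[:upper] if iv[1] > start]


if __name__ == "__main__":
    index = BedIndex(["chr1\t10\t20\ta\t1", "chr1\t15\t30\tb\t2"])
    print(index.query("chr1", 18, 25))
```

The scan over candidates is linear in the number of intervals starting before the query end, which is acceptable for small peak files but is not a substitute for a proper interval tree.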
Thank you for the prompt feedback. I tried converting BEDs to bigWig. After I added --missingDataAsZero to the computeMatrix step, the resulting TSS profile plots for full genome bigWigs and BED-based bigWigs look very similar.
Although the input bigWigs are now much smaller, the processing time is not much quicker. I guess most of the computation happens at a different stage.
Glad that worked. For what it's worth, the time needed by computeMatrix is a function of the genome size, so there's not much of a speed benefit from having less data.
Thanks for the clarification!
I just sat down to play around with implementing this and realized that there's no good way to write a "give me a list of chromosomes and their lengths" method. That's a deal breaker given how the rest of deepTools works internally. At the moment this will be classified as "won't implement", though if I come up with an elegant way to incorporate it in the future I will.
I suppose you could use the BED file to get the list of chromosomes (uniques from col1) and lengths (max of col3), but that's a big approximation, so it makes sense not to do that.
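For reference, the approximation described above (unique values of column 1, maximum of column 3) could look roughly like this. The function name approximate_chrom_sizes is hypothetical, and the result only gives a lower bound on each chromosome length, which is exactly why it is a poor fit here.

```python
def approximate_chrom_sizes(bed_lines):
    """Estimate chromosome lengths as the largest end coordinate seen.

    This underestimates the real lengths and misses chromosomes
    that have no intervals in the BED file at all.
    """
    sizes = {}
    for line in bed_lines:
        fields = line.strip().split("\t")
        # Skip headers, comments and malformed lines.
        if len(fields) < 3 or fields[0].startswith("#"):
            continue
        chrom, end = fields[0], int(fields[2])
        sizes[chrom] = max(sizes.get(chrom, 0), end)
    return sizes


if __name__ == "__main__":
    example = ["chr1\t10\t20", "chr1\t50\t80", "chr2\t100\t200"]
    print(approximate_chrom_sizes(example))
```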
Is it possible to use computeMatrix with a BED file as the score file (--scoreFileName) instead of a bigWig file? For example, if I want to see how my ChIP-seq peaks are distributed around TSS. It should make the calculation a lot quicker.