BioJulia / IntervalTrees.jl

A data structure for efficient manipulation of sets of intervals
MIT License
44 stars 17 forks source link

find nearest interval #14

Open simon-anders opened 8 years ago

simon-anders commented 8 years ago

Hi

Would it be difficult to the following functionality? Given an IntervalTree{K,V} and a position x (i.e., a value x of type K), return iterators generating

  1. the intervals following x, sorted by increasing start position (more formally: all intervals iv with first(iv) > x, sorted by first(iv))

    and

  2. the intervals preceding x, sorted by decreasing end position (more formally: all intervals iv with last(iv) < x, reverse-sorted by last(iv).

A typical use case for this would be the following: Let's say I want to use IntervalTrees or an IntervalCollection to represent gene annotation: For each transcript (or exon) listed in a GTF or similar annotation file, I store an interval in the collection. Now, let's say, I have ChIP-Seq data for a transcription factor, have identified putative transcription factor binding sites with some peak finding tool, and now would like to know for each peak the closest gene or transcript to which the peak is upstream. For this, I would need to inspect the intervals to the left and to the right of the peak's position and chose the one that is closest and has the right strand (here: reading direction pointing away from the peak).

Thanks a lot for working on an interval tree class; this will be a vital component to make Julia useful for genomics. The feature I suggest here is what I think is the only thing still missing.

My guess is that task 1 (searching to the right) is easy by just descending the tree to the position and then walking depth-first. For task 2 (searching to the left), one may need to have to inspect several tree branches in parallel, depending on the maximum end positions in the inner nodes, right?

Thanks

Simon