BioJulia / GenomicFeatures.jl

Tools for genomic features in Julia.

filter function for IntervalCollection #35

Closed hsugawa8651 closed 4 years ago

hsugawa8651 commented 4 years ago

:clipboard: Additional detail

filter(f, coll::IntervalCollection) is an eager implementation of filtering for an IntervalCollection.
It is equivalent to the following code:

result = IntervalCollection{T}()   # T is the metadata type of the elements in `coll`
for i in Base.Iterators.filter(f, coll)
    push!(result, i)
end
result

This function builds a new IntervalCollection that contains only the intervals of coll satisfying a given predicate.
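To make the eager versus lazy distinction concrete, here is a minimal sketch that uses only Base. The named tuples and the field names lo, hi and depth are hypothetical stand-ins for intervals and their metadata; they are not part of the GenomicFeatures API.

```julia
# Hypothetical stand-in records (plain named tuples, not GenomicFeatures intervals).
records = [
    (lo = 1, hi = 3, depth = 1),
    (lo = 4, hi = 7, depth = 2),
    (lo = 8, hi = 9, depth = 1),
]

# Lazy: wraps `records`; the predicate is not evaluated until iteration.
lazy = Iterators.filter(r -> r.depth == 1, records)

# Eager: iterate the lazy filter and push each match into a new container,
# mirroring the loop shown above.
eager = eltype(records)[]
for r in lazy
    push!(eager, r)
end

# For a Vector, Base's eager `filter` produces the same result.
@assert eager == filter(r -> r.depth == 1, records)
```

The eager form pays for a new container up front but can be indexed and reused, while the lazy form defers all work until it is iterated.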

The following code selects the regions of the IntervalCollection coll that are covered by exactly one interval, that is, the regions where the intervals do not overlap:

using GenomicFeatures

coll = IntervalCollection{Nothing}()
push!(coll, Interval("chr1", 1, 9))
push!(coll, Interval("chr1", 4, 7))
cov = coverage(coll)
# Uses the filter method proposed in this PR.
cov1 = filter(cov) do i
    i.metadata == 1  # keep regions with a coverage of exactly 1
end
@show cov1
# =>
# IntervalCollection{UInt32} with 2 intervals:
#  chr1:1-3  .  1
#  chr1:8-9  .  1

Note 1

I am afraid that this function may already be defined somewhere in the large BioJulia libraries. If so, I will take down this PR.

Note 2

I fixed the broken links to the Julia style guide and the documentation style guide in the PR template.

CiaranOMara commented 4 years ago

This PR is tidy, and I appreciate the documentation both here and in the code you produced.

The filter function itself is defined in Base.

For filtering intervals, there are a couple of other options that also benefit from bulk insertion:


intervals = [
    Interval("chr1", 1, 9),
    Interval("chr1", 4, 7)
]

cov=coverage(intervals) # Note: sorted intervals.

predicate(i) = metadata(i) == 1

selected = IntervalCollection(Base.Iterators.filter(predicate, cov))
selected = IntervalCollection([x for x in cov if predicate(x)])

selected = Base.Iterators.filter(cov) do i
    metadata(i) == 1
end |> IntervalCollection

hsugawa8651 commented 4 years ago

Thank you for your comment. Your solution using Base.Iterators.filter is much better than my PR, which was reinventing the wheel. I will take down this PR, though adding your code to the documentation of the coverage function would be useful for users of this package.