Open jonathanBieler opened 4 years ago
@jonathanBieler, does the following cover your idea of a chromlist
method?
function Base.isless(::Int, ::Char)
return true
end
function Base.isless(::Char, ::Int)
return false
end
function seqname_isless(str1::String, str2::String) :: Bool
function parse_seqname(str::String) :: Vector{Union{Char, Int}}
arr = Vector{Char}(str)
arr = convert(Vector{Union{Char, Int}}, arr)
m = match(r"(\d+)", str)
if m !== nothing
for (capture, offset) in zip(reverse(m.captures), reverse(m.offsets))
splice!(arr, UnitRange(offset, offset + length(capture) -1), parse(Int, capture))
end
end
return arr
end
return parse_seqname(str1) < parse_seqname(str2)
end
julia> seqname_isless("chr1", "chrM")
true
julia> seqname_isless("chrM", "chr1")
false
julia> seqnames = ["chr11", "chr10", "chr01", "chr100", "chr1" , "chrM", "chr010"];
julia> sort(seqnames)
7-element Array{String,1}:
"chr01"
"chr010"
"chr1"
"chr10"
"chr100"
"chr11"
"chrM"
julia> sort(seqnames,lt=seqname_isless)
7-element Array{String,1}:
"chr01"
"chr1"
"chr10"
"chr010"
"chr11"
"chr100"
"chrM"
julia> seqnames = string.(1:11);
julia> sort(seqnames)
11-element Array{String,1}:
"1"
"10"
"11"
"2"
"3"
"4"
"5"
"6"
"7"
"8"
"9"
julia> sort(seqnames,lt=seqname_isless)
11-element Array{String,1}:
"1"
"2"
"3"
"4"
"5"
"6"
"7"
"8"
"9"
"10"
"11"
My issue isn't that the chromosome list is in a specific order or another, just that you need to know (or be able to set) the order it's stored internally in the Writer
to be able to write without getting a disordered intervals
error. So it would need to be something like I wrote above :
chromlist(writer) =
collect(values(writer.chromnames)[sortperm(collect(keys(writer.chromnames)))]
Here's an MWE :
output = open("data.bb", "w")
writer = BigBed.Writer(output, [("1", 12345), ("2", 9100), ("10", 123)])
write(writer, ("1", 101, 150, "gene 1"))
write(writer, ("2", 211, 250, "gene 2"))
write(writer, ("10", 211, 250, "gene 3"))
close(writer)
Actually it would be better if I could specify the order in the Writer
; I have a large text file I want to read line by line and write into a BigBed, but I don't want to reorder it to match the writer internal order.
Just commenting the sort here solves the problem (might create other though, I'm not sure) :
But conceptually I would leave the choice of ordering to the user (since it might be imposed by external constrains, like a file you got from another tool).
@jonathanBieler, I agree that this is an issue/annoyance and have made a note of it in https://github.com/BioJulia/BigBed.jl/projects/1.
I'm trying to write data to a BigBed file on each human chromosome, I defined my writer as such :
With
chrs
being["1", "2", ...]
. Then I'm looping onchrs
and do some write operations, but I'm getting a :ArgumentError: disordered intervals
Because in the writer the chromosomes are getting reordered as :
["1", "10", "11"]
.I managed to get around it by getting the chromosomes in the right order with:
Would it be possible to have the writer keep the ordering it's given ? (maybe using an OrderedDict) or is there a good reason why it gets reordered ?
Alternatively a
chromlist
method to get the chromosome in the right order would help.