LSchwerdt / SimultaneousSortperm.jl

MIT License
7 stars 1 forks source link

partial sort functionality? #1

Open jacob-roth opened 7 months ago

jacob-roth commented 7 months ago

I think this project is really cool! I have an application where I need to perform partial sorting (per here and here) and was trying to adapt your code to handle this case. Would you have any suggestions on where to start? I naively tried to modify ssortperm!! to accept lo and hi based on the desired partial sort indices, e.g.,

function partialssortperm!!(ix::AbstractVector{Int}, v::AbstractVector, idxrange::UnitRange{Int}; lt=isless, by=identity, rev::Bool=false, order::Ordering=Forward)
    axes(ix) == axes(v) || throw(ArgumentError("index array must have the same size/axes as the source array, $(axes(ix)) != $(axes(v))"))
    ix .= LinearIndices(v)
    vs = StructArray{Tuple{eltype(v),eltype(ix)}}(val=v, ix=ix)
    o = ord(lt, by, rev ? true : nothing, order)
    lo = idxrange[1] # <- new
    hi = idxrange[end] # <- new
    _sortperm_inplace_small_optimization!(ix, v, vs, lo, hi, o::Ordering)
    ix
end

but I failed to get a correct implementation.

Do you think there's an easy fix? Or do you think it involve digging into the internals of _sortperm_inplace_small_optimization!?

Thanks! Jake

adienes commented 5 months ago

I would also be interested in a (very fast) partialsortperm

LSchwerdt commented 5 months ago

Sorry for the late answer. Due to a new job and a new baby I have other priorities right now.

The basic inner function _sortperm!!(ix, v, vs, lo, hi, o::Ordering, offsets_l, offsets_r) already accepts lo and hi indices for partial sorting: https://github.com/LSchwerdt/SimultaneousSortperm.jl/blob/538f666a5d896f6adfd9f4797d0cad066d460e3c/src/SimultaneousSortperm.jl#L104

I think the easiest option would be trying to call that from your function at first. This way, you could get the core functionality working and skip dealing with all the code that deals with optimizations for special types.

adienes commented 5 months ago

fantastic, thanks for the pointer. congratulations on the baby and job!

jacob-roth commented 4 months ago

Thanks, and congrats!! Is there an easy fix for my below M(not)WE for attempting to sort the first 10 elements of a vector containing 100 elements? Not sure how to use offsets in the following (or if I'm making another mistake as well):

julia> using SimultaneousSortperm
julia> n=100
julia> k=10
julia> x=randn(n);
julia> xsort=similar(x);
julia> ix=collect(1:n);
julia> SimultaneousSortperm._sortperm!!(ix, x, xsort, 1, k, SimultaneousSortperm.Forward, [1], [n])
ERROR: BoundsError: attempt to access Float64 at index [2]
jacob-roth commented 3 months ago

Hi @adienes, have you managed to use the SimultaneousSortperm._sortperm!! function in a MWE like what I tried above? I would like to obtain a partial sort indices corresponding to the smallest 5 elements of a 10 element array. I've tried the following for hi=5 which doesn't seem to do what I want (but instead sorts the first 5 elements of the original array):

Random.seed!(12345)
n = 10
v0 = randn(n)
ix0 = collect(1:n)
vs0 = StructArray{Tuple{eltype(v),eltype(ix)}}(val=v0, ix=ix0)

lo = 1
hi = 5
o = SimultaneousSortperm.Forward

v = zeros(n)
ix = zeros(Int, n)
v .= v0
ix .= ix0
vs = StructArray{Tuple{eltype(v),eltype(ix)}}(val=v, ix=ix)

offsets_l = zeros(Int, hi - lo + 1) # Adjust the size as needed
offsets_r = zeros(Int, hi - lo + 1) # Adjust the size as needed

SimultaneousSortperm._sortperm!!(ix, v, vs, lo, hi, o, offsets_l, offsets_r)

julia> hcat(vs, vs0)
10×2 StructArray(::Matrix{Float64}, ::Matrix{Int64}) with eltype Tuple{Float64, Int64}:
 (-0.308742, 1)  (-0.308742, 1)
 (-0.106916, 5)  (0.122478, 2)
 (0.122478, 2)   (1.19057, 3)
 (0.133261, 4)   (0.133261, 4)
 (1.19057, 3)    (-0.106916, 5)
 (0.724831, 6)   (0.724831, 6)
 (3.23096, 7)    (3.23096, 7)
 (-0.204223, 8)  (-0.204223, 8)
 (0.244815, 9)   (0.244815, 9)
 (-0.67478, 10)  (-0.67478, 10)

but rerunning with hi=10 provides a check that the calling sequence isn't totally wrong since it performs a full sort here:

lo = 1
hi = 10
o = SimultaneousSortperm.Forward

v = zeros(n)
ix = zeros(Int, n)
v .= v0
ix .= ix0
vs = StructArray{Tuple{eltype(v),eltype(ix)}}(val=v, ix=ix)

offsets_l = zeros(Int, hi - lo + 1) # Adjust the size as needed
offsets_r = zeros(Int, hi - lo + 1) # Adjust the size as needed

SimultaneousSortperm._sortperm!!(ix, v, vs, lo, hi, o, offsets_l, offsets_r)

julia> hcat(vs, vs0)
10×2 StructArray(::Matrix{Float64}, ::Matrix{Int64}) with eltype Tuple{Float64, Int64}:
 (-0.67478, 10)  (-0.308742, 1)
 (-0.308742, 1)  (0.122478, 2)
 (-0.204223, 8)  (1.19057, 3)
 (-0.106916, 5)  (0.133261, 4)
 (0.122478, 2)   (-0.106916, 5)
 (0.133261, 4)   (0.724831, 6)
 (0.244815, 9)   (3.23096, 7)
 (0.724831, 6)   (-0.204223, 8)
 (1.19057, 3)    (0.244815, 9)
 (3.23096, 7)    (-0.67478, 10)
adienes commented 3 months ago

for hi=5 which doesn't seem to do what I want (but instead sorts the first 5 elements of the original array):

yes, I ran into the same conclusion; that it sorts slices of the input rather than a partialsort. I was not able to put in much time to try to find alternative approaches

jacob-roth commented 3 months ago

The basic inner function _sortperm!!(ix, v, vs, lo, hi, o::Ordering, offsets_l, offsets_r) already accepts lo and hi indices for partial sorting: https://github.com/LSchwerdt/SimultaneousSortperm.jl/blob/538f666a5d896f6adfd9f4797d0cad066d460e3c/src/SimultaneousSortperm.jl#L104

@LSchwerdt: Would you have a MWE on hand (or easily accessible) for demonstrating how to ensure at least $k$ elements of an $n$-array are sorted ($k < n$)?