Emerentius / ord_subset

Tools for working with types where a subset of values has a total order, like e.g. floats without NaN
Apache License 2.0
12 stars 1 forks source link

Implement ord_subset_sort_by_cached_key #6

Open dvtomas opened 2 years ago

dvtomas commented 2 years ago

I'm missing this, it would also be nice to have it to implement it to have a complete ord_subset mirror of the slice API

dvtomas commented 2 years ago

I compiled the following from ord_subset_sort_by_key and Vec::sort_by_cached_key. It might work as a jumping board.

trait OrdSubsetSliceExt<T> {
    fn ord_subset_sort_by_cached_key<K, F>(&mut self, f: F)
        where
            Self: AsMut<[T]>,
            K: OrdSubset,
            F: FnMut(&T) -> K;
}

impl<T, U> OrdSubsetSliceExt<T> for U
    where
        U: AsRef<[T]>,
{
    fn ord_subset_sort_by_cached_key<K, F>(&mut self, f: F)
        where
            U: AsMut<[T]>,
            K: OrdSubset,
            F: FnMut(&T) -> K,
    {
        use ord_subset::OrdSubsetSliceExt;

        {
            // Helper macro for indexing our vector by the smallest possible type, to reduce allocation.
            macro_rules! sort_by_key {
            ($t:ty, $slice:ident, $f:ident) => {{
                let slice = $slice.as_mut();
                let mut indices: Vec<_> =
                    slice.iter().map($f).enumerate().map(|(i, k)| (k, i as $t)).collect();
                // The elements of `indices` are unique, as they are indexed, so any sort will be
                // stable with respect to the original slice. We use `sort_unstable` here because
                // it requires less memory allocation.
                indices.ord_subset_sort_unstable();
                for i in 0..slice.len() {
                    let mut index = indices[i].1;
                    while (index as usize) < i {
                        index = indices[index as usize].1;
                    }
                    indices[i].1 = index;
                    slice.swap(i, index as usize);
                }
            }};
        }

            let sz_u8 = mem::size_of::<(K, u8)>();
            let sz_u16 = mem::size_of::<(K, u16)>();
            let sz_u32 = mem::size_of::<(K, u32)>();
            let sz_usize = mem::size_of::<(K, usize)>();

            let len = self.as_ref().len();
            if len < 2 {
                return;
            }
            if sz_u8 < sz_u16 && len <= (u8::MAX as usize) {
                return sort_by_key!(u8, self, f);
            }
            if sz_u16 < sz_u32 && len <= (u16::MAX as usize) {
                return sort_by_key!(u16, self, f);
            }
            if sz_u32 < sz_usize && len <= (u32::MAX as usize) {
                return sort_by_key!(u32, self, f);
            }
            sort_by_key!(usize, self, f)
        }
    }
}
Emerentius commented 2 years ago

I don't reimplement any sorting algorithms as that is best left to the stdlib authors. One can call the Ord-requiring sorts by constructing the right comparison functions or wrappers.

7 contains a first implementation.

As the trait isn't sealed (i.e. other people can implement it for their types), this is technically a breaking change and would require a major version change. Given that I'm exporting traits here that are also part of the public API of upstream crates, that would be a lot of breakage for a relatively minor crate such as this.

dvtomas commented 2 years ago

Yeah, I like your implementation a lot better. And yes, this technically sounds like a change requiring a major version bump. In practice, searching for ord_subset_sort_by_cached_key on github yields only this issue/PR, nothing on GitLab or Code Search, and the fact that this issue hasn't been raised before also tells something. There's a good chance this will affect no-one, or perhaps just very few people with private projects. Still, the decision is on you, I don't really have the knowledge to suggest anything.

Emerentius commented 2 years ago

A duplicate method name isn't considered breaking or you couldn't add new methods ever. The issue is adding new required methods to a public trait that someone else may have implemented for their own type. With the new method, their implementations would be incomplete and cease to compile.

Really, nobody ought to have implemented it, but it is possible. I should have sealed the trait, but I didn't know about the technique back then.