ChillFish8 / cfavml

An unopinionated SIMD vector operation library for Rust, supporting no_std and no-alloc workloads.
Apache License 2.0

hypot function in cfavml #14

Open skewballfox opened 4 days ago

skewballfox commented 4 days ago

As mentioned in #13, this would be a useful operation to have for float operations in cfavml itself. For subnormal or very large inputs, computing sqrt(x^2 + y^2) directly leads to underflow or overflow in the intermediate squares. I'm in the process of adapting the implementation laid out here. Link to a copy on the Rust playground; the versions worth looking at are at lines 73 and 87.

The tl;dr is to take the maximum of the absolute values of the inputs (hi), factor out the nearest power of 2 below hi, and scale the subnormal inputs. I'm still working on tests for subnormal inputs, because I'm still trying to wrap my head around the scaling part.
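For readers following along, here is a minimal scalar sketch of the scaling idea described above. It is not the playground code or anything from cfavml: the names `pow2` and `scaled_hypot`, the 2^-64 pre-scaling threshold, and the choice to scale hi into [2, 4) are all assumptions made for illustration, and the result is not guaranteed to be correctly rounded. It uses `f32::sqrt` from std, so it is not no_std as written.

```rust
/// Hypothetical helper: builds 2^k as an f32 for k in -126..=127 (normal range only).
fn pow2(k: i32) -> f32 {
    debug_assert!((-126..=127).contains(&k));
    f32::from_bits(((k + 127) as u32) << 23)
}

/// Hypothetical scalar reference for sqrt(x^2 + y^2) that avoids overflow and
/// underflow in the intermediate squares by scaling with powers of two.
fn scaled_hypot(x: f32, y: f32) -> f32 {
    let (x, y) = (x.abs(), y.abs());
    // Special cases first: an infinite input wins even over NaN.
    if x.is_infinite() || y.is_infinite() {
        return f32::INFINITY;
    }
    if x.is_nan() || y.is_nan() {
        return f32::NAN;
    }
    let (mut hi, mut lo) = if x >= y { (x, y) } else { (y, x) };
    if hi == 0.0 {
        return 0.0;
    }

    // Lift tiny (including subnormal) inputs into the normal range so the
    // exponent extraction below only ever sees normal numbers.
    let mut post = 1.0_f32;
    if hi < pow2(-64) {
        hi *= pow2(64);
        lo *= pow2(64);
        post = pow2(-64);
    }

    // Biased exponent of hi. Because of the pre-scaling, e is in 42..=254.
    let e = ((hi.to_bits() >> 23) & 0xff) as i32;
    // 2^k is a power of two below hi, chosen so that hi / 2^k lies in [2, 4).
    let k = e - 128;
    let h = hi * pow2(-k); // exact: multiplying by a power of two
    let l = lo * pow2(-k); // may underflow, but is then negligible next to h

    // h^2 + l^2 is in [4, 32), so the squares cannot overflow or underflow.
    (h * h + l * l).sqrt() * pow2(k) * post
}
```

With this sketch, `scaled_hypot(3.0, 4.0)` evaluates to `5.0`, and `scaled_hypot(1.0e38, 1.0e38)` stays finite, whereas squaring `1.0e38` directly overflows to infinity in f32.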

I don't think this should be part of SimdRegister, partly because it only makes sense for float types, but it also can't be handled separately the way cosine is. Should we create a separate trait for common floating-point ops?

ChillFish8 commented 4 days ago

I think floating point ops as a separate thing makes sense.

Honestly, one of my regrets about the single SimdRegister trait is that it makes creating these distinctions very difficult. It could be reworked to allow more complex behaviour, like accumulating into larger bit-width integers. For example, the Int8 dot product realistically wants to accumulate into an f32.

I think a good starting point would be to create a separate trait for the op. We could then take the approach of each op being its own trait, and gradually move the current SimdRegister ops that aren't the basic loads, stores, etc. into separate traits. That would improve flexibility and let us remove ops for data types where they don't make sense to implement.
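To make the one-trait-per-op idea concrete, here is a rough sketch of how it might look. None of these names come from cfavml: `Register`, `HypotOp`, `Fallback`, and `hypot_vertical` are hypothetical, and the real `SimdRegister` trait has a different shape. The point is only that an op implemented for `f32` but not for `i8` becomes a compile-time distinction.

```rust
/// Hypothetical minimal register trait: only the basic loads and stores.
pub trait Register<T: Copy> {
    type Reg: Copy;
    const LANES: usize;
    fn load(src: &[T]) -> Self::Reg;
    fn store(reg: Self::Reg, dst: &mut [T]);
}

/// Hypothetical per-op trait, implemented only where the op makes sense.
pub trait HypotOp<T: Copy>: Register<T> {
    fn hypot(a: Self::Reg, b: Self::Reg) -> Self::Reg;
}

/// Hypothetical single-lane fallback backend.
pub struct Fallback;

impl Register<f32> for Fallback {
    type Reg = f32;
    const LANES: usize = 1;
    fn load(src: &[f32]) -> f32 {
        src[0]
    }
    fn store(reg: f32, dst: &mut [f32]) {
        dst[0] = reg;
    }
}

impl Register<i8> for Fallback {
    type Reg = i8;
    const LANES: usize = 1;
    fn load(src: &[i8]) -> i8 {
        src[0]
    }
    fn store(reg: i8, dst: &mut [i8]) {
        dst[0] = reg;
    }
}

// Only the float type gets the op. There is deliberately no HypotOp<i8>.
impl HypotOp<f32> for Fallback {
    fn hypot(a: f32, b: f32) -> f32 {
        a.hypot(b)
    }
}

/// Generic driver: requires the op trait, not just the register trait.
/// This sketch has no tail handling, so lengths must be a multiple of LANES.
pub fn hypot_vertical<T: Copy, R: HypotOp<T>>(a: &[T], b: &[T], out: &mut [T]) {
    let lanes = <R as Register<T>>::LANES;
    assert!(a.len() == b.len() && a.len() == out.len());
    assert!(a.len() % lanes == 0, "sketch: no tail handling");
    for i in (0..a.len()).step_by(lanes) {
        let ra = <R as Register<T>>::load(&a[i..]);
        let rb = <R as Register<T>>::load(&b[i..]);
        <R as Register<T>>::store(R::hypot(ra, rb), &mut out[i..]);
    }
}
```

Under these assumptions, `hypot_vertical::<f32, Fallback>(&a, &b, &mut out)` compiles, while `hypot_vertical::<i8, Fallback>(..)` is rejected by the compiler because `Fallback` does not implement `HypotOp<i8>`.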

skewballfox commented 1 day ago

So I can close the tab: something that might be useful for automath is the paper that the cmath implementation is based on, though I don't think it's as useful for the SIMD versions, mainly because it involves a lot of conditional branching, which I'm guessing would mean masking and executing for every condition.
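As a rough illustration of what "masking and executing for every condition" could look like, here is a lane-wise emulation in plain Rust. It uses arrays instead of real intrinsics, and `LANES`, `select`, and `prescale_subnormals` are hypothetical names, not cfavml APIs. Every lane pays for the scaled branch, and a mask then picks which result to keep.

```rust
use std::array;

/// Hypothetical register width, e.g. one 256-bit register of f32 values.
const LANES: usize = 8;

type Lanes = [f32; LANES];
type Mask = [bool; LANES];

/// Emulates a SIMD blend: takes `if_true[i]` where the mask is set,
/// `if_false[i]` otherwise.
fn select(mask: Mask, if_true: Lanes, if_false: Lanes) -> Lanes {
    array::from_fn(|i| if mask[i] { if_true[i] } else { if_false[i] })
}

/// A scalar routine would branch on "is this input subnormal?". The masked
/// version computes the scaled value for every lane and blends afterwards.
fn prescale_subnormals(v: Lanes) -> Lanes {
    const TWO_POW_24: f32 = 16_777_216.0; // 2^24, lifts subnormals into the normal range
    let is_subnormal: Mask =
        array::from_fn(|i| v[i] != 0.0 && v[i].abs() < f32::MIN_POSITIVE);
    let scaled: Lanes = array::from_fn(|i| v[i] * TWO_POW_24); // executed for all lanes
    select(is_subnormal, scaled, v)
}
```

Each extra condition in the scalar algorithm adds another "compute for all lanes, then blend" step like this one, which is the cost being guessed at above.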