Open piotrfila opened 11 months ago
So I gave a bash at actually using this crate from a no_std binary. You quickly realise that it still tries linking in std. After a bit of fiddling with the dependencies I got all the std features removed. Required a bunch of default-features = false:
+num-complex = { version = "0.4", default-features = false }
+num-traits = { version = "0.2", default-features = false }
+num-integer = { version = "^0.1.40", default-features = false }
+strength_reduce = { version = "0.2.4", default-features = false }
+transpose = { version = "0.2", default-features = false }
+primal-check = { path = "/home/walter/Development/primal/primal-check", default-features = false }
And note the local version of primal-check. Don't see why upstream wouldn't accept my changes for no_std support.
Once you weed out all the std stuff you end up with errors regarding math operations. Core variants of f32 and f64 don't implement sqrt, cos and the like. You need to go manually change them to use libm. Silly that the rust compiler just happily ignores that when some of your dependencies have std linked in.
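A minimal sketch of what "manually change them to use libm" can look like. The module name `float_math`, the helper `magnitude`, and the cargo feature name `libm` are assumptions for illustration, not RustFFT's actual layout; `libm::sqrtf`, `libm::sinf` and `libm::cosf` are real functions of the libm crate. Without the feature the sketch falls back to the std methods, so it also builds on a normal host.

```rust
/// Hypothetical shim: every call site uses `float_math::*` instead of the
/// inherent `f32` methods, which only exist when `std` is linked.
pub mod float_math {
    // Path taken when the (assumed) `libm` cargo feature is enabled.
    #[cfg(feature = "libm")]
    pub fn sqrt(x: f32) -> f32 {
        libm::sqrtf(x)
    }

    #[cfg(feature = "libm")]
    pub fn sin_cos(x: f32) -> (f32, f32) {
        (libm::sinf(x), libm::cosf(x))
    }

    // Fallback: the inherent methods provided by `std`.
    #[cfg(not(feature = "libm"))]
    pub fn sqrt(x: f32) -> f32 {
        x.sqrt()
    }

    #[cfg(not(feature = "libm"))]
    pub fn sin_cos(x: f32) -> (f32, f32) {
        x.sin_cos()
    }
}

/// Example call site: magnitude of a complex number given as (re, im).
pub fn magnitude(re: f32, im: f32) -> f32 {
    float_math::sqrt(re * re + im * im)
}
```

Keeping the choice in one module means the rest of the crate does not need its own cfg attributes.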
I eventually got something that seems like it's building correctly on my machine. I'm not really very familiar with no_std stuff but AFAICT it actually works! Real test would be to use it on a real embedded system. Anyway, this compiles and runs:
#![feature(
    lang_items,
    start,
    core_intrinsics,
    rustc_private,
    panic_unwind,
    rustc_attrs
)]
#![allow(internal_features)]
#![no_std]

extern crate unwind;

use core::alloc::Layout;
use core::cell::UnsafeCell;
use core::panic::PanicInfo;
use core::ptr::null_mut;
use core::sync::atomic::AtomicUsize;
use core::sync::atomic::Ordering::SeqCst;
use core::{alloc::GlobalAlloc, intrinsics};

use rustfft::{num_complex::Complex, FftPlanner};

const ARENA_SIZE: usize = 128 * 1024;
const MAX_SUPPORTED_ALIGN: usize = 4096;

#[repr(C, align(4096))] // 4096 == MAX_SUPPORTED_ALIGN
struct SimpleAllocator {
    arena: UnsafeCell<[u8; ARENA_SIZE]>,
    remaining: AtomicUsize, // we allocate from the top, counting down
}

#[global_allocator]
static ALLOCATOR: SimpleAllocator = SimpleAllocator {
    arena: UnsafeCell::new([0x55; ARENA_SIZE]),
    remaining: AtomicUsize::new(ARENA_SIZE),
};

unsafe impl Sync for SimpleAllocator {}

unsafe impl GlobalAlloc for SimpleAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let size = layout.size();
        let align = layout.align();

        // `Layout` contract forbids making a `Layout` with align=0, or align not power of 2.
        // So we can safely use a mask to ensure alignment without worrying about UB.
        let align_mask_to_round_down = !(align - 1);

        if align > MAX_SUPPORTED_ALIGN {
            return null_mut();
        }

        let mut allocated = 0;
        if self
            .remaining
            .fetch_update(SeqCst, SeqCst, |mut remaining| {
                if size > remaining {
                    return None;
                }
                remaining -= size;
                remaining &= align_mask_to_round_down;
                allocated = remaining;
                Some(remaining)
            })
            .is_err()
        {
            return null_mut();
        };
        self.arena.get().cast::<u8>().add(allocated)
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {}
}

#[start]
fn main(_argc: isize, _argv: *const *const u8) -> isize {
    let mut planner = FftPlanner::new();
    let fft = planner.plan_fft_forward(1234);

    let mut buffer = [Complex {
        re: 0.0f32,
        im: 0.0f32,
    }; 1234];
    fft.process(&mut buffer);
    0
}

#[lang = "eh_personality"]
fn rust_eh_personality() {}

#[panic_handler]
fn panic_handler(_info: &PanicInfo) -> ! {
    intrinsics::abort()
}
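The count-down arithmetic inside the allocator's `fetch_update` closure can be checked on its own. The sketch below pulls it into a pure function; the name `bump_down` is hypothetical, and the function only mirrors the closure above rather than replacing it.

```rust
/// Given the bytes still free (`remaining`), return the new `remaining`
/// value, which is also the offset of the allocation from the arena base.
/// Returns `None` when the request does not fit.
///
/// `align` must be a power of two, as the `Layout` contract guarantees.
pub fn bump_down(remaining: usize, size: usize, align: usize) -> Option<usize> {
    debug_assert!(align.is_power_of_two());
    if size > remaining {
        return None;
    }
    // Step down by `size`, then clear the low bits to round the offset
    // down to a multiple of `align`.
    Some((remaining - size) & !(align - 1))
}
```

Because the arena itself is aligned to 4096 bytes, an offset that is a multiple of `align` gives a correctly aligned pointer for any `align` up to `MAX_SUPPORTED_ALIGN`. For example, a 10-byte request with 8-byte alignment from a full 128 KiB arena lands at offset 131056.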
So to actually get no_std support we need to:
- Get primal-check to add no_std support.
- Set default-features = false on the dependencies. Not sure if there's an advantage to doing this only under a no_std feature or always... Could be a performance regression on core support.
- Add libm support. Same concern around using it by default or only when we remove the std feature. I strongly suspect that'll have a performance regression though.
- Test in a real no_std environment (with someone more familiar with no_std).
Thanks a lot! The crate now compiles on thumbv7em-none-eabihf with the libm feature enabled. I decided to move libm into its own feature because as you said it is likely to cause a performance regression. For now, I forked primal to make it work with no-std, but it would be nice to merge the changes there upstream.
Nice work! :)
Think your primal-check change may be incorrect:
error[E0599]: no method named `powf` found for type `f64` in the current scope
  --> /home/walter/.cargo/git/checkouts/primal-4a737820cfafafb5/53d0fdf/src/perfect_power.rs:47:25
   |
47 | let factor = x_.powf(1.0/expn as f64).round() as u64;
   |                 ^^^^ method not found in `f64`
I think you need to ensure either std or libm is enabled. Or explicitly enable the libm feature in RustFFT.
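Another option for that particular line would be to avoid floating point altogether, so that neither std nor libm is needed there. The sketch below is a swapped-in technique (integer binary search), not what primal-check does, and `integer_root` is a hypothetical name. Note that it returns the floor of the root, whereas the original line rounds to nearest; for an exact perfect power the two agree.

```rust
/// floor(x^(1/k)) using only integer operations from `core`.
/// Panics if `k` is zero.
pub fn integer_root(x: u64, k: u32) -> u64 {
    assert!(k >= 1, "root degree must be at least 1");
    // Invariant: lo^k <= x, and the answer lies in [lo, hi].
    let (mut lo, mut hi) = (0u64, x);
    while lo < hi {
        // Upper midpoint, written so it cannot overflow.
        let mid = lo + 1 + (hi - lo - 1) / 2;
        match mid.checked_pow(k) {
            // mid^k fits in u64 and does not exceed x: mid is a candidate.
            Some(p) if p <= x => lo = mid,
            // Overflow or too large: the answer is below mid.
            _ => hi = mid - 1,
        }
    }
    lo
}
```

A perfect-power check would then compare `integer_root(x, k).pow(k)` against `x`.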
My primal-check fix was just meant to get the crate to compile on no-std, so yeah it's not great :upside_down_face:. I see now that this is quite a bit more work than I originally anticipated... I am going to keep working on this though, it has been a fun project so far.
I have an example working on an STM32F411 Nucleo board. The library is really big though: it takes up 220 KiB of flash (which takes several seconds to upload to the chip), and computing an FFT of size 1234 needs 48 KiB of heap.
EDIT: Example is now in the repo.
Constructing an FFT dynamically might not be the best approach on embedded. It would be much better to construct the FFTs at compile time and only include the relevant algorithms in the binary (though this would make it impossible to construct dynamically-sized FFTs, so there would need to be a way to do both). Giving the option to store the twiddles in flash rather than sram would be nice, too.
Statically-sized FFTs would also be a nice feature for std applications, as this would likely reduce binary size and could be faster by not requiring allocations. I'm not sure how to implement this, though. Nightly features would probably be required; complicated const functions are hard to do in stable Rust in my experience.
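To make the idea concrete, here is a rough sketch of a statically-sized transform with its twiddles in an immutable `static`, which typically ends up in read-only data (flash on most embedded targets). It is a naive O(N²) DFT, not an FFT and not RustFFT's API; `C32`, `TWIDDLES_4` and `dft` are made-up names, and the table is written out by hand for N = 4 precisely because computing it in a const fn is the hard part.

```rust
/// Minimal complex type so the sketch has no dependencies.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct C32 {
    pub re: f32,
    pub im: f32,
}

impl C32 {
    fn add(self, o: C32) -> C32 {
        C32 { re: self.re + o.re, im: self.im + o.im }
    }

    fn mul(self, o: C32) -> C32 {
        C32 {
            re: self.re * o.re - self.im * o.im,
            im: self.re * o.im + self.im * o.re,
        }
    }
}

/// Forward twiddles e^(-2*pi*i*k/4) for k = 0..4, written out by hand.
pub static TWIDDLES_4: [C32; 4] = [
    C32 { re: 1.0, im: 0.0 },
    C32 { re: 0.0, im: -1.0 },
    C32 { re: -1.0, im: 0.0 },
    C32 { re: 0.0, im: 1.0 },
];

/// Naive DFT whose size is fixed at compile time. All buffers are
/// fixed-size arrays, so nothing is heap-allocated.
pub fn dft<const N: usize>(input: &[C32; N], twiddles: &[C32; N]) -> [C32; N] {
    let mut out = [C32 { re: 0.0, im: 0.0 }; N];
    for k in 0..N {
        let mut acc = C32 { re: 0.0, im: 0.0 };
        for n in 0..N {
            // Twiddle index wraps around because the factors are N-periodic.
            acc = acc.add(input[n].mul(twiddles[(k * n) % N]));
        }
        out[k] = acc;
    }
    out
}
```

Calling `dft(&input, &TWIDDLES_4)` on a `[C32; 4]` pulls in only this one code path, which is the binary-size argument made above.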
Another issue I found when compiling for thumbv6m-none-eabi (Raspberry Pi Pico board, Arm Cortex-M0+) is that not all targets support Arc. This can be mitigated by using portable-atomic, though I could not get an example to compile without modifying portable-atomic-util (it does not re-export the critical-section feature of portable-atomic).
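A sketch of how the Arc swap might be wired up, assuming portable-atomic-util is added as a dependency with its alloc feature for the fallback path. The module name `shared` and the helper `share_twiddles` are hypothetical; `target_has_atomic` is a stable built-in cfg.

```rust
pub mod shared {
    // Targets with native pointer-sized atomics get the standard Arc.
    // In a no_std crate this would be `alloc::sync::Arc`, which `std`
    // re-exports as the same type.
    #[cfg(target_has_atomic = "ptr")]
    pub use std::sync::Arc;

    // Targets without them (thumbv6m, for instance) fall back to the
    // Arc from portable-atomic-util (assumed dependency).
    #[cfg(not(target_has_atomic = "ptr"))]
    pub use portable_atomic_util::Arc;
}

/// Example call site that does not care which Arc it gets.
pub fn share_twiddles(values: Vec<f32>) -> shared::Arc<Vec<f32>> {
    shared::Arc::new(values)
}
```

The rest of the crate then imports `shared::Arc` everywhere instead of naming either implementation directly.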
This PR replaces all std imports in the scalar implementation with analogues from core and alloc and makes std an optional feature enabled by default. The HashMap implementation is taken from hashbrown (which std uses internally). Relevant issues: #116, #122. I ran the benchmarks overnight and although the full suite takes longer to run (on my x86-64 Linux machine: median 869 s vs 767 s before), the individual test results are somewhat more balanced (68 tests run more than 20% slower and 67 tests more than 20% faster).
Here are the test results. The first number is the median time taken by the new implementation divided by the median time taken by the old implementation (each version was run 15 times). I am not sure where the performance difference comes from.
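For reference, a small sketch of how such a ratio could be computed from raw timings. The function names and the `Verdict` enum are hypothetical, and the reading of "more than 20% faster" as a ratio below 0.8 is an assumption; the benchmark tooling actually used is not shown in this thread.

```rust
/// Median of a set of timings, or `None` if there are no samples.
pub fn median(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 1 {
        sorted[mid]
    } else {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    })
}

/// Median time of the new build divided by the median time of the old one.
/// Values above 1.0 mean the new build is slower.
pub fn median_ratio(new: &[f64], old: &[f64]) -> Option<f64> {
    Some(median(new)? / median(old)?)
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Slower,
    Faster,
    Similar,
}

/// Assumed thresholds: above 1.2 counts as "more than 20% slower",
/// below 0.8 as "more than 20% faster".
pub fn classify(ratio: f64) -> Verdict {
    if ratio > 1.2 {
        Verdict::Slower
    } else if ratio < 0.8 {
        Verdict::Faster
    } else {
        Verdict::Similar
    }
}
```

With the full-suite medians quoted above, `869.0 / 767.0` is roughly 1.13, which this classification would call similar.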