ZigEmbeddedGroup / microzig

Unified abstraction layer and HAL for several microcontrollers
zlib License

HAL Design #208

Open ikskuh opened 4 months ago

ikskuh commented 4 months ago

I created a Google Doc which contains some overview of what features different MCUs have.

Those who want to join can send me a message with their email on Discord, and I'll invite you.

Some words on the future path:

  1. Figure out what features we require for several chips
  2. Determine which subset of those features we want to have in a shared HAL
  3. Design the API for those features
  4. Implement them in the boards, use common driver abstractions

haydenridd commented 4 months ago

Alright! So, apologies in advance for the dissertation on HALs but... I've been thinking about HALs for a very long time in my embedded career. I've used a bunch of them, seen even more, and even written my own internal HALs for projects. I also feel like I have what may be a somewhat controversial opinion, so I want to make sure to fully articulate my opinion and my reasoning behind it :)

High Level Summary

Problem Statement

Now, let me make a bold statement: Universal HALs attempt to solve a problem that doesn't actually exist.

"But", you say, "there's plenty of good reasons to want to compile my firmware for multiple targets effortlessly!" And you would be absolutely right! The issue with a universal HAL is it isn't just trying to let you use the same API for two or three targets, it's trying to let you use it for all targets. Think about how often you truly need to compile your firmware for 2 different MCUs. I can think of some times in my career, usually due to supplier shortages, or experiments. Now think about how often you compile your firmware for 5+ different MCUs... Okay I can maybe think of some hypothetical cases but they're starting to get incredibly niche. Okay now 20+... You see where I'm going. With how different peripherals, pin counts, and feature sets can be MCU to MCU, the amount of application logic you can "share" between chips starts to drop off rapidly.

But if a universal HAL is feasible, what's the big deal? Who cares if you don't need to compile for every supported chip, isn't it nice to have a standard API anyways? Well, that brings me to...

The Cost of Abstraction

I love abstraction. I use it constantly; we all do. It's why we write in C instead of assembly, or C++ instead of C, or MicroPython instead of C++. Each level of abstraction brings us away from processor instructions and memory-mapped IO, and towards more digestible concepts like GPIOs being "on" or "off", and reading a voltage in millivolts from an ADC as opposed to a 16-bit signed integer.
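To make the ADC example concrete, here is a minimal sketch of the kind of conversion such a layer hides. The 12-bit resolution, the 3300 mV reference, and the function name are assumptions chosen for illustration; they do not come from any particular chip or HAL.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only: a 12-bit ADC with a 3300 mV reference. */
#define ADC_RESOLUTION_BITS 12u
#define ADC_VREF_MV         3300u

/* Convert a raw ADC count into millivolts using integer math only. */
static uint32_t adc_raw_to_millivolts(uint16_t raw)
{
    const uint32_t full_scale = (1u << ADC_RESOLUTION_BITS) - 1u; /* 4095 counts */

    assert(raw <= full_scale); /* a count above full scale is a caller bug */
    return ((uint32_t)raw * ADC_VREF_MV) / full_scale;
}
```

The caller thinks in millivolts and never sees the resolution or reference voltage, which is exactly the trade the abstraction offers.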

I like to think of each layer of abstraction as a contract. Using C, if you follow the rules regarding the language and the compiler settings, the contract says that it will produce valid assembly instructions for your processor of choice. Zig + LLVM can be thought of as having a similar contract for compiling code for a given CPU. So what contract does a HAL provide? Well, generally speaking the contract is: "You may not have the flexibility and customization that comes with flipping bits in registers, but if you use this your code can convey behavior rather than setting bits in registers".

And this is a nice trade. With the ever-growing complexity of MCUs, it would be a lot to ask someone to start every project from scratch, transcribing all the peripheral addresses into code. But what happens when the contract doesn't hold? Well, you essentially forfeit that layer of abstraction, and have to learn the layer one level down (N-1). Here's an example function from an STM32 HAL:

void HAL_GPIO_WritePin(GPIO_TypeDef* GPIOx, uint16_t GPIO_Pin, GPIO_PinState PinState)
{
  /* Check the parameters */
  assert_param(IS_GPIO_PIN(GPIO_Pin));
  assert_param(IS_GPIO_PIN_ACTION(PinState));

  if(PinState != GPIO_PIN_RESET)
  {
    GPIOx->BSRR = GPIO_Pin;
  }
  else
  {
    GPIOx->BSRR = (uint32_t)GPIO_Pin << 16;
  }
}

Not too bad, right? There's some stuff we don't know offhand, but it's relatively clear we're setting some register we can probably look up in the datasheet by name. If this fails, this isn't too awful a level to have to figure out.
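For readers who have not met BSRR before: on STM32 parts, writing a 1 to one of its lower 16 bits sets the matching pin, and writing a 1 to one of its upper 16 bits resets it, with set taking priority if both are written. The sketch below models that behaviour in plain software so it can be run on a PC. The `mock_` types and functions are hypothetical and stand in for real hardware registers.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of one GPIO port: only the output data register is tracked. */
typedef struct {
    uint16_t odr;
} mock_gpio_port;

/* Apply a BSRR-style write: the low half sets pins, the high half resets pins.
 * Resets are applied first so that a set bit wins if both are requested. */
static void mock_bsrr_write(mock_gpio_port *port, uint32_t value)
{
    const uint16_t set_mask   = (uint16_t)(value & 0xFFFFu);
    const uint16_t reset_mask = (uint16_t)(value >> 16);

    port->odr = (uint16_t)((port->odr & ~reset_mask) | set_mask);
}

/* Same shape as the HAL function above: one register write per call. */
static void mock_write_pin(mock_gpio_port *port, uint16_t pin_mask, int high)
{
    assert(pin_mask != 0);
    mock_bsrr_write(port, high ? (uint32_t)pin_mask : ((uint32_t)pin_mask << 16));
}
```

Because the register is write-only set/reset, no read-modify-write of the output register is needed, which is why the HAL function can stay this short.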

Now let's look at the same example from Zephyr (taken directly from the example application):

ret = blink_off(blink);
if (ret < 0) {
    LOG_ERR("Could not turn off LED (%d)", ret);
    return 0;
}

What do we have to look at if this call fails? Well, let's see:

Def of blink_off:

static inline int blink_off(const struct device *dev)
{
    return blink_set_period_ms(dev, 0);
}

Def of blink_set_period_ms:

__syscall int blink_set_period_ms(const struct device *dev,
                  unsigned int period_ms);

Huh.... okay, well there's a comment here that might shed some light:

/**
 * @defgroup drivers_blink_api Blink driver API
 * @{
 *
 * @brief Public API provided by the blink driver class.
 *
 * The public API is the interface that is used by applications to interact with
 * devices that implement the blink driver class. If support for system calls is
 * needed, functions accessing device fields need to be tagged with `__syscall`
 * and provide an implementation that follows the `z_impl_${function_name}`
 * naming scheme.
 */

Ah! Okay here's z_impl_blink_set_period_ms:

static inline int z_impl_blink_set_period_ms(const struct device *dev,
                         unsigned int period_ms)
{
    const struct blink_driver_api *api =
        (const struct blink_driver_api *)dev->api;

    return api->set_period_ms(dev, period_ms);
}

Whoa... What is a blink_driver_api??

/** @brief Blink driver class operations */
__subsystem struct blink_driver_api {
    /**
     * @brief Configure the LED blink period.
     *
     * @param dev Blink device instance.
     * @param period_ms Period of the LED blink in milliseconds, 0 to
     * disable blinking.
     *
     * @retval 0 if successful.
     * @retval -EINVAL if @p period_ms can not be set.
     * @retval -errno Other negative errno code on failure.
     */
    int (*set_period_ms)(const struct device *dev, unsigned int period_ms);
};

I'm going to stop here and leave tracing down to actual register sets as an exercise for the reader, as I think I've made my point. The more devices you support, the more expansive your API grows to accommodate all the differences between devices. All of a sudden there are over 4 function call sites and what appears to be an abstract interface/implementation pattern required to set a GPIO pin. Debugging these kinds of things when they fail is a nightmare, and they will fail eventually; it just comes with the territory.
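Stripped of the syscall machinery, the pattern traced above is a table of function pointers hung off a device struct. The following is a simplified, self-contained sketch of that shape, not Zephyr's actual code: every `mock_` name is invented for illustration, and the "driver" only records the requested period instead of touching hardware.

```c
#include <assert.h>
#include <stddef.h>

struct mock_device;

/* Per-class operations table, analogous in shape to blink_driver_api. */
struct mock_blink_api {
    int (*set_period_ms)(const struct mock_device *dev, unsigned int period_ms);
};

/* Generic device handle: an operations table plus driver-private data. */
struct mock_device {
    const struct mock_blink_api *api;
    void *data;
};

/* Private state of one hypothetical LED driver. */
struct mock_led_data {
    unsigned int period_ms;
};

/* Driver implementation: where real code would finally reach registers. */
static int mock_led_set_period_ms(const struct mock_device *dev, unsigned int period_ms)
{
    struct mock_led_data *data = dev->data;

    data->period_ms = period_ms;
    return 0;
}

static const struct mock_blink_api mock_led_api = {
    .set_period_ms = mock_led_set_period_ms,
};

/* Public API: an indirect call through the table, like z_impl_blink_set_period_ms. */
static int mock_blink_set_period_ms(const struct mock_device *dev, unsigned int period_ms)
{
    assert(dev != NULL && dev->api != NULL);
    return dev->api->set_period_ms(dev, period_ms);
}

/* Convenience wrapper, like blink_off: a period of 0 disables blinking. */
static int mock_blink_off(const struct mock_device *dev)
{
    return mock_blink_set_period_ms(dev, 0);
}
```

Even in this toy version, a failing `mock_blink_off` means following three hops and one pointer lookup before reaching the code that does the work.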

So get to your point already!

So if a universal HAL isn't realistic, what is? My proposal follows these tenets:

Appropriately Abstractable

Determining what is "appropriate" is unfortunately subjective, but like pornography versus art, "you'll know it when you see it". I think a good starting place for this is "families" of chips. For instance, all STM32F7xx chips are relatively similar and contain similar peripheral sets. They all have the same Cortex-M7 as their core. However, trying to group in STM32L0xx chips would likely not be appropriate, as they're low-power chips with a Cortex-M0+ core that serve a very different use case. Things would start to get messy quickly if we tried to squash these two very different chips into the same over-arching API.

Favoring Simplicity

I, more than anyone else, am guilty of favoring elegance over simplicity. Elegant code designs feel great to write. But they can sometimes feel horrible to use. Zephyr's use of devicetree-to-macro conversions, abstract APIs, and other tricks all in C is impressive, and interesting to figure out. But it's a nightmare to use. Sometimes, dirt-stupid functions like SetGpio() and EnableInterruptForUart() are better than complex code architectures that exist just to save some lines of boilerplate.

And here's the biggest benefit of simple, robust HALs: writing your own personal "universal HAL" is WAY easier. Think about it: who better to write a wrapper layer for the two specific MCUs you need to support in your firmware than you? Given clear, well-documented, and robust HALs, in Zig this would be as easy as a comptime statement like so:

const config = @import("config");
const GpioApi = switch (config.arch) {
    .stm32f750 => @import("stm32f750_gpio.zig"),
    .ra8d1 => @import("ra8d1_gpio.zig"),
};

Where your wrappers for the two different GPIO HALs reside in the imported .zig files.

Opt-In Abstraction

The embedded community has a wide range of experience, requirements, and knowledge levels. I like to think of the journey of a modern embedded enthusiast as working their way down the abstraction tree. You might start with something like Arduino, and before you know it you're a proper sicko who is helping write a compiler that generates more efficient assembly instructions (sound familiar??). I think it's critically important we meet people where they're at.

I see this as consisting of:

People can choose to enter at any point that they wish, and shouldn't be punished for it. It should not be hard to fire up a project with just the peripheral registers mapped and a way to set/read them. Nor should it be hard to clone an example repo for the Raspberry Pi Pico and start twiddling LEDs immediately.

I'm occasionally disappointed when so much care is taken into building up an API for a specific development board, that resources for just controlling the MCU on its own on a custom board are nowhere to be found.

The Path Forward

For anyone who's made it here, first off congrats, and second off I acknowledge I am not at all the one making decisions in embedded Zig. This is merely my 2 cents, and I encourage any and all discourse surrounding my opinion! If I were to choose, here are my lofty goals for embedded Zig:

Thank you for taking the time to listen to my thoughts/rant, and I'm looking forward to the discussion!

Hayden

DNedic commented 4 months ago

First off, I do think a HAL with a universal API is a good idea, but in my opinion it should not attempt to wrap everything. Some things can be left out of the HAL and remain port-specific, in order to avoid either limiting what the user can do with microzig or resorting to horribly hacky solutions to adapt to an overly simplified API.

To address some of the points @haydenridd made:

"But", you say, "there's plenty of good reasons to want to compile my firmware for multiple targets effortlessly!" And you would be absolutely right! The issue with a universal HAL is it isn't just trying to let you use the same API for two or three targets, it's trying to let you use it for all targets. Think about how often you truly need to compile your firmware for 2 different MCUs. I can think of some times in my career, usually due to supplier shortages, or experiments. Now think about how often you compile your firmware for 5+ different MCUs...

Hard disagree here. While a single person might not need this, it enables the ecosystem to grow, because reusable libraries can now be written. One major reason Arduino took off is the availability of libraries, and those wouldn't be there without a universal HAL. Rust is seeing the same ecosystem bloom effect with its embedded-hal.

On the cost of abstraction: I tend to agree to an extent. This is something to be careful about, and too many layers of indirection can backfire. On the other hand, with LTO the runtime cost should be near zero most of the time.

Write a HAL for each family of chips that is appropriately abstractable

This is what is being done now to a degree, and I don't see much traction behind the embedded Zig ecosystem. It's also a huge duplication of work, and a deterrent to anyone who wants to add chip support to microzig. I have been considering writing a HAL for Espressif chips; however, I am much less inclined if I know it's not going to be part of a standardized API on top of which libraries, educational content, etc. can be written.

Let users "opt-in" to the level of abstraction they would like

A HAL is almost always opt-in, and you should always be able to use lower layers as well as use chip/vendor specific functionality on the side as well. Having a standardized API should not stop nonstandard extensions.

haydenridd commented 4 months ago

First off, I do think a HAL with a universal API is a good idea, but in my opinion it should not attempt to wrap everything. Some things can be left out of the HAL and remain port-specific, in order to avoid either limiting what the user can do with microzig or resorting to horribly hacky solutions to adapt to an overly simplified API.

I'll do a little walking back and say I agree with you here. I think there is value to a universal "simple" API, and then a per-family more specialized API. I would just emphasize they should be separate, and we shouldn't bend over backwards to try to fit everything under one "master" API.

I realize I'm coming at this from my own perspective as a professional firmware engineer, so there are absolutely biases in what I want and what is useful to me professionally. Arduino is a great example to look at. It is wildly successful, great at bringing people into the community, and useless for me professionally. I can't write firmware in Arduino; it doesn't get low level enough. It's still extremely valuable to have in the ecosystem, it's just not the right tool for me. It just means there needs to be a separate tool (C with vendor HALs) that gets low level enough.

I think my fear (again putting on my professional firmware engineer hat), is that we never get past the "basic API" level. I've seen this a lot in the Rust embedded scene. You excitedly download a HAL crate for your chip only to realize it more or less only supports some basic GPIO, blocking UART reads, etc. It's frustrating because it puts you in an odd position. You can:

I love that Rust is gaining more traction in embedded, but there is only a single company I know of using it in production for deeply embedded (freestanding) MCUs: https://github.com/oxidecomputer/hubris

And they basically wrote an OS from scratch and don't really use any of the community HALs (they do use the cortex-m related crates but no higher level abstractions than that as far as I can tell).

Maybe a better way to frame my (selfish) fear is that sometimes I think the universal, easy API gets prioritized to the point where the nitty-gritty is somewhat forgotten about. So then you have tons and tons of HALs that get you to blinky and some basic peripheral usage, and not much else. Maybe a point to take from this is I want us to be very clear in our intentions and what is supported at what time. Similar to "tierN" support targets, being transparent that:

Finally to address this point:

This is what is being done now to a degree and I don't see much traction behind the embedded Zig ecosystem. Also it's a huge duplication of work...

I hear you here. I think I envisioned any of these "family specific" HALs still falling under an "official" ZEG GitHub project (or potentially mono-repo). I think a single source of truth is super important, and yes, I feel the pain of wondering "is this project just going to get lost in the sands of time or actually used?". Going back to your first point, I like the idea of a "core" API for simple stuff, and "extensions" for vendor-specific peripherals. But they should all be under one roof, and it should be obvious what you should use when "starting an embedded project with Zig for X". A "core" API for the simple stuff should take care of most of the duplication; once you start getting into specific peripherals, I don't think there's a ton of code reusability between different vendors.
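As a rough illustration of that "core plus extensions" split, here is one way it could look, written in C to match the earlier snippets. Everything here is hypothetical: `core_gpio`, `family_a_gpio`, and the drive-strength extension are made-up names, and the port only tracks state in memory rather than driving real pins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "core" GPIO interface: the small subset every port can offer. */
typedef struct core_gpio {
    void (*write)(struct core_gpio *self, unsigned pin, bool level);
    bool (*read)(struct core_gpio *self, unsigned pin);
} core_gpio;

/* Hypothetical port for one chip family: embeds the core and adds its own state. */
typedef struct {
    core_gpio core;         /* kept first so a core_gpio* can be cast back */
    unsigned  levels;       /* bit n holds the output level of pin n */
    unsigned  drive_ma[16]; /* vendor-specific: per-pin drive strength */
} family_a_gpio;

static void family_a_write(core_gpio *self, unsigned pin, bool level)
{
    family_a_gpio *port = (family_a_gpio *)self;

    assert(pin < 16);
    if (level) {
        port->levels |= 1u << pin;
    } else {
        port->levels &= ~(1u << pin);
    }
}

static bool family_a_read(core_gpio *self, unsigned pin)
{
    family_a_gpio *port = (family_a_gpio *)self;

    assert(pin < 16);
    return ((port->levels >> pin) & 1u) != 0;
}

/* Extension: exists only for this family and is never forced into the core API. */
static void family_a_set_drive_strength(family_a_gpio *port, unsigned pin, unsigned ma)
{
    assert(pin < 16);
    port->drive_ma[pin] = ma;
}

static void family_a_init(family_a_gpio *port)
{
    port->core.write = family_a_write;
    port->core.read = family_a_read;
    port->levels = 0;
    for (size_t i = 0; i < 16; i++) {
        port->drive_ma[i] = 0;
    }
}

/* Portable library code: depends only on core_gpio, so any port can run it. */
static void core_gpio_toggle(core_gpio *gpio, unsigned pin)
{
    gpio->write(gpio, pin, !gpio->read(gpio, pin));
}
```

A library like `core_gpio_toggle` never learns that drive strength exists, while application code targeting this family can still reach for `family_a_set_drive_strength` directly.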

I appreciate the opposing perspective :) Libraries using MCU peripherals was not something I thought too much about but would certainly be a nice addition to the ecosystem.