InfiniTimeOrg / InfiniTime

Firmware for Pinetime smartwatch written in C++ and based on FreeRTOS
GNU General Public License v3.0
2.63k stars 901 forks source link

SPI transaction hooks #2046

Closed mark9064 closed 1 month ago

mark9064 commented 3 months ago

This replaces the original technique for protecting the LCD data command pin (task notifications), which didn't work super well.

This is replaced with SPI transaction hooks, where when an SPI write request (transaction) is started, you can specify a hook that runs before the write and after it finishes. It's not a super elegant solution as the hook function has to be static (if someone with C++ knowledge wants to go nuts on that be my guest!) but it's pretty straightforward and simple at least. So for the LCD, the hook just sets the data command pin to the correct setting before the transaction starts, and does nothing on transaction end. It also allows us to remove task notifications completely which saves a bit of memory on each task.

I didn't bother hooking this up for reading as nothing needing hooks reads at the moment. But the idea would be identical to writes. Also, this PR only uses the pre-transaction action, so it could also be simplified to only support that.

As the hook function has to be static, the LCD driver is changed to reference the hardware pinmap directly. Not as nice an abstraction but we are only ever going to have one LCD so it's not too bad IMO.

Performance also seems to be better than waiting for notifications.

Closes #1984 (probably also #1022). Also fixes occasional temporary garbling of small parts of the display. Split from #1869

github-actions[bot] commented 3 months ago
Build size and comparison to main: Section Size Difference
text 376952B -544B
data 940B 0B
bss 63540B 0B
mark9064 commented 2 months ago

(InfiniSim PR now ready)

JF002 commented 2 months ago

I've never been satisfied with the synchronization mechanism in the ST7789/SPI driver but I haven't tried to find a better design/implementation since I wrote it a while ago, so I'm happy someone is actually working on it.

The current implementation based on FreeRTOS notifications reduces the coupling between the drivers and the application layer, but add a dependency to FreeRTOS in the low-level driver, which is something I would like to avoid. It also adds a bit of overhead : once the write operation (aka Transaction) is done, a task notification is pushed to FreeRTOS. This notification will be taken care of when the scheduler decides to give the display task some CPU task and the notification.

We need this synchronization mechanism because of the Data/Command pin of the ST7789 LCD driver : we cannot change its state while a transaction is ongoing. Since this pin is not part of the SPI bus, it's managed by the ST7789 driver class. The data transfer to the SPI Bus is asynchronous (thanks to DMA). This is great for performance : it's faster and it allows the CPU to do other things while the transfer is ongoing. The SPI driver uses a mutex to ensure that the previous transfer is finished before starting a new one. If the ST7789 class would simply change the state of the D/C pin before calling Spi::Write(), it could change the state of the pin while a transfer is still ongoing. That's why we wait for a notification from the SPI bus before changing the pin state and starting a new transfer.

The implementation in this PR reduces the overhead a lot : simply call a function (via a function ptr) that clears/sets the D/C pin before/after the transaction.

I totally agree with the idea, but the implementation has a few disadvantages that you've already mentioned : the callback needs to be static, so it doesn't know the ST7789 instance and the pin number that must be toggled.

Using PinMap is a nice workaround but I would rather like the low level drivers to be as self-contained as possible (that's why I think removing the dependency to FreeRTOS is a good idea).

On a side note, I'm working a new design that will allow InfiniTime to build and run on other platform and hardware (see my demo from FOSDEM) and reducing the dependence between modules and layers will definitely help for such an goal!

So how can we improve this? Maybe we can try with lambas and std::function instead of plain function pointers? std::function might add indirection and dynamic memory allocation so we have to be very careful though. But, all we want to store in the lambda/std::function is a pin number, maybe we can avoid the dynamic memory allocation, and the indirection while calling the function might still be more efficient than the notification? I guess all we have to do is test and measure :)

Performance also seems to be better than waiting for notifications.

Out of curiosity, do you have any numbers to share about this?

mark9064 commented 2 months ago

No numbers unfortunately - just based off how it seems. I suspect as going through the RTOS adds some waiting due to task switching etc. But equally this could be entirely inside my head :)

There's a lot of work to do to abstract InfiniTime over different hardware. I'd love to see more on how you are approaching hardware abstraction in the future so I can write code that fits with that as a contributor - at the moment I have no clue what plans are. I'll take some time to look through the multi target repository, I missed the mention of that at FOSDEM!

Adding code that will cause known problems later makes no sense though, so I'm totally in favour of re-abstracting this in that case.

A simple extension would be to add an extra argument to the hook/spi Write that's a user data pointer. Then the hook can cast that pointer back and execute a method after. This is the LVGL/FreeRTOS way but isn't very pretty. As you've said though there'll be a nicer C++ abstraction for this so I'll experiment a little

JF002 commented 2 months ago

No numbers unfortunately - just based off how it seems. I suspect as going through the RTOS adds some waiting due to task switching etc. But equally this could be entirely inside my head :)

That's fine. The more I work on display related stuff the more I realize that "it looks better to my eyes" is a valid metric :-)

My experiments with the multi-platform design is still early days, I'm trying multiple designs and implementations to find "the best ones". I mentioned it in the last Pine64 community update : it shows the same app running on the Pinephone, the PineTime and a BL618 (RISC-V) dev board.

20240401_184459

Basically, the idea is built around C++ concepts. The "application" (DisplayTask, SystemTask, user applications,...) is common to all platform. Anything that can change from a platform/hardware to another is abstracted by C++20 concepts. For example, there's a is_display concept that defines a Init() function and a DrawBuffer() function. That's all the application knows about a Display.

Then, each platform has to provide an implementation of a "Display" that implements the contract is_display. I currently have 2 implementations : the ST7789 and the one based on SDL for the simulator.

The ST7789 takes an object that implements the contact is_data_command_serial_bus. The actual implementation must take care of the D/C pin, the CS pin and the SPI bus. The implementation for the PineTime is a class DisplaySpi that drives the D/C pin and the SPI bus of the NRF52. The implementation for the BL618 uses the integrated "DBI" peripheral which handle everything (D/C, CS and SPI) in hardware.

If the take the example of this new hook, it definitively makes sense for the NRF implementation of the DataCommandSerialBus, but it will not be used for the BL618 implementation.

Anyway, this is still work in progress and might change in the future ;-)

A simple extension would be to add an extra argument to the hook/spi Write that's a user data pointer. Then the hook can cast that pointer back and execute a method after. This is the LVGL/FreeRTOS way but isn't very pretty. As you've said though there'll be a nicer C++ abstraction for this

Yes, that would work, but it looks like old C code style :-) We'll hopefully find something that looks a bit more like modern C++ with more type safety .

so I'll experiment a little

Keep us up to date :-) I'll definitively have a look at those lambdas and std::function.

mark9064 commented 2 months ago

OK, re-abstracted to use lambdas now. If we decide to go with this approach I'll rebase the branch to be more sensible after Edit: Oh man I keep getting bitten by no commit hooks after rebase Double edit: Just saw your reply. Makes a lot of sense to abstract this out of the driver then. It makes sense generally too but I thought we might get away without this as this is a small project, but it seems not to be the case!

mark9064 commented 2 months ago

I've been doing some further reading on lambdas and std::function. It looks like that if the capture list is small enough, the captured variables fit within the std::function instance and so the heap is not used. Because the capture list is only a reference to an object, the heap should not be used in this case. So I think this avoids dynamic allocation. As for the cost, I believe it will add another layer of indirection as inlining will not be possible. But effectively, it boils down to a callable class that stores a pointer to the member function and a pointer to the instance to call the member function on, all of which fit on the stack. So a bit less efficient that the C style function pointer/data pointer approach, but only one function calls worth so perfectly acceptable IMO.

Could be worth analysing what the compiler actually does here. Doing this statically looks quite challenging though

mark9064 commented 2 months ago

Just refactored the write data/command interface to be a bit nicer. The lambdas also only capture one integer now and reference static functions only which should hopefully give the compiler an easier time optimising/inlining

JF002 commented 2 months ago

It looks like that if the capture list is small enough, the captured variables fit within the std::function instance and so the heap is not used

Yes, that's also what I figured out. However, this behavior is not part of the specification : it's up to the implementation to decide to use the heap or not. However, I added a breakpoint in malloc() and pvPortMalloc() and it wouldn't break when creating/executing the callback, so I guess it confirms that the heap is not use in this specific case.

As for the cost, I believe it will add another layer of indirection as inlining will not be possible

I think this additional indirection is an acceptable tradeoff for a better design :)

Could be worth analysing what the compiler actually does here. Doing this statically looks quite challenging though

I had a look at the disassembly code in CLion, but it just shows that it branches to the function in the STL, but does not show the ASM code of this function. I didn't try to figure that out from the complete ASM code of the project.

So, yeah, I think we can use this implementation based on std::function, it seems to work fine and improves the code at the "application" level (we do not have to manually wait for the end of the SPI transfer in LittleVGL.cpp).

I still have a few comments to do on the code, if you don't mind:

void St7789::WriteSpi(const uint8_t* data, size_t size, std::function<void(bool)>&& TransactionHook) {
  spi.Write(data, size, std::move(TransactionHook));
}

bool Spi::Write(const uint8_t* data, size_t size, std::function<void(bool)>&& TransactionHook) {
  return spiMaster.Write(pinCsn, data, size, std::move(TransactionHook));
}

bool SpiMaster::Write(uint8_t pinCsn, const uint8_t* data, size_t size, std::function<void(bool)>&& transactionHook) {
  [...]
  this->TransactionHook = std::move(transactionHook);

Thanks!

JF002 commented 2 months ago

@mark9064 btw, I can apply all those small changes if you wish :)

mark9064 commented 2 months ago

The lambda object still needs to be copied from the stack frame creating the lambda to the controller as the stack frame will be destroyed later. I did some research and to me it seems that std::move is useless if all data is completely on the stack (but instead for a string it is useful because string contents live on the heap (provided they are large enough)). Still, it seems semantically correct so I agree that it's the way to go. Passing a reference and then doing one final copy into the SpiMaster at the end would be equivalent.

Agree on simplifying the hook by removing the argument, I'll also fix the naming

mark9064 commented 2 months ago

OK, I think we were both overcomplicating it! As we don't need to store the lambda anymore, we can just pass in a reference. This way it's nice and simple

mark9064 commented 2 months ago

Noticed I forgot to apply your naming suggestion now the semantics of the hook has changed. Corrected now

JF002 commented 1 month ago

As we don't need to store the lambda anymore, we can just pass in a reference.

Yes, you're right. Passing the reference works fine and does the job, but it, in this case, it works because the lambda is executed "synchronously" during the call to Spi.Write(). It wouldn't work for a post transaction hook, for example. In this regard, I think the move semantic is a bit better : the caller does not have to worry about the life time of the function pointer.

On the other hand, your implementation is simple and works so we can use it and extend it later on if we ever need it ;-)

mark9064 commented 1 month ago

Sounds good. Let's switch to the move system if we ever need post-hooks. For now it's as simple and efficient as it can be