DISTORTEC / distortos

object-oriented C++ RTOS for microcontrollers
https://distortos.org/
Mozilla Public License 2.0

STM32 DMA drivers #48

Closed CezaryGapinski closed 5 years ago

CezaryGapinski commented 6 years ago

Hi!

I would like to ask whether you have any plans to implement DMA drivers in the HAL layer. As you noticed, I started implementing DMA on my own some time ago, so it would be nice to know your opinion about it.

Below is a list of the most important points I stumbled across:

  1. I was trying to create some abstract classes for DMA and put them at the device layer level, but based on your previous comment on my code I realized that it will really be better to keep the DMA driver at the chip peripheral layer only, without adding an abstraction at the distortos devices level. DMA is too chip-specific and there is no need to wrap it in a DMA device.

  2. There is still a need for other drivers which can use DMA. Based on the device layer I've implemented a ChipDmaBase interface with callbacks which can be used to create other DMA-based drivers (like the ChipUartDmaTransfer class in the USARTv2 peripheral) - see the sketch after this list.

  3. The main DMA class is ChipDmaLowLevel, which is used to configure the DMA peripheral. I was trying to stay compatible with the future DMAv2 version implemented in other STM32 families like F4 and F7 (for now I used L0 to develop DMAv1). That's why I've added the startDoubleBuffer and exchangeDoubleBufferAddress methods, which do nothing in DMAv1 but have meaning in a DMAv2 driver. Probably it is better to remove these methods from DMAv1 right away. A more important point is the correct approach to the start and startTransfer methods. Do you think it is good to have the same method signatures for the DMAv1 and DMAv2 versions, even though burst and FIFO threshold are not supported in DMAv1? Maybe there should be some peripheral abstract class with a common interface for DMAv1 and DMAv2, or is that too complicated? I'm just worried about the conditional preprocessor code needed to use different start and startTransfer methods for DMAv1 and DMAv2 if they have different signatures.

  4. Usage of the stopTransfer method (which can be used indirectly in SerialPort to load a chunk of data into the USART buffer). For DMAv2 I've noticed descriptions in the reference manuals about suspending transactions, where you need to disable the DMA channel and then wait until the transfer complete interrupt is called, to be sure that this chunk of the transfer is finished and the DMA transfer counter can be read. For chips which use DMAv1 I didn't notice that requirement, so for now the channel is just disabled and then the transfer counter is read. I was thinking about using a semaphore to deal with this issue for DMAv2, like in the SerialPort class, to wait for the interrupt to notify the stopTransfer method that the DMA channel was correctly disabled. But with this option the DMA driver would use Semaphore, which creates too much coupling between the distortos core and the HAL layer. Maybe it is easier to just poll: check the transfer complete flag and then that the channel enable bit has been reset to 0.

  5. Exemplary usage in the USARTv2 driver: I thought for quite a long time about how to connect a DMA transfer and the USART. Eventually I decided to create ChipUartDmaTransfer objects which hold references to the ChipUartLowLevel and ChipDmaLowLevel peripherals. From the USART side they are used to configure and start a DMA transfer. From the DMA side they are used to call specific USART transfer complete and error methods. ChipUartDmaTransfer objects are created as global objects. The constructor needs references to the USART and DMA peripherals at compile time, which is why the getDmaLowLevel constexpr function is used to get the peripheral for a given DMA module and channel.

  6. Configuration in kconfig: I mostly based it on the STM32L0 family. I think the most important problem is remapping channels to specific peripherals, as in this commit: https://github.com/CezaryGapinski/distortos/commit/f46435163acea8ac17a2baa61838517043e8e634 As you can see, there is a lot of code just to handle a few channels for USART, but I didn't have a better idea of how to do this.
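
To illustrate point 2, here is a minimal sketch of the callback idea (not the actual code from my branch - the method names are the ones mentioned above, the signatures and the interruptHandler() helper are simplified):

```cpp
// ChipDmaLowLevel drives the hardware and notifies its "user" (for example a UART
// driver) through a small abstract class with event callbacks.

class ChipDmaBase
{
public:
    virtual ~ChipDmaBase() = default;

    /// called from the DMA IRQ when the whole transfer is done
    virtual void transferCompleteEvent() = 0;

    /// called when half of the transfer is done (useful for circular/double-buffer modes)
    virtual void halfTransferCompleteEvent() = 0;

    /// called when the DMA controller reports a transfer error
    virtual void transferErrorEvent() = 0;
};

class ChipDmaLowLevel
{
public:
    explicit ChipDmaLowLevel(ChipDmaBase& user) : user_{user} {}

    // ... configuration and start/stop methods would go here ...

    /// interrupt handler - forwards hardware events to the registered user
    void interruptHandler(const bool transferComplete, const bool halfTransfer, const bool error)
    {
        if (error)
            user_.transferErrorEvent();
        if (halfTransfer)
            user_.halfTransferCompleteEvent();
        if (transferComplete)
            user_.transferCompleteEvent();
    }

private:
    ChipDmaBase& user_;
};
```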

I have plans to buy an STM32F429I-DISC1 board to try to implement a DMAv2 driver. Maybe then a much better idea of how to bring DMAv1 and DMAv2 to a common interface, etc., will come to my mind, but I would like to ask what you think about my code from the dmav1_with_usartv2_for_stm32l0 branch. Maybe it is totally screwed up and you have other plans for implementing DMA transfers. If so, please let me know, because I don't know whether I should continue working on it or whether it is better to give up :).

FreddieChopin commented 6 years ago

You asked me too many questions, so I really cannot answer them all );

I would like to ask whether you have any plans to implement DMA drivers in the HAL layer.

Yes, definitely. So far I have postponed that as putting DMA-related info into kconfig is not very convenient...

I was trying to create some abstract classes for DMA and put them at the device layer level, but based on your previous comment on my code I realized that it will really be better to keep the DMA driver at the chip peripheral layer only, without adding an abstraction at the distortos devices level. DMA is too chip-specific and there is no need to wrap it in a DMA device.

Exactly - as long as all STM32 DMA drivers have identical (or at least very similar [; ) public API, they may as well be non-virtual. Some peripherals (DMA, F(S)MC, ...) are way too specific to be accessible by user code with a generic abstract interface. Additionally there's not much use in exposing DMA to user code anyway - in 99,666% of cases it will only be used internally by other drivers.

I was trying to stay compatible with the future DMAv2 version implemented in other STM32 families like F4 and F7 (for now I used L0 to develop DMAv1).

If you spend too much time thinking about the future it will be really hard to do something... My advice is to buy/make a board which actually has a different DMA implementation and try to work with both boards. In designing APIs I usually also check other projects to see how they did that - you may want to start with that. I usually check mbed, Linux, ChibiOS/RT and NuttX.

Usage of the stopTransfer method (which can be used indirectly in SerialPort to load a chunk of data into the USART buffer).

From what I understand, you don't need to actually stop the transfer - just read the register with count of transfers.
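
Roughly like this - just a sketch assuming the usual CMSIS device definitions (the header path below is a guess): on DMAv1 each channel has a CNDTR register which counts down the items remaining, and on DMAv2 the stream's NDTR register plays the same role, so the number of already transferred items is simply the programmed length minus that counter.

```cpp
#include "distortos/chip/CMSIS-proxy.h"    // assumed header pulling in the ST device definitions

#include <cstddef>

// DMAv1: CNDTR counts down from the programmed transfer length, so the number of
// items already moved is the difference (on DMAv2 read the stream's NDTR instead)
inline std::size_t transferredItems(const DMA_Channel_TypeDef& channel, const std::size_t programmedLength)
{
    return programmedLength - channel.CNDTR;
}
```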

Configuration in kconfig:

My current plan is to ditch kconfig completely. It is way too limiting to use here and not very well suited to the project's needs... I've also reworked the board generator (again), because I plan to keep all chip-specific details in a nice Python dictionary, built from a *.csv file. The new structure is much simpler and more powerful, and YAML is easier to read than devicetree and doesn't need to contain all the useless details required by dts (like the reg properties and all that stuff). In the very near future I would like to convert the project to use cmake instead of kconfig+make; the configuration options will be generated by the board generator (see for example the options for the linker script) and will be editable with cmake-gui.

Now a generic comment - I think your current implementation is way too complex. I would start by creating a completely separate ChipUartLowLevel - we would rename the one already present to ChipUartInterruptLowLevel (or sth like that), and your new variant would be called ChipUartDmaLowLevel (or sth similar). If there is some common code (for example baudrate configuration and stuff like that), it can be moved to a common base class. If there are significant differences between various DMA implementations, maybe it would be necessary to have ChipUartDmaV1LowLevel, ChipUartDmaV2LowLevel and so on, but probably it will not be so bad. There's absolutely no point in complicating this more than necessary - you either use interrupts or you use DMA for both directions - there's no reason to have support for interrupt + DMA in one direction or support for changing that at runtime. If someone has such needs - sorry, write your own custom class for that (;

So just start small, by having a completely separate ChipUartDmaLowLevel and when it is working, we can think how to reduce code duplication.
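
Something along these lines (only an outline - the class names are the ones mentioned above, the contents are made up):

```cpp
#include <cstdint>

class ChipUartLowLevel    // common base: shared things like baudrate / frame format setup
{
public:
    virtual ~ChipUartLowLevel() = default;

protected:
    void configureBaudRate(const std::uint32_t baudRate)
    {
        (void)baudRate;    // shared register calculations would live here
    }
};

class ChipUartInterruptLowLevel : public ChipUartLowLevel
{
    // today's driver: startRead()/startWrite() implemented with RXNE/TXE interrupts
};

class ChipUartDmaLowLevel : public ChipUartLowLevel
{
    // new driver: startRead()/startWrite() implemented by programming DMA channels
};
```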

Please be advised that my current plan to implement cmake support and drop kconfig may lead to a lot of conflicts on your side ); I see that one of my previous changes - the addition of BIND_LOW_LEVEL_INITIALIZER() - will already cause some merge conflicts for you.

Maybe you have some experience with cmake and would like to help with the transition? Or maybe you think the plan to use cmake is a stupid idea and it will be as bad as kconfig+make?

FreddieChopin commented 6 years ago

https://community.st.com/thread/42689-efficiently-use-dma-with-uart-rx-on-stm32 - this approach definitely should be used for receiving
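
The core of that approach looks more or less like this (only a sketch: it assumes a chip with USARTv2-style ISR/ICR registers and a DMAv1 channel in circular mode; register and bit names are the usual CMSIS ones, the header path is a guess and the buffer handling is just a placeholder):

```cpp
#include "distortos/chip/CMSIS-proxy.h"    // assumed header pulling in the ST device definitions

#include <cstddef>
#include <cstdint>

constexpr std::size_t rxBufferSize {64};
std::uint8_t rxBuffer[rxBufferSize];    // DMA writes here in circular mode
std::size_t lastReadPosition {};        // how much of the buffer was already consumed

// called from the USART interrupt handler when the IDLE flag is set - a burst of
// unknown length has just ended, so check how far the DMA got
void handleIdleLine(USART_TypeDef& uart, DMA_Channel_TypeDef& channel)
{
    uart.ICR = USART_ICR_IDLECF;    // clear the IDLE flag

    // CNDTR counts down, so the current write position in the ring buffer is:
    const std::size_t writePosition = rxBufferSize - channel.CNDTR;

    // everything between lastReadPosition and writePosition (with wrap-around)
    // is new data which can be handed to the upper layer
    // ... process it here ...
    lastReadPosition = writePosition;
}
```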

CezaryGapinski commented 6 years ago

First of all thank you for your answer :).

I was trying to stay compatible with the future DMAv2 version implemented in other STM32 families like F4 and F7 (for now I used L0 to develop DMAv1).

If you spend too much time thinking about the future it will be really hard to do something... My advice is to buy/make a board which actually has a different DMA implementation and try to work with both boards. In designing APIs I usually also check other projects to see how they did that - you may want to start with that. I usually check mbed, Linux, ChibiOS/RT and NuttX.

I have already bought an STM32F429I-DISC1, so I will try to make a DMAv2 driver :).

Usage of the stopTransfer method (which can be used indirectly in SerialPort to load a chunk of data into the USART buffer).

From what I understand, you don't need to actually stop the transfer - just read the register with count of transfers.

Yes, but please look at the SerialPort::writeImplementation method, where SerialPort::stopWriteWrapper is called to update the current position in the write buffer. In that method uart_.stopWrite is called to stop the transfer. Then in SerialPort::startWriteWrapper the uart_.startWrite method is called with a new pointer to the buffer. These methods are used in the USART driver and indirectly in the DMA driver, where the stop and start transfer operations must also be done (especially when a transfer with a new buffer pointer is started).

Now a generic comment - I think your current implementation is way too complex. I would start by creating a completely separate ChipUartLowLevel - we would rename the one already present to ChipUartInterruptLowLevel (or sth like that), and your new variant would be called ChipUartDmaLowLevel (or sth similar). If there is some common code (for example baudrate configuration and stuff like that), it can be moved to a common base class. If there are significant differences between various DMA implementations, maybe it would be necessary to have ChipUartDmaV1LowLevel, ChipUartDmaV2LowLevel and so on, but probably it will not be so bad. There's absolutely no point in complicating this more than necessary - you either use interrupts or you use DMA for both directions - there's no reason to have support for interrupt + DMA in one direction or support for changing that at runtime. If someone has such needs - sorry, write your own custom class for that (;

I have to keep some base class, because ChipUartLowLevel currently has an internal Parameters class and this is a common point for other classes, like the future implementations of ChipUartInterruptLowLevel or ChipUartDmaV1LowLevel/ChipUartDmaV2LowLevel. OK, that is not an issue and I hope that I will manage to implement the ChipUartDmaV1LowLevel class based on ChipUartInterruptLowLevel and then move the common methods into the base class.

But let's suppose that I create the ChipUartDmaV1LowLevel class which also inherits from ChipUartLowLevel. In ChipUartDmaV1LowLevel I also need to implement the interface from ChipDmaBase (like in the current ChipUartDmaTransfer) to have methods which the ChipDmaLowLevel class can call for DMA events (like transferCompleteEvent, halfTransferCompleteEvent, transferErrorEvent). I don't know how to deal with this. Do I have to use "multiple inheritance" from the ChipUartLowLevel class and the ChipDmaBase interface? That's why I made the ChipUartDmaTransfer objects for ChipUartLowLevel, because it seemed much less complicated.

Maybe you have some experience with cmake and would like to help with the transition? Or maybe you think the plan to use cmake is a stupid idea and it will be as bad as kconfig+make?

It is hard to say because I have become accustomed to the kconfig configuration. I know that you are looking for answers on this topic on other websites, but unfortunately I have no experience with cmake configuration menus. I'm looking forward to seeing an example of your new idea :).

https://community.st.com/thread/42689-efficiently-use-dma-with-uart-rx-on-stm32 - this approach definitely should be used for receiving

Actually there are the SerialPort::read, SerialPort::tryReadFor and SerialPort::tryReadUntil methods, which use the SerialPort::readImplementation method with uart_.stopRead and uart_.startRead, and I thought that was sufficient. For example, even if the IDLE line becomes active but not enough bytes were received, SerialPort::read with a null timePoint argument will wait anyway until the requested number of bytes have been received. Maybe you are right, I will think about it one more time. Thanks!

FreddieChopin commented 6 years ago

Do I have to use "multiple inheritance" from the ChipUartLowLevel class and the ChipDmaBase interface?

I think this would be a good approach, as long as you inherit only from at most one "real" class and as many pure abstract classes as you like.
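
Roughly like this (just an illustration of the rule, reusing names from this thread - one "real" base class plus a pure abstract interface; the real classes obviously have more content):

```cpp
class ChipUartLowLevel    // the single "real" class - may have data and common implementation
{
public:
    virtual ~ChipUartLowLevel() = default;
};

class ChipDmaBase    // pure abstract interface - no data, only pure virtual functions
{
public:
    virtual ~ChipDmaBase() = default;
    virtual void transferCompleteEvent() = 0;
    virtual void halfTransferCompleteEvent() = 0;
    virtual void transferErrorEvent() = 0;
};

// DMA-driven UART driver: inherits the implementation from one real class and
// additionally implements the DMA callback interface
class ChipUartDmaV1LowLevel : public ChipUartLowLevel, public ChipDmaBase
{
public:
    void transferCompleteEvent() override {}
    void halfTransferCompleteEvent() override {}
    void transferErrorEvent() override {}
};
```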

Actually there are the SerialPort::read, SerialPort::tryReadFor and SerialPort::tryReadUntil methods, which use the SerialPort::readImplementation method with uart_.stopRead and uart_.startRead, and I thought that was sufficient. For example, even if the IDLE line becomes active but not enough bytes were received, SerialPort::read with a null timePoint argument will wait anyway until the requested number of bytes have been received. Maybe you are right, I will think about it one more time. Thanks!

SerialPort implementation is not set in stone. It works for interrupts, but maybe for DMA it should be reworked somehow. It depends on how bad it would be without such a rework (;

CezaryGapinski commented 6 years ago

I think this would be a good approach, as long as you inherit only from at most one "real" class and as many pure abstract classes as you like.

There is still a problem, because I need two different objects, one each for the DMA TX and RX directions :). Inheriting from the ChipUartLowLevel class and the ChipDmaBase interface in the new ChipUartDmaV1LowLevel class provides methods for only one direction (a single implementation of the transferCompleteEvent, halfTransferCompleteEvent and transferErrorEvent events). Something like the ChipUartDmaTransfer class looks good to me. I have no idea how to do this more easily other than implementing ChipUartDmaTransfer inside the new ChipUartDmaV1LowLevel.
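
Something like this sketch is what I mean (names from this thread, the structure is only an outline; minimal stand-ins for ChipUartLowLevel and ChipDmaBase are repeated so the example is self-contained):

```cpp
class ChipUartLowLevel { public: virtual ~ChipUartLowLevel() = default; };

class ChipDmaBase
{
public:
    virtual ~ChipDmaBase() = default;
    virtual void transferCompleteEvent() = 0;
    virtual void halfTransferCompleteEvent() = 0;
    virtual void transferErrorEvent() = 0;
};

class ChipUartDmaV1LowLevel : public ChipUartLowLevel
{
private:
    // small adapter implementing the DMA callbacks for a single direction
    class DmaTransfer : public ChipDmaBase
    {
    public:
        DmaTransfer(ChipUartDmaV1LowLevel& owner, const bool transmit) :
                owner_{owner}, transmit_{transmit} {}

        void transferCompleteEvent() override
        {
            if (transmit_)
                owner_.txCompleteEvent();
            else
                owner_.rxCompleteEvent();
        }
        void halfTransferCompleteEvent() override {}
        void transferErrorEvent() override { owner_.errorEvent(transmit_); }

    private:
        ChipUartDmaV1LowLevel& owner_;
        bool transmit_;
    };

    void txCompleteEvent() { /* TX chunk finished - restart or notify the upper layer */ }
    void rxCompleteEvent() { /* RX chunk finished - restart or notify the upper layer */ }
    void errorEvent(const bool /* transmit */) { /* abort the affected direction */ }

    DmaTransfer rxTransfer_ {*this, false};    // registered with the RX DMA channel
    DmaTransfer txTransfer_ {*this, true};     // registered with the TX DMA channel
};
```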

https://community.st.com/thread/42689-efficiently-use-dma-with-uart-rx-on-stm32 - this approach definitely should be used for receiving

SerialPort implementation is not set in stone. It works for interrupts, but maybe for DMA it should be reworked somehow. It depends on how bad it would be without such a rework (;

Do you have plans to implement a mode where it is not known how many bytes are expected? Currently, in the SerialPort read methods the number of expected bytes (or a timeout) is known in advance as a minSize or timePoint/duration argument.

CezaryGapinski commented 6 years ago

Hi! Do you have any ideas or hints about what I should change in my branches for DMAv1 and DMAv2? Do you think these branches can be useful, partially or as a whole, for the master branch?

cutephoton commented 5 years ago

Just passing by... I can see this is an older thread but just wanted to try to answer one open question...

"Do you have plans to implement mode where there is no idea how many bytes are expected. Now in SerialPort read methods number of expected bytes (or timeout) is known in advance as an minSize or timePoint/duration argument?"

It came up recently when I was looking at a different RTOS (when I apparently failed to RTFM): https://github.com/lixpaulian/stm32f7-uart/issues/2

The idle flag is the right approach to that problem, but the standard HAL drivers don't implement support for it, so it wasn't obvious to me. I used to implement this using a timer, but you live and learn... :)

FreddieChopin commented 5 years ago

The idle flag is the right approach to that problem, but the standard HAL drivers don't implement support for it, so it wasn't obvious to me. I used to implement this using a timer, but you live and learn... :)

Well, I wouldn't say that this is necessarily "the right approach" in a generic driver, as there are chips which don't implement that mechanism. I know that currently distortos is extremely STM32-centric, but using it with other hardware has to be an option, so the generic interface must not assume that some non-standard features are always available... I know that previously I said the opposite, but I'm still not sure which path to follow (;

Generally the SerialPort class was built as yet another low-level building block, with an interface which almost perfectly fits a POSIX file interface. This way it may not be the best fit for standalone use, but it works remarkably well when hooked into stdio functions (which is quite easy using fopencookie() from newlib). The thing with the POSIX file interface (read(), write(), open(), close(), ...) is that it actually never uses the concept of "return as much as is being transferred in one block" (which would really benefit from the IDLE detection). In non-blocking mode the driver should immediately return as many bytes as are available right now, and in blocking mode it should either return everything that is available right now or wait for at least one byte. The last part is important, because that's how functions like fread(), gets() and scanf() interact internally with the lower level - they always tell you exactly how many bytes they want. The end result is that they usually just read one byte at a time anyway. Using IDLE detection in the lower level will therefore bring no gain when the driver is hidden behind FILE*.
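
For example, hooking such a driver behind FILE* looks more or less like this (only a sketch: SerialPortLike is a stand-in with an assumed and stubbed API, the real SerialPort methods look a bit different):

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1    // needed so <cstdio> exposes fopencookie() (newlib/glibc extension)
#endif

#include <cstddef>
#include <cstdio>
#include <sys/types.h>    // ssize_t

// stand-in for the real serial port driver - exact API assumed, bodies stubbed
struct SerialPortLike
{
    std::size_t read(void* /* buffer */, const std::size_t size) { return size; }     // real one blocks for >= 1 byte
    std::size_t write(const void* /* buffer */, const std::size_t size) { return size; }
};

ssize_t serialRead(void* const cookie, char* const buffer, const std::size_t size)
{
    auto& port = *static_cast<SerialPortLike*>(cookie);
    return static_cast<ssize_t>(port.read(buffer, size));
}

ssize_t serialWrite(void* const cookie, const char* const buffer, const std::size_t size)
{
    auto& port = *static_cast<SerialPortLike*>(cookie);
    return static_cast<ssize_t>(port.write(buffer, size));
}

// attach the port to a FILE*, so printf()/scanf()-style code can use it via stdio
FILE* openSerialAsFile(SerialPortLike& port)
{
    const cookie_io_functions_t functions {serialRead, serialWrite, nullptr, nullptr};
    return fopencookie(&port, "r+", functions);
}
```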

I'm open to expanding the SerialPort API to be more friendly to standalone use, where IDLE detection may be useful, but the current functionality must remain in place and this still has to work well with stdio. However, I'm not sure this is really worthwhile, given that the alternative is very tempting - no matter how much you would extend the API of this class, it won't be better than printf() and scanf()...

cutephoton commented 5 years ago

Sorry - I try to remember that in sw engineering the "right way" should be banned from discussions. :) I meant it purely from an STM32 perspective. As far as I'm concerned this ends up as an implementation detail in the driver.

This is a POSIX concept - O_NONBLOCK, right? The IDLE flag exists purely to terminate the DMA transfer prematurely. However, inspecting the interface, it's not feasible (or maybe not wise?) to do this down there. It's more of a higher-level concept and depends on a driver maintaining an RX buffer to store pending data as it arrives.

The low-level interface appears to be almost like a C++ equivalent of ST's HAL drivers, and the same issue exists there.

FreddieChopin commented 5 years ago

As far as I'm concerned this ends up as an implementation detail in the driver.

I did not think about that much, but at this moment I'm under the impression that it would be hard to write such a driver in a generic way. This idle detection is available in all STM32 chips, but can it be considered a universally available feature of all the interesting microcontrollers which could be a target for distortos? For example chips from NXP, Freescale (now NXP), Atmel (now Microchip)? If yes, then fine; otherwise requiring such a feature (or something equivalent) would give very little to STM32 and a lot of trouble for every other driver.

This is a POSIX concept - O_NONBLOCK, right?

Yes, this is meant as a flag you use for open() with "special files", and it selects whether operations are blocking or not. However, whether you use it or not, this still doesn't require idle detection - as I said earlier, the driver is still expected to either return all that it has "right now" or - if blocking is enabled - block until it has one byte. Please note that the behaviour is explicitly defined - it should unblock as soon as it has one byte. That's why I say that idle detection has little value in this case (even if it seems like a nice concept), because the higher levels (POSIX, stdio, streams, sockets, pipes, terminals, ...) never expect the driver to "block while the other end is still transmitting something", which is the only case where idle detection has a clear advantage.
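
In pseudo-driver form the expected behaviour is something like this (a hypothetical helper and a trivial buffer, just to show the contract - not actual distortos code):

```cpp
#include <cstddef>
#include <cstring>

// hypothetical RX state, filled elsewhere (e.g. by an interrupt or DMA handler)
char rxBuffer[256];
std::size_t rxCount;    // bytes currently waiting in rxBuffer

void blockUntilNotEmpty();    // hypothetical: sleeps (e.g. on a semaphore) until rxCount > 0

// POSIX-style read: return whatever is available right now, or - in blocking mode -
// wait for the first byte only; never wait until `size` bytes have accumulated
std::size_t posixStyleRead(void* const buffer, const std::size_t size, const bool nonBlocking)
{
    if (rxCount == 0)
    {
        if (nonBlocking)
            return 0;    // with O_NONBLOCK a real driver would report EAGAIN here
        blockUntilNotEmpty();
    }
    const std::size_t chunk = rxCount < size ? rxCount : size;
    std::memcpy(buffer, rxBuffer, chunk);
    rxCount -= chunk;    // a real ring buffer would also advance its read index
    return chunk;
}
```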

It's more of a higher-level concept and depends on a driver maintaining an RX buffer to store pending data as it arrives.

Exposing so much hardware details in the higher level doesn't seem like a good idea to me...

The low-level interface appears to be almost like a C++ equivalent of ST's HAL drivers, and the same issue exists there.

I wouldn't say that this is an issue (;

FreddieChopin commented 5 years ago

STM32 DMA drivers were finally added - currently pretty basic, but enough to work well with SPI drivers. Most likely this will be enough to work with UART too.

https://github.com/DISTORTEC/distortos/blob/master/source/chip/STM32/peripherals/DMAv1/include/distortos/chip/DmaChannel.hpp
https://github.com/DISTORTEC/distortos/blob/master/source/chip/STM32/peripherals/DMAv2/include/distortos/chip/DmaChannel.hpp
https://github.com/DISTORTEC/distortos/blob/master/source/chip/STM32/peripherals/SPIv1/include/distortos/chip/SpiMasterLowLevelDmaBased.hpp
https://github.com/DISTORTEC/distortos/blob/master/source/chip/STM32/peripherals/SPIv2/include/distortos/chip/SpiMasterLowLevelDmaBased.hpp