andy-shev / linux

Linux kernel source tree

Hsu dma optimize #35

htot closed this 2 years ago

htot commented 2 years ago

All 3 patches share one goal: to minimize the chance of losing characters of an incoming message.

The main cause of lost characters is RX interrupt latency. To prevent this, we use DMA for RX, but we make sure the DMA is armed before the first character arrives (in the current kernel the first character triggers an interrupt, which then sets up the DMA). Instead, as soon as an RX DMA transfer completes, we arm the DMA again for the next message.

Of course, this only works when there is a little time at the end of a message to handle the DMA interrupt and arm the DMA again. In practice this is always the case, except when a single message gets split into 2 parts.

This happens unintentionally in 2 cases: when the circular transmit buffer wraps around, and when a message longer than 2k is sent (while still shorter than 1 page). In both cases an inter-character gap can occur. When this gap exceeds 5 character times, the DMA on the receiving side terminates, possibly leaving too little time to handle the DMA completion and set up a new transfer.
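To put the 5-character threshold into time: at 3.5 Mb/s one character lasts under 3 µs, so a gap of roughly 14 µs already ends the RX transfer. A minimal sketch of that arithmetic, assuming 10 bits per character on the wire (8N1: start bit, 8 data bits, stop bit); the helper names are hypothetical and not taken from the patches:

```c
#include <stdint.h>

/* Nanoseconds needed to shift one character over the line
 * (assumes bits_per_char includes start and stop bits). */
static uint64_t char_time_ns(uint64_t baud, unsigned int bits_per_char)
{
	return (uint64_t)bits_per_char * 1000000000ULL / baud;
}

/* Idle time, in nanoseconds, after which the receiver is assumed to end
 * the DMA transfer, given as a number of character times (5 above).
 * Multiplying before dividing keeps the integer rounding error small. */
static uint64_t rx_gap_timeout_ns(uint64_t baud, unsigned int bits_per_char,
				  unsigned int gap_chars)
{
	return (uint64_t)gap_chars * bits_per_char * 1000000000ULL / baud;
}
```

At 500 kb/s the same 5-character threshold corresponds to 100 µs.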

To handle the first case, we put both parts of the message on a scatter-gather list (sgl). To handle the second case, we prevent the transmit from being split when DMA is used.
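The first fix can be pictured with a userspace model of the transmit ring: when the pending bytes wrap around the end of the buffer they form two contiguous runs, and each run becomes one scatter-gather entry, so a single DMA transfer covers the whole message. This is a hypothetical illustration in the style of CIRC_CNT() from <linux/circ_buf.h>, not the code from the patches; the buffer size is assumed to be a power of two.

```c
#include <stddef.h>

/* One contiguous chunk of the transmit buffer (one sgl entry). */
struct seg {
	size_t off;
	size_t len;
};

/*
 * Split the pending bytes of a power-of-two circular buffer into at most
 * two contiguous segments. 'head' is where the producer writes next and
 * 'tail' is where the TX DMA reads next; both are indices in [0, size).
 * As with CIRC_CNT(), head == tail means empty (one slot is kept free).
 * Returns the number of segments filled: 0, 1 or 2.
 */
static int circ_to_segments(size_t head, size_t tail, size_t size,
			    struct seg out[2])
{
	size_t pending = (head - tail) & (size - 1);
	size_t to_end = size - tail;

	if (pending == 0)
		return 0;
	if (pending <= to_end) {
		/* No wrap: one run from tail up to head. */
		out[0] = (struct seg){ .off = tail, .len = pending };
		return 1;
	}
	/* Wrap: from tail to the end of the buffer, then from 0 up to head. */
	out[0] = (struct seg){ .off = tail, .len = to_end };
	out[1] = (struct seg){ .off = 0, .len = pending - to_end };
	return 2;
}
```

In a driver, each segment would presumably be filled into a struct scatterlist entry (e.g. with sg_set_buf()) before one dmaengine_prep_slave_sg() call, instead of issuing two separate transfers with a gap in between.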

htot commented 2 years ago

With these patches you can transmit at 3.5 Mb/s without losing characters. You can test this by setting up pppd between 2 Edisons.

Note: We need to prevent the DMA controller from entering a sleep state, as waking up can take a long time and the resulting latency can easily exceed the time it takes to fill the UART FIFO (at > 500 kb/s). To do this we need to write 0 to /dev/cpu_dma_latency. One way to do this without writing any code is to run cyclictest in the background:

cyclictest > /dev/null &
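cyclictest gets this effect by opening /dev/cpu_dma_latency, writing its latency target as a 32-bit binary value (0 by default) and keeping the file open while it runs; the kernel drops the request when the file is closed. A minimal C sketch of the same idea; the function name is hypothetical:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Register a CPU latency PM QoS request of 'latency_us' microseconds.
 * Returns the open file descriptor, or -1 with errno set on failure
 * (typically EACCES when not running as root).
 */
static int request_cpu_dma_latency(int32_t latency_us)
{
	int fd = open("/dev/cpu_dma_latency", O_RDWR);
	if (fd < 0)
		return -1;

	/* Exactly 4 bytes, so the kernel takes it as a binary s32. */
	ssize_t n = write(fd, &latency_us, sizeof(latency_us));
	if (n != (ssize_t)sizeof(latency_us)) {
		int saved = (n < 0) ? errno : EIO;
		close(fd);
		errno = saved;
		return -1;
	}
	/* Keep this fd open: closing it removes the request again. */
	return fd;
}
```

A caller would do request_cpu_dma_latency(0) and then keep the process alive (for example with pause()), which is what running cyclictest in the background achieves.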

andy-shev commented 2 years ago

> Note: We need to prevent the dma controller from entering a sleep state as waking up can take a long time and the latency can easily be longer than the UART FIFO length (at > 500kb/s). To do this we need to write 0 to /dev/cpu_dma_latency.

I'm wondering whether other (small) values work for this. The value should be calculated from the bitrate and FIFO size. We could actually do this as a general solution for all 8250 drivers (see the OMAP one, which configures this from the driver on ->start() IIRC).
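One way to start from bitrate and FIFO size is the time it takes back-to-back incoming data to fill the FIFO. A minimal sketch, assuming 10 bits per character (8N1); the function name and the FIFO sizes used below are assumptions for illustration, not values taken from the 8250 or HSU drivers:

```c
#include <stdint.h>

/*
 * Rough upper bound, in microseconds, on how long the system may take to
 * react before a UART FIFO of 'fifo_size' characters overflows, when data
 * arrives continuously at 'baud' with 'bits_per_char' bits per character.
 */
static uint32_t fifo_latency_budget_us(uint32_t baud, uint32_t fifo_size,
				       uint32_t bits_per_char)
{
	/* 64-bit intermediate: fifo_size * bits * 1e6 overflows 32 bits. */
	return (uint32_t)((uint64_t)fifo_size * bits_per_char * 1000000ULL /
			  baud);
}
```

With a hypothetical 64-byte FIFO this gives 1280 µs at 500 kb/s and about 182 µs at 3.5 Mb/s. It ignores DMA controller wake-up and interrupt handling time, so a driver would have to request something smaller than this bound.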

htot commented 2 years ago

> I'm wondering if other (small) values work for this. It should be calculated based on bitrate and FIFO size. We actually can do this as general solution for all 8250 drivers (see OMAP one which configures this from the driver on ->start() IIRC).

That would be very neat. Except I don't know what the value means exactly. I thought it was just a switch to prevent the DMA from going to sleep.

andy-shev commented 2 years ago

> > I'm wondering if other (small) values work for this. It should be calculated based on bitrate and FIFO size. We actually can do this as general solution for all 8250 drivers (see OMAP one which configures this from the driver on ->start() IIRC).
>
> That would be very neat. Except I don't know what the value means exactly. I thought it was just a switch to prevent dma going to sleep.

It's a latency for PM QoS, in microseconds. So the value says how long the CPU may take to come out of a sleep state before it starts processing the data. I think values up to 10 should give you the same result; can you check that?
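Because every process can hold its own request, the kernel combines them: for CPU latency the aggregate is the minimum (the strictest request wins), and the default applies only when no request is active. The cpuidle governors then avoid idle states whose exit latency exceeds that value. A hypothetical userspace model of the aggregation rule:

```c
#include <stddef.h>
#include <stdint.h>

/* Same value as the kernel's PM_QOS_CPU_LATENCY_DEFAULT_VALUE (2000 s). */
#define DEFAULT_LATENCY_US (2000 * 1000000)

/* Effective CPU latency constraint: the smallest active request,
 * or the default when nobody has asked for anything. */
static int32_t effective_latency_us(const int32_t *requests, size_t n)
{
	if (n == 0)
		return DEFAULT_LATENCY_US;

	int32_t min = requests[0];
	for (size_t i = 1; i < n; i++)
		if (requests[i] < min)
			min = requests[i];
	return min;
}
```

So a single process writing 10 is enough to constrain the whole system, no matter what other, looser requests exist.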

htot commented 2 years ago

> It's a latency for PM QoS in microseconds. So, the value shows how long CPU may stay in sleep state before starting processing of the data. I think values up to 10 should give you the same result, can you check that?

I don't know the default value on Edison, but I noticed that at 500 kb/s the default value causes no issues.

With a little effort I can test other values. But I believe /dev/cpu_dma_latency takes a 4-byte binary number, so I have been lazy and just started cyclictest to set it to 0. ~How would I best set 10 from the shell?~

But it appears:

exec 3<> /dev/cpu_dma_latency; echo "hex_latency_in_uS" >&3

should do the trick (https://lkml.org/lkml/2017/8/23/89)
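The value written this way is parsed as hexadecimal, so "echo 10" requests 16 µs, while 10 µs is "a" (or "0xa"). Also, a write of exactly 4 bytes is taken as a binary s32, so a 3-digit number plus the newline from echo would not be parsed as hex at all. The kernel's PM QoS documentation describes the text form as a 10-character hex string like "0x12345678", which avoids both pitfalls. A hypothetical helper producing that form:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format a latency in microseconds as a 10-character hex string such as
 * "0x0000000a". 'buf' must hold at least 11 bytes (10 chars plus NUL).
 * Returns the string length, which is always 10.
 */
static int format_latency_hex(int32_t latency_us, char buf[11])
{
	return snprintf(buf, 11, "0x%08x", (unsigned int)latency_us);
}
```

From the shell, "exec 3<> /dev/cpu_dma_latency; echo 0x0000000a >&3" then keeps the 10 µs request for as long as fd 3 stays open, i.e. until the shell exits or runs "exec 3>&-".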

htot commented 2 years ago

By default I get (hex):

root@yuna:~# xxd -l 16 -p /dev/cpu_dma_latency
00943577

It appears this must be read as little-endian 0x77359400, i.e. 2,000,000,000 µs or 2000 s. (https://blog.csdn.net/msdnchina/article/details/98659435)

andy-shev commented 2 years ago

> Default I get (hex): It appears this must be read as 0x77359400 or 2000 sec. (https://blog.csdn.net/msdnchina/article/details/98659435)

#define PM_QOS_CPU_LATENCY_DEFAULT_VALUE        (2000 * USEC_PER_SEC)
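xxd prints the bytes in file order, and x86 (as on Edison) stores an s32 least significant byte first, which is why 00943577 reads as 0x77359400. A small check that the decoded bytes match the default quoted above, with the kernel macros copied into userspace (USEC_PER_SEC is 1000000L in the kernel); the decoding helper is hypothetical:

```c
#include <stdint.h>

/* Userspace copies of the kernel definitions. */
#define USEC_PER_SEC 1000000L
#define PM_QOS_CPU_LATENCY_DEFAULT_VALUE (2000 * USEC_PER_SEC)

/* Assemble a 32-bit value from bytes stored least significant first. */
static uint32_t le32_from_bytes(const unsigned char b[4])
{
	return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
	       (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}
```

Feeding it the bytes 00 94 35 77 from the xxd output yields 0x77359400, which equals PM_QOS_CPU_LATENCY_DEFAULT_VALUE.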