andy-shev / linux

Linux kernel source tree

Hsu dma optimize #34

Closed htot closed 3 years ago

htot commented 3 years ago

I cleaned up "arm rx dma.." and hope it is clearer now. I also enabled it on all HSU ports that have DMA enabled. On the transmit side ("use linear buffer...") the DMA transmit still gets split up when the transfer is longer than 2048 bytes (I see an additional interrupt occurring if I toggle a GPIO), but I don't know what causes that. It seems to do no harm.

htot commented 3 years ago

> In the commit message, capitalize and fix all references accordingly. But the main question: why is this an 8250_mid-specific change? Shouldn't it be done at the 8250_dma level?

In principle this could apply to all UARTs with DMA support. The idea is to eliminate the UART interrupt at the beginning of a message that sets up DMA. Of course, the DMA interrupt at the end of the message remains. So, if two messages come too quickly, DMA will not be set up in time for the second. This is why inter-character gaps on the transmitting end must be prevented: a gap of more than 4 character times will cause a DMA interrupt, but the handler will be too late to set up for the next message (depending on RX FIFO size and trigger level).

But the question is how the DMA handles waiting for the first character (it times out after not receiving anything for 4 character times, but only after receiving a first character). There are so many different UARTs, I didn't dare to touch code affecting all of them. But you are braver than me - and know better what you are doing.

Remember, when I wrote this code it caused the kernel to hang, due to the UART hardware (all UARTs, including the console) locking up. I couldn't find out why. Now, without me changing anything, the same code works. Is some hardware initialized differently somewhere? And even now we still have to set /dev/cpu_dma_latency to 0. PM-related stuff I don't really understand.

htot commented 3 years ago

> Besides similar questions about the commit message (style related), what you are doing is called bounce buffers. So, the question is: can we rather prepare an SG list out of two descriptors (one for the tail and another for the beginning of the buffer, if needed) and let the DMA driver handle it?

AFAIU, SG is implemented by automatically setting up DMA after each descriptor completes, but I'm not sure if that is done in hardware (i.e. does the DMA hold a list of pointers itself, or is it done in software in the DMA interrupt handler?). As it is now, the whole transmit message is a single string that is sent in one DMA transfer. There is no inter-character gap on transmit.

But I did see that for transfers > 2048 bytes the message got split into two, causing an additional DMA interrupt and potentially a gap. It is still not clear to me where the 2048 limit comes from.
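For context, the "linearizing" (bounce buffer) step being discussed can be sketched in plain C. This is only an illustration of the technique, not the actual 8250 code; `bounce_linearize` and `XMIT_SIZE` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define XMIT_SIZE 4096 /* one page, like the 8250 xmit circular buffer */

/*
 * Copy 'len' bytes starting at ring position 'tail' into the linear
 * bounce buffer 'dst', handling the wrap-around at the end of the ring.
 */
static void bounce_linearize(unsigned char *dst, const unsigned char *ring,
                             unsigned int tail, unsigned int len)
{
    unsigned int first = XMIT_SIZE - tail; /* bytes before the wrap */

    if (first >= len) {
        memcpy(dst, ring + tail, len);          /* contiguous: one copy */
    } else {
        memcpy(dst, ring + tail, first);        /* tail up to end of ring */
        memcpy(dst + first, ring, len - first); /* wrapped part at start */
    }
}
```

The appeal is a single contiguous DMA transfer with no inter-character gap; the cost is the extra memcpy and the bounce buffer itself.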

andy-shev commented 3 years ago

> > In the commit message, capitalize and fix all references accordingly. But the main question: why is this an 8250_mid-specific change? Shouldn't it be done at the 8250_dma level?

> In principle this could apply to all UARTs with DMA support. The idea is to eliminate the UART interrupt at the beginning of a message that sets up DMA. Of course, the DMA interrupt at the end of the message remains. So, if two messages come too quickly, DMA will not be set up in time for the second. This is why inter-character gaps on the transmitting end must be prevented: a gap of more than 4 character times will cause a DMA interrupt, but the handler will be too late to set up for the next message (depending on RX FIFO size and trigger level).

> But the question is how the DMA handles waiting for the first character (it times out after not receiving anything for 4 character times, but only after receiving a first character). There are so many different UARTs, I didn't dare to touch code affecting all of them. But you are braver than me - and know better what you are doing.

I guess it makes sense to submit this for all 8250-compatible ports that are using the generic 8250_dma (mostly Intel).

> Remember, when I wrote this code it caused the kernel to hang, due to the UART hardware (all UARTs, including the console) locking up. I couldn't find out why. Now, without me changing anything, the same code works. Is some hardware initialized differently somewhere? And even now we still have to set /dev/cpu_dma_latency to 0. PM-related stuff I don't really understand.

Yes, PM stuff is always needed for UART.

andy-shev commented 3 years ago

> > Besides similar questions about the commit message (style related), what you are doing is called bounce buffers. So, the question is: can we rather prepare an SG list out of two descriptors (one for the tail and another for the beginning of the buffer, if needed) and let the DMA driver handle it?

> AFAIU, SG is implemented by automatically setting up DMA after each descriptor completes, but I'm not sure if that is done in hardware (i.e. does the DMA hold a list of pointers itself, or is it done in software in the DMA interrupt handler?). As it is now, the whole transmit message is a single string that is sent in one DMA transfer. There is no inter-character gap on transmit.

HSU DMA supports up to 4 HW descriptors, each of up to 128k in size. Other DMA engines are even better in this sense. So, I would rather see this implemented as an SG list of one or two elements (depending on whether the buffer wraps around or not).
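What the one-or-two-element SG approach would compute can be sketched like this. The names (`ring_to_segs`, `struct seg`, `RING_SIZE`) are hypothetical; real driver code would fill a `struct scatterlist` and hand it to the dmaengine API instead of these plain structs.

```c
#define RING_SIZE 4096 /* one page, like the 8250 xmit circular buffer */

struct seg {
    unsigned int off; /* offset into the ring buffer */
    unsigned int len; /* bytes in this segment */
};

/*
 * Describe the ring region [tail, tail + len) as one or two segments
 * instead of copying it into a bounce buffer; the DMA engine would
 * chain these as hardware descriptors. Returns the segment count.
 */
static int ring_to_segs(unsigned int tail, unsigned int len, struct seg sg[2])
{
    unsigned int first = RING_SIZE - tail; /* bytes before the wrap */

    if (first >= len) {
        sg[0].off = tail;
        sg[0].len = len;
        return 1; /* contiguous: one descriptor is enough */
    }
    sg[0].off = tail;
    sg[0].len = first;       /* tail up to the end of the ring */
    sg[1].off = 0;
    sg[1].len = len - first; /* wrapped part at the start */
    return 2;
}
```

Compared with the bounce buffer, this avoids the memcpy entirely; whether the two descriptors are chained back-to-back without a gap on the wire then depends on the DMA hardware.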

> But I did see that for transfers > 2048 bytes the message got split into two, causing an additional DMA interrupt and potentially a gap. It is still not clear to me where the 2048 limit comes from.

Isn't it obvious? The buffer size is one page, i.e. 4k. 2k is half of that, and any size bigger than half has a chance of being wrapped around the circular buffer. That's why we get a split.

htot commented 3 years ago

> Isn't it obvious? The buffer size is one page, i.e. 4k. 2k is half of that, and any size bigger than half has a chance of being wrapped around the circular buffer. That's why we get a split.

No, I mean after linearizing. So we send one DMA transfer larger than 2k, and it still seems to be split into two parts. As if there is a limit in the DMA subsystem?