Xilinx / dma_ip_drivers

Xilinx QDMA IP Drivers
https://xilinx.github.io/dma_ip_drivers/

QDMA, Using Multiple Interrupt Aggregation Rings, Associating Each Queue to Specific Data Interrupt Vector and Ring #145

Open joshualant opened 2 years ago

joshualant commented 2 years ago

Hi,

I apologise in advance for the length of this post...

(We are currently developing against version 20.1 of the driver.) We have extended the QDMA driver with network capabilities (similar to what is done in the QEP driver). Since the H2C and C2H interrupts are by default serviced on the same interrupt vector, TX and RX cannot run concurrently on separate cores.

In an attempt to increase performance we wish to have two interrupt aggregation rings instead of one: one for C2H (RX) and one for H2C (TX), associating one data interrupt vector with the first ring and another data interrupt vector with the second ring.

To achieve this we have increased the defined maximum number of data vectors per interrupt context from 1 to 2, in qdma_access_common.h:

#define QDMA_NUM_DATA_VEC_FOR_INTR_CXT 2

We see that this value is then used in the function:

int intr_ring_setup(struct xlnx_dma_dev *xdev)

in a for loop to allocate the number of interrupt rings:

      for (counter = 0;
              counter < QDMA_NUM_DATA_VEC_FOR_INTR_CXT;
              counter++) {
              intr_coal_list_entry = (intr_coal_list + counter);
              intr_coal_list_entry->intr_rng_num_entries = num_entries;
              intr_coal_list_entry->intr_ring_base = intr_ring_alloc(
                      xdev, num_entries,
                      sizeof(union qdma_intr_ring),
                      &intr_coal_list_entry->intr_ring_bus);

We see that the number of interrupt rings is indeed changed to 2...

We then associate each queue (C2H/H2C) with one of these vector IDs by assigning the descq->intr_id field. If the queue is an H2C queue, we assign it the value of the next interrupt vector, which is associated with the second interrupt ring. So in qdma_descq.c, in the function:

static void desc_alloc_irq(struct qdma_descq *descq)

we add:

      if (descq->conf.q_type == Q_H2C && descq->conf.st == 1) {
              descq->intr_id++;
      }
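
As a more defensive variant of the same tweak (a sketch only: it assumes descq->intr_id has already been set to a data vector and that xdev->dvec_start_idx marks the first data vector, so verify the field names against your driver version), the bump can be bounded so it never steps past the vectors actually allocated for aggregation rings:

      /*
       * Sketch: route ST H2C queues to the next data vector / ring, but
       * never past the last vector reserved for interrupt aggregation.
       * xdev->dvec_start_idx is assumed to mark the first data vector.
       */
      if (descq->conf.q_type == Q_H2C && descq->conf.st == 1) {
              unsigned int last_dvec = descq->xdev->dvec_start_idx +
                              QDMA_NUM_DATA_VEC_FOR_INTR_CXT - 1;

              if (descq->intr_id < last_dvec)
                      descq->intr_id++;
      }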

This solution appears to work until the pidx/cidx wraps around to 0 after N packets, where N is the number of entries in the ring (511 in our instance), at which point the code appears to loop endlessly around the interrupt aggregation handler (for the new ring), repeatedly hitting this path:

      } else if (num_entries_processed == 0) {
              pr_warn("No entries processed\n");
              descq = xdev->prev_descq;
              if (descq) {
                      pr_warn("Doing stale update\n");
                      queue_intr_cidx_update(descq->xdev,
                              descq->conf.qidx, &coal_entry->intr_cidx_info);
              }
      }

The system then crashes after a while since the core spends so long trapped in atomic context.

We have noticed in debugging that under normal operation the color bit is flipped when the last entry of the ring is serviced. This is documented as correct behaviour. However, with the newly added ring the color bit does not appear to be flipped on the ring associated with the new data interrupt vector.
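
For context, this is how we understand the colour scheme is normally consumed (a sketch only, with made-up struct and field names; the real entry layout is union qdma_intr_ring and the real handler lives in the driver's interrupt code): the hardware writes each entry with its current colour and inverts that colour each time it wraps, while software only treats an entry as new when its colour matches the colour software expects, toggling its expectation at its own wrap. If the hardware never flips the colour on the second ring, a loop like this sees zero new entries forever after the first wrap, which matches the symptom above:

      /* Made-up bookkeeping types for illustration only. */
      struct aggr_ring_entry {
              u8 color;                 /* colour written by hardware */
              /* ... qid, cidx info, etc. ... */
      };

      struct aggr_ring_sw {
              struct aggr_ring_entry *entries;
              unsigned int cidx;        /* next entry software will read */
              unsigned int num_entries;
              u8 exp_color;             /* colour software expects next */
      };

      static int drain_aggr_ring(struct aggr_ring_sw *ring)
      {
              int processed = 0;

              while (ring->entries[ring->cidx].color == ring->exp_color) {
                      /* ... look up the queue for this entry and service it ... */
                      processed++;

                      if (++ring->cidx == ring->num_entries) {
                              ring->cidx = 0;
                              /* hardware inverts the colour it writes after a
                               * wrap, so software inverts its expectation too */
                              ring->exp_color = !ring->exp_color;
                      }
              }
              return processed;
      }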

We can see nowhere in the code that performs this bit flip, so we assume it must happen in hardware. Can you please confirm whether this should indeed happen automatically for any newly added aggregation ring (as it appears to for the first ring), and if it should, whether there is anywhere else in the code we would need to modify in order to use two separate interrupt aggregation rings, with different interrupt vector IDs, on one PF?

Many thanks for reading,

Josh and Manos

hmaarrfk commented 2 years ago

Sounds like you have an interesting application.

I don't think Xilinx monitors these threads. You are going to have to talk on their official forum.

Maybe you can use two "QDMA Queues", but one exclusively in Rx, the other exclusively in Tx.

Then in hardware, you can map them onto the same "function".

Just a thought.
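
In case it helps, that idea would look roughly like this through the libqdma export API (a sketch only; qdma_queue_add/qdma_queue_start and the qdma_queue_conf fields are taken from libqdma_export.h, so check the exact signatures and field names for your driver version):

      /*
       * Sketch: one ST queue used only for C2H (RX) and one used only for
       * H2C (TX) on the same PF, so each direction gets its own descriptor
       * ring and can be serviced independently.  Error handling trimmed.
       */
      static int setup_rx_tx_queues(unsigned long dev_hndl)
      {
              struct qdma_queue_conf qconf = {0};
              unsigned long rx_qhndl, tx_qhndl;
              char ebuf[128];
              int rv;

              qconf.st = 1;             /* streaming mode */
              qconf.q_type = Q_C2H;     /* RX-only queue */
              qconf.qidx = 0;
              rv = qdma_queue_add(dev_hndl, &qconf, &rx_qhndl, ebuf, sizeof(ebuf));
              if (rv < 0)
                      return rv;

              qconf.q_type = Q_H2C;     /* TX-only queue */
              qconf.qidx = 1;
              rv = qdma_queue_add(dev_hndl, &qconf, &tx_qhndl, ebuf, sizeof(ebuf));
              if (rv < 0)
                      return rv;

              rv = qdma_queue_start(dev_hndl, rx_qhndl, ebuf, sizeof(ebuf));
              if (rv < 0)
                      return rv;
              return qdma_queue_start(dev_hndl, tx_qhndl, ebuf, sizeof(ebuf));
      }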

joshualant commented 2 years ago

Okay, cheers, I have posted on the Xilinx forum. I thought that by putting the problem here it might be seen more quickly by people able to provide some answers as to what is going on.

For reference of anyone else reading, the forum post is here:

https://support.xilinx.com/s/question/0D52E00006zy6m3SAA/qdma-using-multiple-interrupt-aggregation-rings-associating-each-queue-to-specific-data-interrupt-vector-and-ring

hmaarrfk commented 1 year ago

Hello,

My name is Mark Harfouche. I am not affiliated with Xilinx in any way. Over the years of using QDMA, I've been wanting better community organization.

I've created a fork of dma_ip_drivers which I intend to maintain and work with the community at large to improve.

The fork can be found at https://github.com/hmaarrfk/dma_ip_drivers

For now, I have laid out the main goals of the repository in https://github.com/hmaarrfk/dma_ip_drivers/issues/2

If you are interested in working together, feel free to open an issue or PR to my fork.

Best,

Mark

PS. Xilinx, over the years, has shown that it does not intend to reply in this repository. I've found that they tend to reply if you post on their own forum.