PeterBorisenko / w5500-lwip-freertos

RTOS-based binding for W5500 and LwIp
22 stars 6 forks source link

Overwrites data in TX buffer when writing multiple pbufs #4

Open madsci77 opened 11 months ago

madsci77 commented 11 months ago

I've been having trouble with TCP data not sending properly in some cases and I think I've finally identified the issue. wiz_transmit_buf() iterates through all of the pbufs in a linked list, calling wiz_send_data() for each pbuf. The issue is that the value read from S0_TX_WR is NOT the same value that's written to S0_TX_WR until the final SEND command is issued.

What happens is that the first pbuf is written to the socket 0 TX buffer at the appropriate starting point, and then on the next call of wiz_transmit_buf() the address is read from S0_TX_WR as the original starting address, and the first pbuf is overwritten by all subsequent pbufs. This means there's no valid Ethernet or IP header.

My fix has been to modify the loop to read S0_TX_WR once and track the write address locally until all writes are done, and then S0_TX_WR is updated and the SEND command is issued.

My version of the code is diverging from the original style and structure considerably so you probably don't want my mods. My current wiz_transmit_pbuf() is included below. It incorporates wiz_send_data() into the same function. It also uses multi-byte reads and writes for the TX_WR and TX_FSR registers to save on some SPI transaction overhead. You need to make sure you have ntohs() and htons() available if you're using this.

void wiz_transmit_pbuf(struct pbuf *p)
{
    static uint16_t tx_free = 0;
    uint16_t length = p->tot_len;
    uint16_t retries = 0;
    uint16_t ptr = 0;
    uint32_t addrsel = 0;

    // Only check tx_free if our local copy says we're out of space
    while (tx_free < length)
    {
        tx_free = getSn_TX_FSR(0);
        if (tx_free < length)
        {
            vTaskDelay(1);
        }
        retries++;
        if (retries > WIZ_SPI_TX_TIMEOUT_TICKS)
        {
            debug_print(ANSI_YELLOW "ETH TX timeout\r\n" ANSI_RESET);
            return;
        }
    }

    WIZCHIP_READ_BUF(Sn_TX_WR(0), (uint8_t *)&ptr, 2);
    ptr = ntohs(ptr);

    while (1)
    {
        //debug_print_serial("Wiz send %u @0x%04x\r\n", p->len, ptr);
        addrsel = ((uint32_t)ptr << 8) + (WIZCHIP_TXBUF_BLOCK(0) << 3);
        WIZCHIP_WRITE_BUF(addrsel, p->payload, p->len);
        ptr += p->len;

        tx_free -= p->len;
        if (p->len >= p->tot_len)
        {
            break;
        }
        p = p->next;
    }
    ptr = htons(ptr);
    // Update TX write pointer and send packet
    WIZCHIP_WRITE_BUF(Sn_TX_WR(0), (uint8_t *)&ptr, 2);
    setSn_CR(0, Sn_CR_SEND);
}
PeterBorisenko commented 11 months ago

Thank you for your issue. The style is different, yes. While the structure is not relevant now as my own code look very different from that now. The code was made quite a long ago, and the project which this code was made for is currently not supported. It was tested on relatively small amounts of data (MQTT exchange with AWS IoT platform) so issues like you described have a place to be.

I am honestly hoped to help someone providing a working solution because I spent some time to make it.

I will be glad to merge your pull request so the solution become more complete and helpful.

madsci77 commented 11 months ago

OK, give me some time and I'll try to get it cleaned up enough to share. I've got a release deadline to deal with, so getting it working is taking priority over making it pretty. I've got LPC-specific SPI code mixed in there and other stuff where it doesn't belong.

How much have you looked at the possibility of concurrency problems between the lwIP task and the spi_if task? My concern is that output happens in the lwIP task but it could be interrupted by the spi_if task handling incoming packets. If the SPI access at least isn't mutex protected it seems like that could be a problem.

For now I've set the lwIP task to be higher priority than spi_if so the Wiznet interrupt semaphore won't be handled until the current transmission is complete. I feel like that deserves a closer look, though, if it's not something that's already been analyzed.