Microchip-MPLAB-Harmony / net

Harmony 3 Network library
https://onlinedocs.microchip.com/v2/keyword-lookup?keyword=MH3_net&redirect=true
29 stars 9 forks source link

Anomalies in scheduling of TCP tasks #22

Open dchirgwin opened 4 years ago

dchirgwin commented 4 years ago

I shall prefix all my other comments by saying that the timing is very complicated, and it's possible that I have misunderstood the intention of the code in places. However, I have used detailed logs from Tracealyzer to analyze this, and I'm fairly sure there is a problem. I'm running on PIC32MX under FreeRTOS.

First Issue

First issue is that stackTaskRate is not being calculated properly. The value is set in _TCPIPStackCreateTimer() based on a complicated rounding calculation involving SYS_TMR_TickCounterFrequencyGet().

static bool _TCPIPStackCreateTimer(void)
{
    tcpip_stack_tickH = SYS_TMR_CallbackPeriodic(TCPIP_STACK_TICK_RATE, 0, _TCPIP_STACK_TickHandler);
    if(tcpip_stack_tickH != SYS_TMR_HANDLE_INVALID)
    {
        uint32_t sysRes = SYS_TMR_TickCounterFrequencyGet();
        uint32_t rateMs = ((sysRes * TCPIP_STACK_TICK_RATE) + 999 )/1000;    // round up
        stackTaskRate = (rateMs * 1000) / sysRes;
        // SYS_TMR_CallbackPeriodicSetRate(tcpip_stack_tickH, rateMs);
        return true;
    }

    SYS_ERROR_PRINT(SYS_ERROR_ERROR, TCPIP_STACK_HDR_MESSAGE "Tick registration failed: %d\r\n", TCPIP_STACK_TICK_RATE);
    return false;
}

However, the timer uses SYS_TMR_CallbackPeriodic() which is based on a pure millisecond timer, and is not derived from the SYS_TMR_TickCounterFrequency.

I am using the following configuration:

Instead of getting a call to TCPIP_TCP_Task() every 10ms, I get a call every 7ms (on average). This is because stackTaskRate has been set to 7 instead of 5, which means that _TCPIPStackSignalTmo() is eating away at pSigEntry->currTmo in larger chunks than it should.

Solution

I think the code that sets stackTaskRate just needs to be simplified like this:

        stackTaskRate = TCPIP_STACK_TICK_RATE;

Second (related) Issue

Originally I had a slightly different configuration:

This caused very weird errors because TCPIP_TCP_TASK_TICK_RATE is smaller than stackTaskRate, which caused periodic overflows in _TCPIPStackSignalTmo() as pSigEntry->currTmo got increasingly negative, and eventually wrapped around to be positive. The end result was that the periodic TCP processsing (e.g. the Nagle algorithm using TCPIP_TCP_AUTO_TRANSMIT_TIMEOUT_VAL) went idle for about 23 seconds every 106 seconds.

When the signal handler is installed by _TCPIPStackSignalHandlerRegister() there is a check to make sure the period of the signal handler asyncTmoMsis not smaller than stackTaskRate.

            if(pSignalEntry->signalHandler == 0 || pSignalEntry->signalHandler == signalHandler)
            {   // found module slot
                pSignalEntry->signalHandler = signalHandler;
                if ((asyncTmoMs != 0) && (asyncTmoMs < stackTaskRate))
                {
                    asyncTmoMs = stackTaskRate;
                }
                pSignalEntry->asyncTmo = pSignalEntry->currTmo = asyncTmoMs;
                return pSignalEntry;

The problem is that, due to the order of initialization, stackTaskRate has not been initialized yet, so this test always passes!

_TCPIPStackSignalHandlerRegister() is called during initialization of the TCPIP stack, but TCPIP_STACK_CreateTimer() is called from _TCPIPStackIsRunState() which is called from TCPIP_STACK_Task(), i.e. AFTER initialization.

Solution

One way to address this might be to separate the initialization of stackTaskRate from the actual starting of the timer. startTaskRate can be calculated early during intialization, and the timer started from TCPIP_STACK_Task() as currently.

adrian-aur commented 4 years ago

Thank you for your report.

For your example though, if the system timer resolution is 1 ms, i.e. SYS_TMR_TickCounterFrequencyGet() returns 1000, the calculation should work without any rounding taking effect: TCPIP_STACK_TICK_RATE = 5, then sysRes = 1000 rateMs = ((1000 5) + 999) / 1000 = 5 stackTaskRate = (5 1000) / 1000 = 5

so the stack TCPIP_STACK_Task() will be called every 5 ms, as intended. Is there an error in the SYS_TMR_TickCounterFrequencyGet(), it doesn't return 1000, or what is the problem?

dchirgwin commented 4 years ago

Thanks for your reply, Adrian. You're correct that SYS_TMR_TickCounterFrequencyGet() doesn't return 1000 in my project. It returns 266, so maybe that's the root of the problem I was seeing.

This is derived from SYS_COUNT_TICK_CONV, as seen in sys_time_h2_adapter.h

#define SYS_COUNT_TICK_CONV         150000
static __inline__ uint32_t __attribute__((always_inline)) SYS_TMR_TickCounterFrequencyGet ( void )
{
    return SYS_TIME_FrequencyGet()/SYS_COUNT_TICK_CONV;
}

My SYS_Time_Frequency is 40MHz, hence SYS_TMR_TickCounterFrequency is defined as 40,000,000 / 150,000 = 266

I understand the intention of the rounding, but my concern is that the SYS_TMR_TickCounterFrequency value is not actually related to how the task "ticks" are generated.

SYS_TMR_CallbackPeriodic() does not specify a number of ticks, rather it specifies a number of milliseconds. Therefore, I think that stackTaskRate should be calculated using milliseconds, not using ticks.

Obviously, if SYS_COUNT_TICK_CONV was 1000 then 1 SYS_TMR tick = 1 millisecond and there would be no difference. But if SYS_COUNT_TICK_CONV is not 1000, this anomaly occurs.

dchirgwin commented 4 years ago

I've just seen what may be the critical line of code:

You mention a call to SYS_TMR_CallbackPeriodicSetRate() to adjust the result. In my generated code, this call is commented out, so the adjustment is never made.

adrian-aur commented 4 years ago

The call to the SYS_TMR_CallbackPeriodicSetRate() is commented out as the H3 timer no longer supports it. So it's the same in my code, no problem here.

There seems to be a disconnect between the setting of the timer and the result of the SYS_TMR_TickCounterFrequencyGet(). It should match the settings on your system. So practically that "hard coded" value SYS_COUNT_TICK_CONV == 150000 is wrong. I'll have to investigate to understand how that value is generated and if it should adjust itself depending on the settings. For now please adjust it manually so that you get the correct timing, until I get an answer.

adrian-aur commented 4 years ago

The issue has been already corrected and it will be part of the next release. The sys_time_h2_adapter should not use the hard coded SYS_COUNT_TICK_CONV but the value returned by the call SYS_TIME_MSToCount(1). Please give it a try, it should fix your issue.

Of course the 2nd issue remains (the one caused by the late initialization of the system timer and wrong values for the module timeouts). I can send you a quick snapshot of the fix, if you want to implement it yourself until the release is available.

dchirgwin commented 4 years ago

Thanks for the update. I'm pleased and impressed you found the "smoking gun" - it's horrendously complex stuff around there! Regarding the fix, I've got some workarounds in place so I'm happy to wait for the next release, but thanks for the offer.

adrian-aur commented 4 years ago

Some of the complexity is caused by the need of signaling of RX/TX and TMO events, and running under an RTOS but also bare metal. Regarding the timing itself, I agree that it should be updated to use directly the H3 timing scheme, ms based, as the tick scheme is no longer present. I'll create an internal issue for this update. It won't be probably part of the next release, but of a future one. Thank you for reporting these issues.