Open projectgus opened 8 years ago
Happy to start working on this. Sorry for the long note--I'm not super familiar with this stuff and I'm being extra careful to explain myself to make sure I understand correctly.
I noticed that http://git.savannah.gnu.org/cgit/lwip.git/tree/doc/sys_arch.txt#n202 explicitly states that sys_now() is supposed to return miliseconds.
Looking at the rest of that file, there are the following time-related functions [edited to remove earlier mistakes]:
sys_arch_sem_wait
is also supposed to take a timeout param in milliseconds, and does so.
sys_arch_mbox_fetch
is the same.
I didn't see any other functions in sys_arch that dealt with timeouts.
Looking closer at uses of sys_now
, I found:
static inline uint32_t sys_now(void)
as NOW()/(TIMER_CLK_FREQ/1000);
. I don't see where the NOW()
macro is defined, but assuming it is the same as tick count, that is returning ms. This function has a different type than the one in sys.arch.c, and I believe that in most places in lwip, the function from sys_arch.c, not this one, is being used.sys_now
for most or all of its time and timediff needs. It uses sys_now
to initialise timeouts_last_time
(and reset it in restart_timeouts
), and in sys_check_timeouts
it is used to calculate the diff when determining if timeouts have expired.SNMP_GET_SYSUPTIME(sysuptime) (sysuptime = (sys_now() / 10))
, which would be reporting uptime in hundredths of a second if sys_now
reported ms.sys_now
is used to compare the amount of time a connection has been writing data against its timeout.sys_now
units ms.I've submitted https://github.com/SuperHouse/esp-open-rtos/pull/134 to start the ball rolling on a fix.
I'm not certain I can reproduce this bug. Using the current master (4b39a0) I modified the SNTP example as follows, to use LWIP select() instead of delay():
diff --git a/examples/sntp/sntp_example.c b/examples/sntp/sntp_example.c
index 242489b..61292ca 100644
--- a/examples/sntp/sntp_example.c
+++ b/examples/sntp/sntp_example.c
@@ -54,7 +54,14 @@ void sntp_tsk(void *pvParameters)
/* Print date and time each 5 seconds */
while(1) {
- vTaskDelayMs(5000);
+
+ struct timeval tv = { .tv_sec = 5 };
+ uint32_t start = xTaskGetTickCount();
+ printf("Pausing 5(?) seconds...\n");
+ select(0, NULL, NULL, NULL, &tv);
+ uint32_t end = xTaskGetTickCount();
+ printf("Took %d ticks.\n", end - start);
+
time_t ts = time(NULL);
printf("TIME: %s", ctime(&ts));
}
The output suggests it's sleeping for 5 seconds each time I call select:
TIME: Sat May 28 06:00:13 2016
Pausing 5(?) seconds...
Took 500 ticks.
TIME: Sat May 28 06:00:18 2016
Pausing 5(?) seconds...
Took 500 ticks.
Verified with a stopwatch that each pause took 5 seconds, also.
Looking again at esp_mqtt_read() in MQTTESP8266.c, it takes an argument of milliseconds and then passes it to the tv_sec field of a timeval - which makes me think the actual bug was sleeping for X seconds instead of X milliseconds.
However, #134 also looks like a correct fix so we should probably merge that anyhow!
(This is a good candidate for a "first bug" if someone is interested in contributing but unsure where to start.)
Bug was noted in #91 when using select() - the timeout argument to select appears to take RTOS ticks instead of seconds.
The implementation is in lwip/sys_arch.c and the bug may be as simple as the sys_now() function returning tick count instead of tick count.
However, before applying a fix someone should verify this and the other time-related functions in sys_arch.c.
After apply the fix, check the codebase for uses of select() or other time-sensitive LWIP functions.