WellDone / MoMo-Firmware

Embedded PIC Firmware and hardware designs for the WellDone Mobile Monitor (MoMo).
welldone.org
Other
8 stars 6 forks source link

construct_report appears to corrupt memory #101

Closed timburke closed 10 years ago

timburke commented 10 years ago

I've reproduced the reporting issue somewhat reliably and narrowed the problem down to construct_report. It appears to be corrupting memory, as can be seen from the following system log:

Debug (2:12:38) Constructing report...
Critical (2:12:45) Taskloop full, resetting the PIC.
Critical (2:12:45) 0
Critical (2:12:45) 0
Critical (2:12:45) 0
Critical (2:12:45) Device reset.

The three numbers that are logged should be the start pointer, end pointer and queue length from the taskqueue. They are all somehow zero. This causes the task queue to think that it's full and reset the device. Many other things are probably also being overwritten.

Replacing construct report with a dummy version that just base64 encodes the header without using the aggregate functionality does seems to make the problem go away.

Note that the taskloop full log entry comes immediately after 'constructing report'. This indicates that the call to construct_report does not complete successfully since the code path after it calls another log routine in the case of both success and failure.

amcgee commented 10 years ago

Good catch! I will track this down as soon as I'm back at a computer. On Jun 18, 2014 9:24 AM, "timburke" notifications@github.com wrote:

I've reproduced the reporting issue somewhat reliably and narrowed the problem down to construct_report. It appears to be corrupting memory, as can be seen from the following system log:

Debug (2:12:38) Constructing report... Critical (2:12:45) Taskloop full, resetting the PIC. Critical (2:12:45) 0 Critical (2:12:45) 0 Critical (2:12:45) 0 Critical (2:12:45) Device reset.

The three numbers that are logged should be the start pointer, end pointer and queue length from the taskqueue. They are all somehow zero. This causes the task queue to think that it's full and reset the device. Many other things are probably also being overwritten.

Replacing construct report with a dummy version that just base64 encodes the header without using the aggregate functionality does seems to make the problem go away.

Note that the taskloop full log entry comes immediately after 'constructing report'. This indicates that the call to construct_report does not complete successfully since the code path after it calls another log routine in the case of both success and failure.

— Reply to this email directly or view it on GitHub https://github.com/WellDone/MoMo-Firmware/issues/101.

timburke commented 10 years ago

Updates to this. There are multiple issues that I am in the process of fixing:

Task loop contents (ringbuffer size: 32)
12702: _flush_log
13066: _bus_slave_callback
10956: _scheduler_callback x 30
amcgee commented 10 years ago

Tim, can you share the branch you're working on with me so I can collaborate?

On Sat, Jun 21, 2014 at 8:46 PM, timburke notifications@github.com wrote:

Updates to this. There are multiple issues that I am in the process of fixing:

  • FIXED RTCC stops when running on the external crystal for an unknown reason (fix is to switch to the internal RC oscillator
  • FIXED When a single task takes many seconds, the RTCC interrupt (which triggers at half second intervals when momo config is waiting to be flushed to flash) will fill up the taskloop with the same callback. See taskloop contents below:

Task loop contents (ringbuffer size: 32) 12702: _flush_log 13066: _bus_slave_callback 10956: _scheduler_callback x 30

  • construct_report has an algorithmic complexity problem and takes > 9 seconds to run when there are many events to process. This is due to the slow timestamp calculation.
  • construct_report also corrupts memory. I have rewritten it to not need pointer math and to use a faster timestamp generation routine but these changes need to be tested before I open a PR.

— Reply to this email directly or view it on GitHub https://github.com/WellDone/MoMo-Firmware/issues/101#issuecomment-46760060 .

Austin McGee austinwm@gmail.com (510) 200-8486

timburke commented 10 years ago

Created a PR #98 to share the branch On 6/23/14, 12:45 AM, Austin McGee wrote:

Tim, can you share the branch you're working on with me so I can collaborate?

On Sat, Jun 21, 2014 at 8:46 PM, timburke notifications@github.com wrote:

Updates to this. There are multiple issues that I am in the process of fixing:

  • FIXED RTCC stops when running on the external crystal for an unknown reason (fix is to switch to the internal RC oscillator
  • FIXED When a single task takes many seconds, the RTCC interrupt (which triggers at half second intervals when momo config is waiting to be flushed to flash) will fill up the taskloop with the same callback. See taskloop contents below:

Task loop contents (ringbuffer size: 32) 12702: _flush_log 13066: _bus_slave_callback 10956: _scheduler_callback x 30

  • construct_report has an algorithmic complexity problem and takes > 9 seconds to run when there are many events to process. This is due to the slow timestamp calculation.
  • construct_report also corrupts memory. I have rewritten it to not need pointer math and to use a faster timestamp generation routine but these changes need to be tested before I open a PR.

— Reply to this email directly or view it on GitHub

https://github.com/WellDone/MoMo-Firmware/issues/101#issuecomment-46760060 .

Austin McGee austinwm@gmail.com (510) 200-8486

— Reply to this email directly or view it on GitHub https://github.com/WellDone/MoMo-Firmware/issues/101#issuecomment-46814584.

timburke commented 10 years ago

This problem has been fixed and integrated into dev.