Closed codecubepi closed 3 months ago
I tried switching all the usec times in the scheduler from `uint32_t` to `double`, but I'm still seeing the jitter in the frequencies of my 3-phase currents as previously observed. Also, when I run in the new mode with all sensors disabled, my currents all become constant (i.e., switching has frozen).
Hmm. Ok.
The currents will stay frozen if the control task stops running, and so the FPGA holds the previous duty cycle outputs.
For debugging the jitter: can you use the approach in the advanced tutorial on task profiling to directly measure the callback loop time? Store it to a log variable and make a plot of it during operation. This will be very enlightening as to why the output fundamental currents are behaving so strangely.
It looks good, the loop time is hovering right around 100us, or a control frequency of 10 kHz.
The loop time is now jumping between 100us and 200us. The currents I see on the scope are also 300 Hz instead of the commanded 600 Hz.
Additionally, after it's been running well in the new mode, I can still freeze the control task by disabling the sensors. I've been looking through the Timing Manager FPGA code, and it seems like it should still be trying to generate the interrupt, so I'm guessing this problem is also somewhere in the C code.
Hmm, interesting. The 300 Hz vs. 600 Hz case makes sense: since the control code is running about 2x slower in the second plots, the `theta` variable is updated 2x slower, so the output frequency is halved.
However, the question remains: why does enabling the AMDS cause the scheduler to "skip" a time slice for the task?
For next steps, I recommend looking into the scheduler and trying to understand why the skip is happening. I would log variables from the scheduler itself. For the standard 10 kHz / 100 kHz settings, you should find that every time the scheduler is "un-paused" via the timing manager ISR, `elapsed_usec` should increment by 100 usec and trigger the task to run again. My guess is that it actually increments by 99.999 usec, which does not pass the check... so it skips the task and runs it on the next slice, 200 usec later. Eventually, it wraps around and runs the task as desired (due to the phasing), so you see the 100 usec delta.
Can you log this from the scheduler and show this?
So, really, you want to dump this function call's output into a global variable for logging. I bet the increment is 99.9999 usec instead of 100.
It looks like you're correct. I'm logging the `tick_delta` within `scheduler_tick()`, and these are the results I get:
Adding the following subtraction to the check in `scheduler_run()` removes the jitter we were seeing on the scope, since the task is now being scheduled for execution at the correct loop frequency...

I'm not sure that this "naive" subtraction is the best solution, though. I'm open to other suggestions.
Wow nice work! It looks like this is really a floating point rounding issue / FPGA timer resolution issue. I like your solution, but would implement it something like:
```c
if (is_close(usec_since_last_run, t->interval_usec, SCHEDULER_INTERVAL_THRESHOLD)) {
    // ... run task
}
```
where `is_close(a, b, tol)` implements the standard floating point "near-ness" comparison; see https://numpy.org/doc/stable/reference/generated/numpy.isclose.html

Per the docs above: `absolute(a - b) <= (atol + rtol * absolute(b))`, which includes both absolute and relative tolerances.
I like that it works with a 50 ns tolerance... I think that amount of tolerance is totally acceptable.
TODO: we need to justify this 50 ns value... why not 60 ns? 40 ns? 100 ns? I think this comes down to the resolution of the FPGA timer (i.e., 5 ns) times a factor X. We just need to pick X (like your X = 10 for 50 ns), then run tests over a range of scheduler settings to ensure it works well. Maybe also add a check in the tick function to make sure the issue isn't occurring.
In the data you collected, what is the max deviation from the expected 100 usec value? In your screenshot, it looks like 20ns. But, I would want to quantify this over many more samples... :)
@npetersen2
Yeah when I look for the max deviation over all my samples, I get 20ns.
So... should our implementation of `is_close()` have both an absolute and a relative tolerance? Or just the `SCHEDULER_INTERVAL_THRESHOLD`?
Also, the numpy version of `is_close()` will only return true (pass the check and run the task) if `usec_since_last_run` and the task interval are close (obviously). In the current implementation, the task will still run even if `usec_since_last_run` is much greater than the interval. Is this an okay change to the current behavior?
Ah, good point. I think it should just be an absolute tolerance of some factor of the 20 ns deviation, maybe 60 ns. And it should also run tasks when more than the interval time has elapsed (like the current code)... so this is exactly the code you wrote above :)
Please just make the code very clear about this and explain why we are adding this fudge factor. And make the factor a `#define` or `const`, etc.; no magic numbers.
@npetersen2
The jitter issue seems fixed by adding the timing tolerance. Let me know if you think these comments are good enough:
However, I am still dealing with the currents freezing if I disable the sensors in the timing manager while running in the new mode.
Stepping through the debugger, I can tell that the control task callback is executing and that the duty ratios are being updated and written via the `pwm_set_duty_raw()` function. So I'm really not sure why I see no change on the scope.
Does the PWM duty cycle flush the output correctly when in the new mode, and when no sensors are enabled?
It sounds like everything appears to work correctly, but the duty cycle update gets lost after writing to the fpga registers?
No... it seems like the new mode refuses to update the duty ratios regardless of whether or not sensors are enabled. The blink task still works. And yeah, everything with the scheduler, control callback, and register write appears to work; the duty cycle update just gets lost after writing to the FPGA registers.
And I've discovered a new problem... sometimes the logging command interface fails with `INVALID_ARGUMENTS`, and sometimes it works just fine:

I'm not sure if this is also because of the conversion of times from `uint32_t` to `double` or what.
Yeah, I have seen this a lot myself. I am not sure it is just with the `log` commands---I think it might be with the entire command processing pipeline... some lurking bug where, every 50-100 commands, it fails with invalid arguments. I have never bothered to debug it, but it would be fairly easy to spam the AMDC with commands in Python, put a breakpoint in the command handler's invalid-arguments return, then take a look at the `argv`/`argc` variables and see why/how they look funny. Or just ignore it for now.
The problem with the duty ratios no longer updating is due to the following faulty logic:

- Duty ratios are only flushed out when a `trigger` is received from the timing manager
- For `trigger` to be asserted, all sensors must be done sampling (`all_done` in the timing manager is high)
- `all_done` was qualified on `sensors_enabled`; thus, if all sensors are disabled, `all_done` is never high, no `trigger` will ever generate, and no duty ratios will be flushed out

The solution was to remove `all_done`'s reliance on the enabled sensors. This can be done safely now that all sensors are `done` upon reset.
Control task running with new timing manager mode (which runs the scheduler ISR when all sensors are done).
For the first half of the time, no sensors are enabled (which is the same as the "legacy" behaviour). Halfway through, the AMDS is enabled in the timing manager.
The plot shows loop time for 100 kHz switching and 10 kHz control. When the AMDS is enabled, the loop time jumps from 100us to 110us for a single cycle because the timing manager is now waiting for the AMDS to complete sampling. All future loop times return to the expected 100us.
We also found this weird plot of the control task run time decreasing when the AMDS is enabled in the timing manager:
Plots of the latest issue: disabling and re-enabling sensors during operation leads to inconsistent `scheduler_tick()` loop time:
Hey @codecubepi, the fix for logging was in `common/main.c`. The `timing_manager_init()` call needs to come before `user_apps_init()`, so that any timing settings set by the user (PWM switching frequency, sync. TM ratio, etc.) are not overwritten by the defaults.
Thanks @anirudhupadhyaya !
Wrapping up this issue's conversation...
With the control task jitter and general scheduler/logger bugs resolved, I'm going to close this issue.
For the problem with the sensor enabling/disabling leading to inconsistent loop time, see #402. I think the scope of that is big enough to warrant its own issue. And I'm tired of scrolling through this conversation.
There is some frequency jitter when using the new timing manager mode.

The theory is that the problem lies in the truncation of the elapsed time value returned from the `timing_manager_get_tick_delta()` function, which is called in `scheduler_run()`.

The possible solution is to track all times in `double`.

We should also add a function to get the expected tick delta based on the enabled sensors, user ratio, and other timing manager configurations.
Latest updates (20 Jun) here to minimize scrolling:

- The inconsistent loop time when enabling/disabling sensors is now tracked in #402.
- `INVALID_ARGUMENTS` logging failures: the `expected_tick_delta()` function has resolved this? Anirudh hasn't had any more issues with logging.
- Still to decide: which times need to be `double` and which can remain `uint32_t`.