Closed kodsurf closed 1 year ago
With 0.5 Hz manually hardcoded during sequence initialization
Log:
After each SET request is sent, the thruster replies.
With 1 Hz frequency:
There is no thruster reply between SET requests.
Very strange actually ....
If I set the wait time between SET requests to 1.1 s instead of 1 s, the thruster replies reliably.
I would ask ENPULSION engineers about it .....
According to ENPULSION documentation, the ramp should run at 1 Hz for a duration of 30 s.
We need to clarify with them whether a different frequency and a different ramp duration are acceptable.
If yes, then it's very good :)
But we still have to understand WHY it does not work with a wait time < 1.1 s.
I investigated this a little further.
If SET/READ requests are sent at high frequency or without delay in the sequencer, the OBC ends up in a hard fault.
The debugger says that HardFault_handler() is caused by deb_uartIRQ()
@RobertK66 Let's please investigate this and try to work out how multiple parallel processes in the sequencer should run without disturbing each other.
Before I commit sequencer code - I make sure that all preprogrammed sequences are able to run in parallel without causing hard fault. But this is purely based on observation - I have no idea what causes this hard fault.
With the thruster in particular we don't need to run multiple sequences at the same time. But according to my plan, the OBC should be able to handle lots of parallel sequences at the same time!
Is the branch feature/thruster-fire-paralel the correct place for me to look into this!?
Can I reproduce this? Do you have some tool yet to connect to the Thruster UART, or do you need enpulsion hardware to be connected to reproduce this !?
How to reproduce it?
Yes you can reproduce it without thruster.
I will intentionally make a commit with a hardcoded sequence that is meant to cause the hard fault.
In around 1-2 h.
ok, have you got a value for your LAST_STARTED_MODULE variable when you hit the breakpoint in your picture!?
And what could be interesting is the call stack in the upper right part of the debug window. Maybe you see the line number where the last code was executed. Chip_Uart_ReadLineStatus() was in the PC (program counter); I am not sure what deb_UartIRQ() in the LR register really means....
Does it always look like this, or is it more or less random?
https://github.com/RobertK66/obc_1769_core/commit/529aacdd02739f398491ad262d2428c6e9a383e9
I made a commit with a sequence that is intended to cause the failure.
"8 2 0" triggers a sequence with too small a delay between thruster requests.
Triggering this sequence will eventually cause a hard fault. Triggering "8 0 0" and "8 1 0" will cause it even faster.
STRANGE THING I NOTICED:
If I turn off the power supply to the thruster sim board, the sequences seem to run fine.
"Does it always look like this, or is it more or less random?"
For now I have stopped getting the hard fault in the debugger, but the OBC still gets stuck.
Yesterday I got a consistent hard fault in the IRQ. Today I can't reproduce the same hard fault ...
LAST_STARTED_MODULE is 106 before the hard fault loop when sending too frequent SET/READ requests to the thruster.
P.S. The MCUXpresso debugger also says that the problem is in debug_uartIRQ()
ok, did you get a line number? because the deb_uartIRQ() never calls the Chip_Uart_ReadLineStatus() which is mentioned as PC (program counter) in your screenshot !?
What I do suspect here is, that 'somebody' somehow overwrites the content of the module variable deb_uart* This variable is only initialized once when the debug module is initialized.
What I also found out here is that I sometimes used pUart (coming from the IRQ call) (lines 341, 345) and sometimes the deb_uart variable (lines 351, 360). This is not clean code, but normally these variables should have the same (uartPointer) content.
Maybe you could write something like
if (pUart != deb_uart) {
    deb_uart = pUart;
}
at the first line of the IRQ. Set a breakpoint inside the "if"; if it gets hit, my suspicion would be confirmed....
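A host-runnable sketch of that check, for illustration only. The type `uart_regs_t`, the counter `deb_uart_mismatches`, and the function names `deb_init_sketch` and `deb_uartIRQ_guarded` are hypothetical stand-ins, not the names used in the repository. The counter is an addition so that a corruption is still visible afterwards even when no breakpoint is set.

```c
#include <stdint.h>

/* Hypothetical stand-in for the UART register block type of the real driver. */
typedef struct { volatile uint32_t dummy; } uart_regs_t;

uart_regs_t *deb_uart;                 /* module variable, written once at init */
volatile uint32_t deb_uart_mismatches; /* how often the guard had to repair it */

/* Hypothetical init: remembers which UART the debug module owns. */
void deb_init_sketch(uart_regs_t *pUart)
{
    deb_uart = pUart;
}

/* Guard at the first line of the IRQ handler (sketch, not the real handler). */
void deb_uartIRQ_guarded(uart_regs_t *pUart)
{
    if (pUart != deb_uart) {
        deb_uart_mismatches++; /* set the breakpoint on this line */
        deb_uart = pUart;      /* repair, so the rest of the handler can run */
    }
    /* ... the normal RX/TX handling would follow here ... */
}
```

If the counter is ever non-zero, something outside the debug module has written to `deb_uart`, which points towards a runaway pointer elsewhere in the code.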
As this is pure C and a lot of stuff is already coded and running at the same time, this is not easy to debug. If the above "if" shows that the deb_uart variable is overwritten, somewhere in the code a 'runaway pointer' (or a statement with wrong */& operators) gets executed. ......
I hope I will find some time next week to make more debug effort on my side. I will try to reproduce and help here, but pls. - as always - do notify here if you already made some advances in your analyses. Thx.
In the log we see that it stopped in the middle of printing characters. The full string is "Value corrected according to goal".
One more remark
If I increase the RAMP frequency to 2 Hz (meaning that the wait between SET requests is only 500 ms), the ramp continues working for a while.
When I say that the "ramp is working" I mean that the "thruster replies something".
If I look closer at what the thruster replies, I see that it is an error message:
Then, after running for a while at high frequency, the thruster stops replying at all.
The interesting thing is that !!!!!! WITHOUT STOPPING THE RAMP SEQUENCE (requests are still sent out) I can reset the thruster board by turning it off and on.
After the thruster board reset, it continues to reply.
And sometimes it even replies with an OK message.
Thus it might also be a HW issue on the thruster board side? Maybe the HW is not able to handle such a high frequency.
But when I asked the ENPULSION engineers, they said even 2 Hz should be fine.
Also I want to remind you that according to the ENPULSION documentation the ramp should run at 1 Hz. Right now the OBC can handle only 0.5 Hz.
Ok, I think this latter one (resyncing the communication by powering the thruster off/on) is a different beast. As you describe it, in this case the OBC has no hard fault at all. So I would suggest, at some point, making specific issues. E.g. this one can only be tackled with a thruster connected, while the first one (OBC hard fault somewhere in the UART IRQ) seems to be unrelated.
I had no time yet, to review your code, but from your debug screens I get the feeling that maybe somewhere in my UART module in ADO the IRQ handling of the thruster/debug uarts gets corrupted. As I see your tests there is a lot of Tx on the debug uart (your outputs) and at the same time your sequencer does rx/tx on the other UART. So as said above I will try to reproduce something here with OBC only.
Regarding IRQ debugging: this is the 'high level' of debug skills. Your LAST_STARTED_MODULE is a very good starting point here. What can also help is to use IO lines to show the start and end points of the IRQ routine, and then maybe a trigger line on the hard fault. But let's see if I find something (my plan now is to look into it this evening or Monday...)
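The IO-line idea could look roughly like the sketch below. `debug_pin_write`, `uart_irq_traced`, and `sample_handler` are hypothetical names. On the target the pin hook would drive a spare GPIO watched with a scope or logic analyser; here it only records the level so the sketch runs on a host.

```c
#include <stdbool.h>
#include <stdint.h>

bool irq_pin_state;        /* simulated level of the debug pin */
uint32_t irq_pin_edges;    /* number of level changes seen so far */
bool pin_seen_in_handler;  /* level observed from inside the handler */

/* Hypothetical pin hook; replace the body with a real GPIO write on target. */
void debug_pin_write(bool level)
{
    if (level != irq_pin_state) {
        irq_pin_edges++;
    }
    irq_pin_state = level;
}

/* Wraps an IRQ handler so the pin is high for exactly its duration. */
void uart_irq_traced(void (*handler)(void))
{
    debug_pin_write(true);   /* rising edge marks IRQ entry */
    handler();
    debug_pin_write(false);  /* falling edge marks IRQ exit */
}

/* Example handler that only records what the pin looked like while it ran. */
void sample_handler(void)
{
    pin_seen_in_handler = irq_pin_state;
}
```

A pin that goes high and never returns low would show that the fault happens inside that IRQ routine.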
I started getting this sysevent before the hard fault.
The source of the hard fault is the ramp function: its exit conditions are badly defined.
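The faulty exit condition itself is not shown in this thread, so the following is only a sketch of one way to make a ramp terminate reliably. It assumes the ramp moves a value towards a goal in fixed steps; `ramp_t` and `ramp_advance` are hypothetical names. The exit test uses "remaining distance <= step" and clamps to the goal, rather than testing for exact equality, so an overshoot or a zero step cannot keep the ramp running forever.

```c
#include <stdbool.h>

typedef struct {
    double value; /* current setpoint */
    double goal;  /* target setpoint */
    double step;  /* magnitude of change per update, expected > 0 */
} ramp_t;

/* Advances the ramp by one update. Returns true once the goal is reached. */
bool ramp_advance(ramp_t *r)
{
    double remaining = r->goal - r->value;
    double magnitude = (remaining < 0.0) ? -remaining : remaining;

    /* Exit: last partial step, or an invalid step that could never finish. */
    if (r->step <= 0.0 || magnitude <= r->step) {
        r->value = r->goal; /* clamp the value to the goal */
        return true;
    }

    r->value += (remaining > 0.0) ? r->step : -r->step;
    return false;
}
```

The same function handles ramps in both directions, because the direction is derived from the sign of the remaining distance on every call.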
I just finished the review of your code in #51. I am not sure I understood everything you coded in your sequencer there, but as it seems to work to some extent at this stage, I would propose to go on from there (see my review comments): make some minor changes/fixes, merge it into the develop branch, and then have a design discussion on next steps and improvements after that....
https://github.com/RobertK66/obc_1769_core/commit/053ff9aafdf4364f82b8494b53a73b7665a5cc01
I added functions that combine a SET request and a wait.
I used a switch case and a substage index to implement separate SET and WAIT stages within the same function:
void l4_ReadAllRegistersAndWait_sequence(l4_stage_arguments_t *stage_args)
What I noticed is that if I use a switch case for the substages and the frequency of requests is too high, NO REPLY IS RECEIVED FROM THE THRUSTER!
This is exactly the same problem as with the ramp function. The ramp also uses substages with a switch case.
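For readers who have not seen the commit, the SET/WAIT substage pattern described above can be sketched as follows. This is not the code from the repository: `set_and_wait_t`, `set_and_wait_step`, and the millisecond tick passed in as `now_ms` are assumptions, and the actual SET request is replaced by a counter.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { SUBSTAGE_SET = 0, SUBSTAGE_WAIT = 1 } substage_t;

typedef struct {
    substage_t substage;    /* which half of the stage runs next */
    uint32_t wait_start_ms; /* tick value when the wait began */
    uint32_t wait_ms;       /* required delay after each SET request */
    uint32_t requests_sent; /* placeholder for the real SET request */
} set_and_wait_t;

/* Called repeatedly by the sequencer; returns true when the stage is done. */
bool set_and_wait_step(set_and_wait_t *s, uint32_t now_ms)
{
    switch (s->substage) {
    case SUBSTAGE_SET:
        s->requests_sent++; /* the real code would send the SET request here */
        s->wait_start_ms = now_ms;
        s->substage = SUBSTAGE_WAIT;
        return false;
    case SUBSTAGE_WAIT:
        /* Unsigned subtraction stays correct when the tick counter wraps. */
        if ((uint32_t)(now_ms - s->wait_start_ms) >= s->wait_ms) {
            s->substage = SUBSTAGE_SET;
            return true;
        }
        return false;
    default:
        s->substage = SUBSTAGE_SET; /* recover from a corrupted index */
        return false;
    }
}
```

Because the function never blocks, other sequences can run between calls; the trade-off is that the real delay depends on how often the sequencer calls it.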
According to the ENPULSION documentation, the RAMP is required to achieve the setpoint within a 30 s duration.
The ramp value in the thruster register has to be updated at 1 Hz.
The ramp frequency in the hardcoded sequence is defined by dt (argv[4]).
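The numbers involved can be worked out with a few helper functions. This is a sketch only: the function names are hypothetical and the unit of dt in the real sequencer is not stated in this thread, so the update period is expressed in milliseconds here. At 1 Hz over 30 s it gives 30 updates spaced 1000 ms apart; at 0.5 Hz it gives 15 updates spaced 2000 ms apart.

```c
#include <stdint.h>

/* Delay between two ramp updates for a given update frequency. */
uint32_t ramp_period_ms(double freq_hz)
{
    if (freq_hz <= 0.0) {
        return 0u; /* invalid frequency: caller must treat 0 as an error */
    }
    return (uint32_t)(1000.0 / freq_hz + 0.5); /* round to nearest ms */
}

/* Number of register updates needed to cover the whole ramp duration. */
uint32_t ramp_step_count(uint32_t duration_s, double freq_hz)
{
    if (freq_hz <= 0.0) {
        return 0u;
    }
    return (uint32_t)((double)duration_s * freq_hz + 0.5);
}

/* Change applied to the setpoint on each update. */
double ramp_increment(double start, double goal, uint32_t steps)
{
    if (steps == 0u) {
        return goal - start; /* degenerate ramp: jump in one update */
    }
    return (goal - start) / (double)steps;
}
```

For example, ramping a register from 0 to 3000 in 30 updates gives an increment of 100 per update.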
The problem is that at 1 Hz the thruster does not reply at all. It looks like the frequency is too high. At 0.5 Hz (once every 2 s) the ramp works fine.
I believe the problem is somewhere in the sequence execution module, but I am not sure. It might also be something else.