Bug: Time between thruster SET requests

kodsurf commented 1 year ago

According to the ENPULSION documentation, a RAMP is required to reach the setpoint within 30 s.

The ramp value in the thruster register has to be updated at 1 Hz.

In the hardcoded sequence, the ramp frequency is defined by dt (argv[4]).

The problem is that at 1 Hz the thruster does not reply at all. It looks like the frequency is too high. At 0.5 Hz (one request every 2 s) the ramp works fine.

I believe the problem is somewhere in the sequence execution module, but I am not sure. It might also be something else.
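
For illustration only (not code from the repository): the timing arithmetic behind the ramp can be sketched as below. At 1 Hz a 30 s ramp consists of 30 SET requests, while at 0.5 Hz only 15 fit into the same window, so each step has to be twice as large. The type ramp_plan_t and the function ramp_plan() are hypothetical names, and the setpoint range in the usage note is an arbitrary example value.

```c
#include <stdint.h>

/* Hypothetical ramp plan: none of these names come from the OBC code base. */
typedef struct {
    uint32_t steps;      /* number of SET requests in the ramp */
    double   increment;  /* change of the register value per SET request */
} ramp_plan_t;

/* Derive the step count and per-step increment from the ramp duration and
 * the wait time dt between two SET requests (both in milliseconds). */
ramp_plan_t ramp_plan(double start, double goal,
                      uint32_t duration_ms, uint32_t dt_ms)
{
    ramp_plan_t plan = { 0u, 0.0 };
    if (dt_ms == 0u || duration_ms < dt_ms) {
        return plan;                      /* invalid timing: no ramp */
    }
    plan.steps = duration_ms / dt_ms;     /* 30000 / 1000 = 30 at 1 Hz */
    plan.increment = (goal - start) / (double)plan.steps;
    return plan;
}
```

For example, ramp_plan(0.0, 3000.0, 30000u, 1000u) gives 30 steps of 100 each, and the same ramp with dt = 2000 ms gives 15 steps of 200 each.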

kodsurf commented 1 year ago

With 0.5 Hz hardcoded manually during sequence initialization

log:

After each SET request is sent, the thruster replies.

With 1 Hz

There is no thruster reply between the SET requests.

kodsurf commented 1 year ago

Very strange, actually...

If I set the wait time between SET requests to 1.1 s instead of 1 s, the thruster replies reliably.

kodsurf commented 1 year ago

I would ask the ENPULSION engineers about it.

According to the ENPULSION documentation, the ramp should run at 1 Hz for 30 s.

We need them to clarify whether a different frequency and a different ramp duration are acceptable.

If yes, then that's very good :)

But we still have to understand WHY it does not work with a wait time < 1.1 s.

kodsurf commented 1 year ago

I investigated this a little further.

If SET/READ requests are sent at a high frequency, or without any delay in the sequencer, the debugger says that HardFault_handler() is caused by deb_uartIRQ().

@RobertK66 Let's please investigate this and try to work out how multiple parallel processes in the sequencer should run without disturbing each other.

Before I commit sequencer code, I make sure that all preprogrammed sequences are able to run in parallel without causing a hard fault. But this is purely based on observation - I have no idea what causes this hard fault.

For the thruster in particular, we don't need to run multiple sequences at the same time. But according to my plan, the OBC should be able to handle lots of parallel sequences at the same time!

RobertK66 commented 1 year ago

Is the branch feature/thruster-fire-paralel the correct place for me to look into this!?

Can I reproduce this? Do you have some tool yet to connect to the Thruster UART, or do you need enpulsion hardware to be connected to reproduce this !?

How to reproduce it?

kodsurf commented 1 year ago

Yes, you can reproduce it without the thruster.

I will intentionally make a commit with a hardcoded sequence that is meant to cause the hard fault.

In around 1-2 h.

RobertK66 commented 1 year ago

ok, have you got a value for your LAST_STARTED_MODULE variable when you hit the breakpoint in your picture!?

What could also be interesting is the call stack in the upper right part of the debug window. Maybe you can see the line number where the last code was executed. The Chip_Uart_ReadLineStatus() was in PC (program counter); I am not sure what the deb_UartIRQ() in the LR register really means....

Does it always look like this, or is it more or less random?

kodsurf commented 1 year ago

https://github.com/RobertK66/obc_1769_core/commit/529aacdd02739f398491ad262d2428c6e9a383e9

I made a commit with a sequence that is intended to cause the failure.

"8 2 0" - triggers a sequence with too small a delay between thruster requests

Triggering this sequence will eventually cause a hard fault. Triggering "8 0 0" and "8 1 0" as well will cause it even faster.

SOMETHING STRANGE THAT I NOTICED -

If I turn off the power supply to the thruster sim board, the sequences seem to run fine.

kodsurf commented 1 year ago

"Does it always look like this, or is it more or less random?"

For now I have stopped getting the hard fault in the debugger, but the OBC still gets stuck.

Yesterday I got a consistent hard fault in the IRQ. Today I can't reproduce the same hard fault...

kodsurf commented 1 year ago

LAST_STARTED_MODULE is 106 just before the hard fault loop, when SET/READ requests are sent to the thruster too frequently.

P.S. The MCUXpresso debugger also says that the problem is in debug_uartIRQ()

RobertK66 commented 1 year ago

ok, did you get a line number? Because deb_uartIRQ() never calls Chip_Uart_ReadLineStatus(), which is shown as PC (program counter) in your screenshot !?

What I suspect here is that 'somebody' somehow overwrites the content of the module variable deb_uart* This variable is initialized only once, when the debug module is initialized.

What I also found out is that I sometimes used pUart (coming from the IRQ call) (lines 341, 345) and sometimes the deb_uart variable (lines 351, 360). This is not clean code, but normally these variables should have the same (uartPointer) content.

Maybe you could write something like

 if (pUart != deb_uart) { 
         deb_uart = pUart;
 }

at the first line of the IRQ and set a breakpoint inside the if. If it gets hit, my suspicion would be confirmed....
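
A slightly extended, self-contained version of that guard might look like the sketch below. It is not the code from the debug module: uart_regs_t is a stand-in for the real UART peripheral type, and deb_init(), deb_uart_guard() and deb_uart_mismatches are hypothetical names. The counter keeps a record of how often the pointer was found overwritten, even when no debugger is attached.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the real UART peripheral type; an assumption for this sketch. */
typedef struct { uint32_t dummy; } uart_regs_t;

static uart_regs_t *deb_uart;                  /* set once at module init */
static volatile uint32_t deb_uart_mismatches;  /* how often the guard fired */

void deb_init(uart_regs_t *pUart)
{
    deb_uart = pUart;
    deb_uart_mismatches = 0u;
}

/* Call as the first statement of the IRQ handler. */
void deb_uart_guard(uart_regs_t *pUart)
{
    if (pUart != deb_uart) {
        deb_uart_mismatches++;   /* breakpoint here: deb_uart was overwritten */
        deb_uart = pUart;        /* repair so the handler can continue */
    }
}
```

If deb_uart_mismatches is ever non-zero after a test run, something outside the debug module has written to deb_uart.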

As this is pure C and a lot of stuff is already coded and running at the same time, this is not easy to debug. If the above "if" shows that the deb_uart variable is overwritten, then somewhere in the code a 'runaway pointer' (or a statement with wrong */& operators) gets executed.

I hope I will find some time next week to put more debug effort in on my side. I will try to reproduce this and help here, but pls. - as always - do post here if you make any progress in your analysis. Thx.

kodsurf commented 1 year ago

In the log we see that it stopped in the middle of printing characters. The full string is "Value corrected according to goal".

kodsurf commented 1 year ago

One more remark

If I increase the RAMP frequency to 2 Hz (meaning that the wait between SET requests is only 500 ms), the ramp continues working for a while.

When I say that the "ramp is working" I mean that the "thruster replies something".

If I look more closely at what the thruster replies, I see that it is an error message:

Then, after running for a while at high frequency, the thruster stops replying at all.

The interesting thing is that, WITHOUT STOPPING THE RAMP SEQUENCE (requests are still being sent out), I can reset the thruster board by turning it off and on.

After the thruster board reset, it continues to reply.

And sometimes it even replies with an OK message.

So it might also be a HW issue on the thruster board side? Maybe the HW is not able to handle such a high frequency.

But when I asked the ENPULSION engineers, they said even 2 Hz should be fine.

I also want to remind you that, according to the definition in the ENPULSION documentation, the ramp should run at 1 Hz. Right now the OBC can handle only 0.5 Hz.

RobertK66 commented 1 year ago

Ok I think this later one (resync com with thruster by power off/on thruster) is a different beast. As you describe this, in this case the OBC has no hard fault at all. So I would suggest - at some point - to make specific issues. E.g. this one can only be tackled with a thruster connected, while the first one (OBC hard fault - somewhere in UART IRQ) seems to be unrelated.

I have had no time yet to review your code, but from your debug screens I get the feeling that maybe somewhere in my UART module in ADO the IRQ handling of the thruster/debug UARTs gets corrupted. As I understand your tests, there is a lot of Tx on the debug UART (your outputs) and at the same time your sequencer does Rx/Tx on the other UART. So, as said above, I will try to reproduce something here with the OBC only.

Regarding IRQ debugging: this is the 'high level' of debug skills. Your LAST_STARTED_MODULE is a very good starting point here. What can also help is to use IO lines to show the start and end of the IRQ routine, and then maybe a trigger line on the hard fault. But let's see if I find something - my plan now is to look into it this evening or on Monday...
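
The IO line idea could be sketched roughly as follows. This is a host-runnable illustration, not code from the repository: the pin numbers, dbg_pin_write(), dbg_irq_enter(), dbg_irq_exit() and hard_fault_mark() are all hypothetical, and on the target the body of dbg_pin_write() would be replaced by the real GPIO write.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical debug pins; any two free GPIOs on the board would do. */
enum { DBG_PIN_IRQ = 0, DBG_PIN_FAULT = 1, DBG_PIN_COUNT = 2 };

/* Host-side stand-in for the GPIO driver so the sketch runs on a PC.
 * On the target, replace the body with the real pin write call. */
static bool dbg_pin_state[DBG_PIN_COUNT];
static uint32_t dbg_pin_rising_edges[DBG_PIN_COUNT];

static void dbg_pin_write(int pin, bool level)
{
    if (level && !dbg_pin_state[pin]) {
        dbg_pin_rising_edges[pin]++;   /* count low-to-high transitions */
    }
    dbg_pin_state[pin] = level;
}

/* First statement of the UART IRQ handler. */
void dbg_irq_enter(void) { dbg_pin_write(DBG_PIN_IRQ, true); }

/* Last statement of the UART IRQ handler. */
void dbg_irq_exit(void) { dbg_pin_write(DBG_PIN_IRQ, false); }

/* First statement of the hard fault handler: trigger edge for the analyzer. */
void hard_fault_mark(void) { dbg_pin_write(DBG_PIN_FAULT, true); }
```

With a logic analyzer triggered on the fault pin, an IRQ pin that is still high at the trigger would suggest that the fault happened inside the IRQ routine.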

kodsurf commented 1 year ago

I started getting this sysevent before the hard fault

kodsurf commented 1 year ago

The source of the hard fault is the ramp function: its exit conditions were badly defined.
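
The thread does not show what the faulty exit condition looked like, so the following is only a generic sketch of a ramp step with defensive exit conditions, not the fix that was applied. One common failure mode is a ramp that only stops on exact equality with the goal and therefore never stops when the step size does not divide the distance. The type ramp_t and the function ramp_advance() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ramp state; not the structure used in the sequencer. */
typedef struct {
    int32_t  value;       /* value written by the last SET request */
    int32_t  goal;        /* setpoint to reach */
    int32_t  step;        /* magnitude of one step, must be > 0 */
    uint32_t steps_left;  /* hard upper bound on remaining SET requests */
} ramp_t;

/* Advance the ramp by one step. Returns true while the ramp is still running.
 * Exit conditions: goal reached (overshoot is clamped to the goal),
 * step budget used up, or invalid step size.
 * Assumes value and goal are far away from the int32_t limits. */
bool ramp_advance(ramp_t *r)
{
    if (r->step <= 0 || r->steps_left == 0u || r->value == r->goal) {
        return false;
    }
    r->steps_left--;
    int32_t remaining = r->goal - r->value;
    if (remaining > 0) {
        r->value += (remaining < r->step) ? remaining : r->step;
    } else {
        r->value -= (-remaining < r->step) ? -remaining : r->step;
    }
    return r->value != r->goal && r->steps_left > 0u;
}
```

Because the last step is clamped to the goal and the step budget only counts down, the ramp terminates for any combination of start value, goal and step size.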

RobertK66 commented 1 year ago

I just finished the review of your code in #51. I am not sure I understood everything you coded in your sequencer there, but as it seems to work to some extent at this stage, I would propose to go on from there (see my review comments): make some minor changes/fixes, merge it into the develop branch, and then have a design discussion on next steps and improvements after that....

kodsurf commented 1 year ago

https://github.com/RobertK66/obc_1769_core/commit/053ff9aafdf4364f82b8494b53a73b7665a5cc01

I added functions that combine a SET request and a wait.

I used a switch case and a substage index to implement separate SET and WAIT stages within the same function

void l4_ReadAllRegistersAndWait_sequence(l4_stage_arguments_t *stage_args)

What I noticed is that if I use a switch case for the substages and the frequency of requests is too high - NO REPLY FROM THE THRUSTER IS RECEIVED!

This is exactly the same problem as with the ramp function. The ramp also uses substages with a switch case.
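
To make the substage pattern concrete, here is a minimal, non-blocking sketch of a SET stage followed by a WAIT stage inside one function. It is not the implementation from the commit above: stage_ctx_t, substage_t and set_and_wait_stage() are hypothetical, the real l4_stage_arguments_t is not reproduced here, and the UART transmission is replaced by a counter. The sketch measures the wait from the moment the SET request was sent, which is one thing worth checking when the effective gap between requests turns out shorter than intended.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stage context; the real l4_stage_arguments_t is not shown. */
typedef enum { SUBSTAGE_SET = 0, SUBSTAGE_WAIT = 1 } substage_t;

typedef struct {
    substage_t substage;      /* which half of the stage runs next */
    uint32_t   wait_start;    /* tick count when the SET request was sent */
    uint32_t   wait_ms;       /* required gap before the next request */
    uint32_t   requests_sent; /* stand-in for the actual UART transmission */
} stage_ctx_t;

/* Called repeatedly from the main loop; never blocks.
 * Returns true once SET and WAIT are both complete. */
bool set_and_wait_stage(stage_ctx_t *ctx, uint32_t now_ms)
{
    switch (ctx->substage) {
    case SUBSTAGE_SET:
        ctx->requests_sent++;             /* real code would send the SET request */
        ctx->wait_start = now_ms;
        ctx->substage = SUBSTAGE_WAIT;
        return false;
    case SUBSTAGE_WAIT:
        /* Unsigned subtraction stays correct when the tick counter wraps. */
        if ((uint32_t)(now_ms - ctx->wait_start) >= ctx->wait_ms) {
            ctx->substage = SUBSTAGE_SET; /* ready for the next request */
            return true;
        }
        return false;
    default:
        ctx->substage = SUBSTAGE_SET;     /* recover from a corrupted index */
        return false;
    }
}
```

With wait_ms = 1000, a request sent at tick 5000 completes its stage on the first call at or after tick 6000, and not before.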

RobertK66 / obc_1769_core

Bug: Time between thruster SET requests #49