Moddable-OpenSource / moddable

Tools for developers to create truly open IoT products using standard JavaScript on low cost microcontrollers.
http://www.moddable.com
1.34k stars 237 forks source link

xsbug protocol does not start up reliably on Linux #1016

Closed tve closed 1 year ago

tve commented 1 year ago

Build environment: Linux, Arch Target device: NodeMCU board w/esp32

Description When serial2xsbug starts up it frequently hangs after set-all-breakpoints.

Steps to Reproduce

  1. Using the chapter 1 host from the IoT book (iot-product-dev-book/ch1-gettingstarted/host) build and load the host using mcconfig -d -m -p esp32/nodemcu
  2. Observe xsbug starting and (hopefully) the expected "Device flashed..." log line.
  3. In serial2xsbug.h enable mxTrace and mxTraceCommands
  4. Run just serial2xsbug several times, for me it hangs 2-3 times out of 10, sometimes more consistently, adjust the commandline for your port: BUG_PORT=5002 XSBUG_HOST=localhost serial2xsbug /dev/ttyUSB0 460800 8N1 (you can also re-run mcconfig, it just takes longer...)
  5. Output when serial2xsbug hangs:
    
    > SBUG_PORT=5002 XSBUG_HOST=localhost serial2xsbug /dev/ttyUSB0 460800 8N1
    ### fxRestart
    xxxxxxxxxxxxxxxxxxxxxx<?xs-00000000?>

<?xs.3FFB4264?>

set time

<?xs#3FFB4264?> <?xs.3FFB4264?>

<?xs-3FFB4264?>

(There is about a 1 second delay between the `<set-all-breakpoints...` line and the very last line.)

**Expected behavior**
A successful start-up:

SBUG_PORT=5002 XSBUG_HOST=localhost serial2xsbug /de v/ttyUSB0 460800 8N1

fxRestart

xxxxxxxxxxxxxxxxxxxxxxx<?xs-00000000?>

<?xs.3FFB4264?>

set time

<?xs#3FFB4264?>
<?xs.3FFB4264?>

<?xs.3FFB4264?>

<?xs.3FFB4264?>

Device flashed. Ready to install apps.

** Additional observations**

After some more troubleshooting, I see that the `<?xs.3FFB4264?>` comes from the MCU and that the `<xsbug><instruments><instrument name="Pixels drawn" ...` is also supposed to come from the MCU, but doesn't when the process hangs. I'm concluding that it's the code in the MCU that crashes.

Trying to install a mod hangs for me as well (sometimes it succeeds):

XSBUG_PORT=5002 XSBUG_HOST=localhost /home/sw/moddable/moddable/build/bin/lin/release/serial2xsbug /dev/ttyUSB0 460800 8N1 -install /home/sw/moddable/moddable/build/bin/esp32/debug/test-sx1276/test-sx1276.xsa

fxRestart

xxxxxxxxxxxxxxxxxxxxxxxxxxmod failed x <?xs-00000000?>

<?xs.3FFB4264?>

set time

<?xs#3FFB4264?>### install

<?xs#3FFB4264?> <?xs-3FFB4264?>


The "mod failed" fragment right after the "xxx" seems suspicious.
phoddie commented 1 year ago

Thank you for the report.

I'm concluding that it's the code in the MCU that crashes

That is contradicted by the <?xs-3FFB4264?> from the MCU in the trace.

phoddie commented 1 year ago

It looks more like the MCU didn't receive the command from the Linux device. It is timing out waiting for the initial set-up to complete. An unreliable serial connection would do that. You could try stepping back to a slower baud rate Try setting

DEBUGGER_SPEED=115200 mcconfig...
tve commented 1 year ago

I see, so you assume that the last <?xs-3FFB4264?> is a retry because the <set-all-breakpoints></set-all-breakpoints> wasn't properly received.

I did some more troubleshooting in the meantime and when it hangs, in fxDescribeInstrumentation https://github.com/Moddable-OpenSource/moddable/blob/public/xs/sources/xsDebug.c#L2339 fxIsConnected() returns false, that's why <xsbug><instruments><instrument ... isn't being sent by the MCU.

I will try a lower serial rate. I had tried a completely different board with a (genuine) FTDI adapter (whereas the NodeMCU uses a cp210x). Thanks for the suggestions!

Quick update: @115220 I see the same hanging.

phoddie commented 1 year ago

so you assume that the last <?xs-3FFB4264?> is a retry

It is not a retry. That is a disconnect.

If the MCU does not receive a valid command on the xsbug connection within 2 seconds of sending the <login it concludes no debugger is available and continues without it. The disconnect message indicates that no valid command was received.

tve commented 1 year ago

Ah, thanks for the clarification! (If there's a doc I should read I'd love to.) I had written down that the - after xs results in closing the network but then overlooked it. That explains why the->connection is falsy by the time fxDescribeInstrumentation comes around. OK, so I need to figure out what the MCU is expecting in response to the login and why it's not receiving that...

phoddie commented 1 year ago

...I need to figure out what the MCU is expecting in response to the login and why it's not receiving that...

It is expecting any valid command. Such as <set-all-breakpoints></set-all-breakpoints>. Perhaps your computer is taking longer than 2 seconds to send that?

tve commented 1 year ago

Perhaps your computer is taking longer than 2 seconds to send that?

It happens instantaneously.

tve commented 1 year ago

The plot thickens... I added a buffer to capture what fxReceiveLoop is reading. I'm printing it when fxReceive exits surrounded by {..} and with control characters shown as ~hex. Here's a successful start-up:

entry 0x400805d0                                                    
<?xs-00000000?>                                                     

<?xs.3FFB436C?>                                                     
<xsbug><login name="main" value="XS"/></xsbug>                      
### set time                                                        

<?xs#3FFB436C?>                                                     
<?xs.3FFB436C?>                                                     
<set-all-breakpoints></set-all-breakpoints>                         
{~0D~0A<?xs#3FFB436C?>~00~0F~09~00~00c~D72~18~FF~FF~8F~80~00~00~00~
00~0D~0A<?xs.3FFB436C?>~0D~0A<set-all-breakpoints></set-all-breakpoi
nts>~0D~0A}                                                         
<?xs.3FFB436C?>                                                        
<xsbug><log>DescribeInstrumentation&#10;</log></xsbug>              

<?xs.3FFB436C?>                                                     
<xsbug><log>Connected&#10;</log></xsbug>                            

<?xs.3FFB436C?>                                                     
<xsbug><instruments><instrument name="Pixels drawn" value
=" pixels"/><instrument name="Frames drawn" value=" frame
s"/><instrument name="Network bytes read" value=" bytes"/
...

Here's a failed start-up:

entry 0x400805d0
<?xs-00000000?>

<?xs.3FFB436C?>
<xsbug><login name="main" value="XS"/></xsbug>
### set time

<?xs#3FFB436C?>
<?xs.3FFB436C?>
<set-all-breakpoints></set-all-breakpoints>

<?xs-3FFB436C?>
{~0A<?XS#3FFB436C?>~00~0F ~00~00C~D73I~DF~DF~8F~80~00~00~00~00~0A~0A
<?XS.3FFB436C?>~0A~0A<SET-ALL-BREAKPOINTS></SET-ALL-BREAKPOINTS>~0A~0A}DescribeInstrumentationNotConnected^C⏎              

... nope, those CAPS are not my editing...

BTW, what does the "fx" prefix on all the function in serial2xsbug stand for?

phoddie commented 1 year ago

That's really strange. Is it possible there is some configuration on the Linux serial driver you are using that is performing case conversion? The code in serial2xsbug for that is here.

BTW, what does the "fx" prefix on all the function in serial2xsbug stand for?

Function. XS coding style.

tve commented 1 year ago

Yeah, I'm trying to understand fxOpenSerial... The first thing that stood out is that everywhere I read they start with tcgetattr to init the termios struct before modifying and setting. https://stackoverflow.com/a/6947758/3807231 even claims you get erratic behavior if you don't. I don't know I believe that, but I am observing erratic behavior...

Unfortunately, just starting with tcgetattr doesn't work 'cause some flags that need to be cleared aren't.

I'm trying to ascertain that the code wants a "completely raw" connection and will then try some sample code from the interwebs... (I don't have linux serial programming experience other than hacking around when it doesn't work.)

phoddie commented 1 year ago

. (I don't have linux serial programming experience other than hacking around when it doesn't work.)

That's where most everyone seems to be. It is a black art. Good luck. Looking forward to what you discover.

tve commented 1 year ago

OK, erratic behavior explained... Looking at fxOpenSerial:

In addition, there are several fields that set the speed and the recommended portable method is to use cfsetspeed() (see the man page).

Also, there is a section in the second half of the function to set a non-standard speed (the if (speed == BOTHER) section) but:

Right now it seems to work reliably for me. I'm leaning towards a PR that makes three changes:

Thoughts?

phoddie commented 1 year ago

Thanks for the analysis and PR. It more-or-less makes sense (given a limited understanding of every detail). We'll review.