107-systems / 107-Arduino-MCP2515

Arduino library for controlling the MCP2515 in order to receive/transmit CAN frames.
https://107-systems.org
MIT License
90 stars 14 forks source link

Nano 33 BLE, Example locks up #53

Closed chalkytoast closed 2 years ago

chalkytoast commented 2 years ago

:bug: Bug Report

On a Nano 33 BLE with a MPC2515, the UAVCAN-Heartbeat-Publish example locks up. The problem appears to be related to locking and/or interrupts, as it hangs when trying to aquire a lock. I managed to get it work by commenting out the locking code and disabling the interrupt.

I'm not entirely certain if this is truely a software bug. Perhaps this could also be caused by a wiring issue or misconfiguration?

// CritSec-mbed.cpp
extern "C" void crit_sec_enter() {}
extern "C" void crit_sec_leave() {}

// UAVCAN-Heartbeat-Publish.ino in void setup()
pinMode(MKRCAN_MCP2515_INT_PIN, INPUT_PULLUP);
// attachInterrupt(digitalPinToInterrupt(MKRCAN_MCP2515_INT_PIN), onExternalEvent, FALLING);

Minimum configuration to replicate the bug

aentinger commented 2 years ago

Hi :wave: :coffee:

While I can't exclude that you might experience a locking-up due to the critical section code it seems implausible to me. At least you'd be the first to report that problem. I've got two questions for you:

chalkytoast commented 2 years ago

Hi, thanks for your help.

How did you connect the MCP2515 to the Nano 33 BLE? Which MCP2515 board did you use?

We have a custom PCB. Chip select is on pin 10, interrupt is on pin 9, 8 MHz clock, the bus is running at 500 kbps. We also wired up a breakout board (https://copperhilltech.com/mcp2515-can-bus-breakout-board-with-spi-interface/). As I remeber we also had the same problem with that setup, I have to check on monday.

Can you please run MCP2515-Loopback.ino on your Nano 33 BLE/MCP2515 combo and tell me if this works (i.e. share the output on Serial)?

When running the loopback example, something similar happens. I have modified the test loop as follows:

  std::for_each(CAN_TEST_FRAME_ARRAY.cbegin(),
                CAN_TEST_FRAME_ARRAY.cend(),
                [](sCanTestFrame const frame)
                {
                  if(!mcp2515.transmit(frame.id, frame.data, frame.len)) {
                    Serial.println("ERROR TX");
                  }
                  Serial.println("foo");
                  Serial.println("bar");
                  delay(10);
                  Serial.println("baz");
                });

The code execution freezes up after printing "foo" once. When commenting out attachInterrupt all tests run through, but obviously onReceiveBufferFull is not called.

To narrow down where the problem occurs I have littered MCP2515_io.cpp with prints, and also added prints to the select/deselect/transfer functions.

void select_()
{
  Serial.print(__func__); Serial.println(" begin");
  digitalWrite(MKRCAN_MCP2515_CS_PIN, LOW);
  Serial.print(__func__); Serial.println(" end");
}

void deselect_()
{
  Serial.print(__func__); Serial.println(" begin");
  digitalWrite(MKRCAN_MCP2515_CS_PIN, HIGH);
  Serial.print(__func__); Serial.println(" end");
}

uint8_t transfer_(const uint8_t d)
{
  Serial.print(__func__); Serial.println(" begin");
  uint8_t y = SPI.transfer(d);
  Serial.print(__func__); Serial.println(" end");
  return y;
}

ArduinoMCP2515 mcp2515(select_,
                       deselect_,
                       transfer_,
                       micros,
                       onReceiveBufferFull,
                       nullptr);

The output looks like this:

reset
select_ begin
select_ end
transfer_ begin
transfer_ end
deselect_ begin
deselect_ end
modifyRegister
select_ begin
select_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
deselect_ begin
deselect_ end
modifyRegister
select_ begin
select_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
deselect_ begin
deselect_ end
modifyRegister
select_ begin
select_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
deselect_ begin
deselect_ end
modifyRegister
select_ begin
select_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
deselect_ begin
deselect_ end
modifyRegister
select_ begin
select_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
deselect_ begin
deselect_ end
modifyRegister
select_ begin
select_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
deselect_ begin
deselect_ end
modifyRegister
select_ begin
select_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
deselect_ begin
deselect_ end
modifyRegister
select_ begin
select_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
deselect_ begin
deselect_ end
writeRegister
select_ begin
select_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
deselect_ begin
deselect_ end
writeRegister
select_ begin
select_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
deselect_ begin
deselect_ end
writeRegister
select_ begin
select_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
deselect_ begin
deselect_ end
modifyRegister
select_ begin
select_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
deselect_ begin
deselect_ end
readRegister
select_ begin
select_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
deselect_ begin
deselect_ end
select_ begin
select_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
deselect_ begin
deselect_ end
loadTxBuffer
select_ begin
select_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
transfer_ begin
transfer_ end
deselect_ begin
deselect_ end
requestTx
select_ begin
select_ end
transfer_ begin
transfer_

So the last function that gets executed is requestTx, SPI.transfer(d); is the last IO call. This can be reproduced reliably. I'm at a loss, but maybe you can draw some conclusion from that.

I guess this means this ticket can be closed as my problem isn't caused by this library, but still I would be thankful for any insight you might have into this issue.

Thanks for your help.

chalkytoast commented 2 years ago

I did a little further digging. The Arduino docs on external interrupts say that ISRs should not use delay or do any serial IO, and in general should be kept short. The loopback example does serial IO in the ISR, but removing it didn't fix the problem. However, when I moved the onExternalEventHandler call out of the ISR into loop it no longer locks up.

static volatile int haveInterrupt;

// in setup()
attachInterrupt(digitalPinToInterrupt(MKRCAN_MCP2515_INT_PIN), [](){ haveInterrupt = 1; }, FALLING);

void loop()
{
  if (haveInterrupt) {
    haveInterrupt = 0;
    mcp2515.onExternalEventHandler();
  }
  delay(5);
}

Now the output is

[ 825952] ID 1 DATA[0] 
[ 827500] ID 2 DATA[4] CA FE CA FE

While the Arduino no longer locks up, the interrupt is only triggered once. I'm not sure if I have to set it manucally back to high. On monday I'll hook up a scope to the pin and see what's acutally going on.

chalkytoast commented 2 years ago

I have another update. It was pointed out to me that the interrupt will only be triggered once because all the messages are sent from setup.

After moving everything to loop I think I get the correct output.

void loop()
{
  static unsigned i = 0;
  if (i < CAN_TEST_FRAME_ARRAY.size()) {
    sCanTestFrame frame = CAN_TEST_FRAME_ARRAY[i++];
    if (!mcp2515.transmit(frame.id, frame.data, frame.len)) {
      Serial.println("ERROR TX");
    }
    delay(10);
  }

  if (haveInterrupt) {
    haveInterrupt = 0;
    mcp2515.onExternalEventHandler();
  }
  delay(5);
}

output:

[ 961474] ID 1 DATA[0] 
[ 977412] ID 2 DATA[4] CA FE CA FE 
[ 994348] ID 3 DATA[8] CA FE CA FE CA FE CA FE 
[ 1012288] ID 4 DATA[4] 0 0 0 3C 
[ 1029228] ID 7FF DATA[0] 
[ 1045165] ID(EXT) 80000000 DATA[0] 
[ 1061101] ID(EXT) 9FFFFFFF DATA[0] 
aentinger commented 2 years ago

Good debugging effort :+1:

The Arduino docs on external interrupts say that ISRs should not use delay or do any serial IO, and in general should be kept short.

This is true in general but the usage in this example should be acceptable.

It was pointed out to me that the interrupt will only be triggered once because all the messages are sent from setup.

This is false. Which reason was given to you for this statement?

In any case, moving everything to loop() isn't the right solution. The example runs fine as-is on any ArduinoCore-samd board.

aentinger commented 2 years ago

I transferred the issue over to 107-Arduino-MCP2515 since the problems clearly start with the MCP2515 library.

chalkytoast commented 2 years ago

This is false. Which reason was given to you for this statement?

This was when I tried https://github.com/107-systems/107-Arduino-MCP2515/issues/53#issuecomment-967206898

All seven messages are sent during setup. When the interrupt is raised, messages are not processed by ISR. The interrupt is not raised on subsequent mesasges because the receive buffers are not empty. I'm paraphrasing, but I think that was the gist.

In any case, moving everything to loop() isn't the right solution. The example runs fine as-is on any ArduinoCore-samd board.

Perhaps the 33 BLE has more restrictions on ISRs? If found this in the Arduino forums:

The Arduino Nano 33 BLE uses mbedOS and that really hates you doing anything that takes a long time and other bad things during interrupts. It stops the OS from doing its stuff e.g. switching to other tasks like USB.

The easiest solution is to set a flag in the ISR and then run the code from loop.

Again, thanks for you help.

aentinger commented 2 years ago

Thank you for clarification :+1: Bit on clarification on my end, I do work for Arduino so I know the codebase so I don't see any issue with sending all those messages in the ISR or writing those couple of Serial messages from within the ISR :wink: I guess I'll have to wire up a board myself, but this will not happen soon-ish.