adafruit / circuitpython

CircuitPython - a Python implementation for teaching coding with microcontrollers
https://circuitpython.org
Other
4.06k stars 1.2k forks source link

stm32: canio: duplicate packet transmission #3644

Open jepler opened 3 years ago

jepler commented 3 years ago

After some bus problems, canio sends packets multiple times instead of just once.

Using the "ack" demo from https://github.com/adafruit/Adafruit_Learning_System_Guides/pull/1293/ start the listener and then the sender and see that everything is in a steady state.

Pull out one of the bus wires and then plug it back in after a second or two.

Notice that now the receiver seems to get each packet 3 times and the receiver seems to get each ACK packet 3 times.

jepler commented 3 years ago

This is likely to be specific to stm32f405 implementation, and appears to be a problem on the sending side but that's just a guess. 3 also happens to be the number of transmit buffers.

Kheirlb commented 1 year ago

Also seeing this issue.

Kheirlb commented 6 months ago

I'm thinking about taking a look at this canio issue. I know this issue is old, but can @jepler, @tannewt, @dhalbert, or anyone on the CircuitPython team point me in the right direction? I'm about to switch to Arduion/C just to avoid this bug.

tannewt commented 6 months ago

I'm thinking about taking a look at this canio issue. I know this issue is old, but can @jepler, @tannewt, @dhalbert, or anyone on the CircuitPython team point me in the right direction? I'm about to switch to Arduion/C just to avoid this bug.

We'd love help. You can chat with us in #circuitpython-dev on Discord and we can help with any bumps you hit. Build instructions are here: https://learn.adafruit.com/building-circuitpython The STM canio implementation is done here: https://github.com/adafruit/circuitpython/tree/main/ports/stm/common-hal/canio

Kheirlb commented 6 months ago

To supplement the original report, here is a screenshot of the (recurring?) duplicate transmission issue. Screenshot 2024-03-06 213035

Thanks for the build instructions and pointer tannewt! I'll start poking at the sender side code. Hopefully I can make a little contribution here :)

EDIT: Doing these tests on 8.2.10

  1. I repeated this test with both microcontrollers on the official 8.2.10 from https://circuitpython.org/board/feather_stm32f405_express/. Same result as whatever config I started in.
  2. I also repeated with local checkout of tag: 8.2.10. Same result. Wrote 1129984 bytes to build-feather_stm32f405_express/firmware.uf2 (flashed with the .bin output)
Kheirlb commented 6 months ago

This is likely to be specific to stm32f405 implementation, and appears to be a problem on the sending side but that's just a guess. 3 also happens to be the number of transmit buffers.

@jepler and team, where is this "3" number for transmit buffers defined? I'm looking at some of your later commits for common_hal_canio_can_send: https://github.com/adafruit/circuitpython/commit/e5a0af9216a6d868fad9105a4eb44e0889f54e29 and https://github.com/adafruit/circuitpython/pull/3505/commits/eed3387f4e625335a5d2f6582946ec75d1c1ed10

I think you are maybe talking about the 3 transmit "mailboxes".

image

Kheirlb commented 6 months ago

I think after each tx attempt, we can ensure all "mailboxes" are clear again? Found an implementation here.

This is a substitute for https://github.com/adafruit/circuitpython/commit/e5a0af9216a6d868fad9105a4eb44e0889f54e29

    // Wait until all mailboxes are free again.
    while (HAL_CAN_GetTxMailboxesFreeLevel(&self->handle) != 3) {
        RUN_BACKGROUND_TASKS;
    }

Unfortunately, this seems to make can.send blocking... maybe that is okay? I assume can.send should not be blocking...

anecdata commented 6 months ago

the three transmit 'mailboxes' are in hardware https://www.st.com/resource/en/datasheet/stm32f405rg.pdf section 2.2.29 (p. 37)

Kheirlb commented 6 months ago

Thanks anecdata. That is a good link for someone who is not familar with low-level stuff such as this (like me). I'm not sure if we can use all 3 mailboxes effectively with the send/receive paradigm we currently have in canio. Maybe there is an ordering I haven't figured out yet.

For the other canio ports:

  1. It looks like our atmel-samd port only has "one dedicated TX buffer". I am not sure if there is an equivalent hardware buffer behind the scenes.
  2. I can't find where the tx_queue_len is set for espressif, but that looks software defined anyways. I am not sure if there is an equivalent hardware buffer behind the scenes.
Kheirlb commented 6 months ago

I assume can.send should not be blocking...

Here is a non-blocking version that limits to 1 mailbox. I will probably throw up a PR tomorrow unless someone has a better idea of how to use all 3 mailboxes.

    // Only use 1 out of the 3 transmit mailboxes to avoid confusing ordering.
    // See https://github.com/adafruit/circuitpython/issues/3644 for more details.

    // 3 == nothing in use, send away!
    // 2 or 1... something is already on the bus... sorry.
    if (HAL_CAN_GetTxMailboxesFreeLevel(&self->handle) == 3) {
        HAL_CAN_AddTxMessage(&self->handle, &header, message->data, &mailbox);
    }
Kheirlb commented 6 months ago

Hmm, this might just be a problem with the "ack" demo code.py send/receive scripts sending duplicate packets... and our .receive function getting them in order. Will poke at this more tomorrow.

Kheirlb commented 6 months ago

Here is a slightly different version of https://github.com/adafruit/Adafruit_Learning_System_Guides/blob/main/FeatherCAN_CircuitPython/sender-ack/code.py that reads in all the pending messages. I tried creating a branch/PR on Adafruit_Learning_System_Guides, but ran into permission issues.

# SPDX-FileCopyrightText: 2020 Jeff Epler for Adafruit Industries
#
# SPDX-License-Identifier: MIT

import struct
import time

import board
import canio
import digitalio

# If the CAN transceiver has a standby pin, bring it out of standby mode
if hasattr(board, 'CAN_STANDBY'):
    standby = digitalio.DigitalInOut(board.CAN_STANDBY)
    standby.switch_to_output(False)

# If the CAN transceiver is powered by a boost converter, turn on its supply
if hasattr(board, 'BOOST_ENABLE'):
    boost_enable = digitalio.DigitalInOut(board.BOOST_ENABLE)
    boost_enable.switch_to_output(True)

# Use this line if your board has dedicated CAN pins. (Feather M4 CAN and Feather STM32F405)
can = canio.CAN(rx=board.CAN_RX, tx=board.CAN_TX, baudrate=250_000, auto_restart=True)
# On ESP32S2 most pins can be used for CAN.  Uncomment the following line to use IO5 and IO6
#can = canio.CAN(rx=board.IO6, tx=board.IO5, baudrate=250_000, auto_restart=True)
listener = can.listen(matches=[canio.Match(0x409)], timeout=.1)

old_bus_state = None
count = 0

while True:
    bus_state = can.state
    if bus_state != old_bus_state:
        print(f"Bus state changed to {bus_state}")
        old_bus_state = bus_state

    now_ms = (time.monotonic_ns() // 1_000_000) & 0xffffffff
    print(f"Sending message: count={count} now_ms={now_ms}")

    message = canio.Message(id=0x408, data=struct.pack("<II", count, now_ms))

    # Keep trying to send the same message until confirmed.
    while True:
        received_ack_confirmed = False
        can.send(message)

        # Read in all messages.
        for message_in in listener:
            data = message_in.data
            if len(data) != 4:
                print(f"Unusual message length {len(data)}")
                continue

            ack_count = struct.unpack("<I", data)[0]
            if ack_count == count:
                print(f"Received ACK: {ack_count}")
                received_ack_confirmed = True
                break
            else:
                print(f"Received incorrect ACK: {ack_count} should be {count}")

        if received_ack_confirmed:
            break

        print(f"No ACK received within receive timeout, sending {count} again")

    time.sleep(.5)
    count += 1
Kheirlb commented 6 months ago

I tried creating a branch/PR on Adafruit_Learning_System_Guides, but ran into permission issues.

Read this. I created a fork and opened a PR to update the example. See https://github.com/adafruit/Adafruit_Learning_System_Guides/pull/2752