KevinOConnor / can2040

Software CAN bus implementation for rp2040 micro-controllers
GNU General Public License v3.0
686 stars 68 forks source link

Added a max retry counter. #44

Closed linted closed 1 year ago

linted commented 1 year ago

This is used to limit the number of attempts to send a single can msg. It can be used to detect if there are errors on the canbus, such as NACK on every frame. This is implemented by adding an additional parameter to can2040_start which specifies the max number of retries before returning an error and continuing to the next message in the tx queue. An error status is returned to the can2040_cb function. Since this is the first additional error type, it uses 1 anded with CAN2040_NOTIFY_ERROR to signify the difference of error type. This can be documented better on request. In order to disreguard this functionality, -1 can be passed as retry_max to can2040_start.

linted commented 1 year ago

This is an attempt to address #40

KevinOConnor commented 1 year ago

Thanks. As high-level feedback, this seems like a debugging capability and I'm not sure it is a good fit for the upstream can2040 repo. The code seems fine for troubleshooting possible failures, but I fear it may be a little misleading for regular users. For example, this code probably wouldn't work well for users that want to transmit a message with low tx latency (for timing purposes). (As, for example, the retry count may get incremented if there are unrelated rx errors, and the retry count may not get incremented if the line is yielded to a higher priority transmitter.) Beyond debugging notifications, I'm not sure disabling retries has much value - if a message can't be sent, then almost assuredly no messages can be sent. (The canbus spec requires all nodes to acknowledge all messages, so if the node doesn't get an ack, it indicates it is unable to send any messages to any nodes.)

Cheers, -Kevin

linted commented 1 year ago

Thank you for the feedback! I would love to work so that this is less confusing for end users, but the main point of this pull request is to address an issue that I have come across in my application. There may come a point where the node, which my rp2040 is communicating with over can, crashes. Currently this library has no way of handling that. Instead it will continuously resend the current can message and never report and issue to the higher level code.

This PR attempts to address this issue by putting an upper limit on the number of non-acknowledged frames which are sent before an error is returned to the calling process.

As someone who is not nearly as intimately familiar with the code base as you are, is there a better way of going about this then how I have implemented it?

KevinOConnor commented 1 year ago

Thanks.

If you've found a solution for your application then I think that is great. Don't feel you need to work on it further. My goal with this can2040 library is to provide a simple interface for reading and writing basic canbus messages. I think it is fine for others to customize the code for their particular applications. So, please don't consider my comments as criticism of your work. I'm just not sure that your particular use case is very common.

If you are considering alternative mechanisms I can provide some very high-level feedback.

If I understand your comments, it seems you are limiting retries in an attempt to detect a failing node. For what it is worth, I'm not sure that is a scalable solution. If the canbus were to contain three or more nodes (1 transmitter, 1 intended receiver, 1 or more unrelated nodes) then the transmitter would not be able to use canbus errors to detect a failing node. The canbus specifications require that all nodes on the canbus generate an ack for all valid messages on the canbus, even if those messages are not intended for the given node. So, the moment an additional node is added to the canbus, it'll ack all the messages that were intended for the failing node and the transmitter will have no way of knowing if the message was received by the intended receiver. A more scalable solution is likely to update the application code to use sequence numbers, acknowledgement messages, and timeouts (or perhaps to send some type of periodic status request message and a timeout on the status response). The low-level canbus protocol itself really isn't capable of detecting outages as you've described.

If you're looking to get information on canbus errors (for more transparent error reporting in general) then perhaps an alternative solution would be to add a new void can2040_get_statistics(struct can2040 *cd, struct can2040_stats *stats) type of function. Thus enabling a caller to periodically query error counters and take action if they indicate an issue. I'll try to look at adding this myself as I think it could be generally useful.

Finally, if you're specifically looking to limit transmit retry attempts, an alternative approach may be to count the number of times pio_tx_send() is invoked. That's likely a more accurate way of tracking duplicate transmit attempts (instead of estimating retries via can2040 error handling code paths).

As above though, if you've found a solution for your application, I don't think you need to find an additional solution.

Cheers, -Kevin

linted commented 1 year ago

Thank you for the feed back.

I understand your point about multiple nodes on the bus, and how any one of them can ack a packet. That's behavior that inherent and desirable for the can traffic. I personally find undesirable the continuous, infinite retransmission that happens when the rp2040 is the only active node, such as when there is an error state or before other nodes have started up. This is especially true when trying to reduce power consumption. In a perfect ideal world, I think you would be correct, as this would be a redundant feature which would serve no purpose.

Please feel free to close this PR if this feature doesn't make sense to have in the main branch of your repo.

github-actions[bot] commented 1 year ago

Hello,

It looks like there hasn't been any recent updates on this github ticket. We prefer to only list tickets as "open" if they are actively being worked on. Feel free to provide an update on this ticket. Otherwise the ticket will be automatically closed in a few days.

Best regards,

~ Your friendly GitIssueBot

PS: I'm just an automated script, not a human being.