embedded-office / canopen-stack

Free CANopen Stack for Embedded Systems
https://canopen-stack.org
Apache License 2.0
387 stars 117 forks source link

How to get out of a sticky situation #62

Closed Domen2242 closed 2 years ago

Domen2242 commented 3 years ago

I have a SDO block upload issue I've ran into a couple of times while testing.

The SDO server(based on this stack) is in a BLK_UPLOAD state from a previous unfinished transfer SDO block upload (ex. SDO client got reset during transfer). SDO client (based on canfestival) now tries to initiate a new block upload, which gets aborted by the SDO server because it doesn't match expected frame format for an existing block upload transfer.

} else if (srv->Blk.State == BLK_UPLOAD) {
    if (cmd == 0xA1) {
        result = COSdoEndUploadBlock(srv);
    } else if ((cmd & 0xE3) == 0xA2) {
        result = COSdoAckUploadBlock(srv);
    } else {
        COSdoAbort(srv, CO_SDO_ERR_CMD); // this is what gets executed
    }
    return (result);
}

Every subsequent attempt by the SDO client to start a new block upload gets aborted, because the SDO server remains in a BLK_UPLOAD state.

How is such a situation supposed to get resolved? The CiA 301 standard is very clear on what should happen in such a situation during a segmented SDO upload; the new SDO upload initiate frame would terminate the existing transfer start a new one. However, I cannot find any such clause regarding SDO block upload in the standard.

from looking at canfestival's source code for SDO server, it seems to terminate the existing transfer when receiving a new SDO block upload initiate frame, as well as aborting the new request (but a subsequent request should then work).

How do you think such a situation is supposed to get resolved? I can't find a clear answer in the CiA standard.

michael-hillmann commented 3 years ago

Sounds tricky; I think we can abort a running transfer (in any state) when a start transfer is received. If this start of transfer is accepted or not is difficult. Thank you for your diagnostic of the canfestival's server. Do you have information about the cooperation with devices in the field and how they expect this sequence?

michael-hillmann commented 2 years ago

Sorry for the delay. I did some research and asked customers what other devices would expect in such a situation. It seems to be an "unwritten rule", that the behavior should follow:

I think this is the approach we need to implement.

michael-hillmann commented 2 years ago

Maybe you can check the improvement within the develop branch. In the tests, this change may fix your situation.

Domen2242 commented 2 years ago

Thank you for your research and fix! Great, this behavior seems to match my findings from canfestival source code. The change in develop branch works as expected on my setup.