How to get out of a sticky situation

Domen2242 commented 3 years ago

I have a SDO block upload issue I've ran into a couple of times while testing.

The SDO server(based on this stack) is in a BLK_UPLOAD state from a previous unfinished transfer SDO block upload (ex. SDO client got reset during transfer). SDO client (based on canfestival) now tries to initiate a new block upload, which gets aborted by the SDO server because it doesn't match expected frame format for an existing block upload transfer.

} else if (srv->Blk.State == BLK_UPLOAD) {
    if (cmd == 0xA1) {
        result = COSdoEndUploadBlock(srv);
    } else if ((cmd & 0xE3) == 0xA2) {
        result = COSdoAckUploadBlock(srv);
    } else {
        COSdoAbort(srv, CO_SDO_ERR_CMD); // this is what gets executed
    }
    return (result);
}

Every subsequent attempt by the SDO client to start a new block upload gets aborted, because the SDO server remains in a BLK_UPLOAD state.

How is such a situation supposed to get resolved? The CiA 301 standard is very clear on what should happen in such a situation during a segmented SDO upload; the new SDO upload initiate frame would terminate the existing transfer start a new one. However, I cannot find any such clause regarding SDO block upload in the standard.

from looking at canfestival's source code for SDO server, it seems to terminate the existing transfer when receiving a new SDO block upload initiate frame, as well as aborting the new request (but a subsequent request should then work).

How do you think such a situation is supposed to get resolved? I can't find a clear answer in the CiA standard.

michael-hillmann commented 3 years ago

Sounds tricky; I think we can abort a running transfer (in any state) when a start transfer is received. If this start of transfer is accepted or not is difficult. Thank you for your diagnostic of the canfestival's server. Do you have information about the cooperation with devices in the field and how they expect this sequence?

michael-hillmann commented 2 years ago

Sorry for the delay. I did some research and asked customers what other devices would expect in such a situation. It seems to be an "unwritten rule", that the behavior should follow:

Server Start
Client Start
Block Transfer Init => ACK
Block Transfer => ACK
Client Shutdown
Client Start
Block Transfer Init => Server did "Block Transfer Abort" with the abort code "Command specifier not valid - 0x05040001"
Block Transfer Init => ACK
Block Transfers => ACK
Block Transfer End => ACK

I think this is the approach we need to implement.

michael-hillmann commented 2 years ago

Maybe you can check the improvement within the develop branch. In the tests, this change may fix your situation.

Domen2242 commented 2 years ago

Thank you for your research and fix! Great, this behavior seems to match my findings from canfestival source code. The change in develop branch works as expected on my setup.

embedded-office / canopen-stack

How to get out of a sticky situation #62