christiansandberg / canopen

CANopen for Python
http://canopen.readthedocs.io/
MIT License

Inconsistent data truncation of unsupported data types in SDO upload #436

Open · sveinse opened this issue 3 months ago

sveinse commented 3 months ago

I'm having problems reading SDO objects that use UNSIGNED48. It used to work before PR #395 by @ljungholm, but that turns out to have been coincidental.

https://github.com/christiansandberg/canopen/blob/24df6e846a4e71508fd82c378d1385d65f000c94/canopen/sdo/client.py#L123-L133

I believe the root cause is that canopen doesn't support the UNSIGNED48 type (which I'll contribute in another PR). The unfortunate behavior is that, because the type is unknown, the data gets truncated to 1 byte. It didn't do that before #395.
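
For context, here is roughly what the post-#395 logic at the end of SdoClient.upload() looks like (a paraphrased sketch reconstructed from the linked lines and the trace below, not verbatim):

var = self.od.get_variable(index, subindex)
if var is not None:
    if var.data_type not in objectdictionary.DATA_TYPES:
        # len(var) falls back to 8 (bits) for unknown datatypes,
        # so var_size becomes 1 byte for UNSIGNED48
        var_size = len(var) // 8
        if response_size is None or var_size < response_size:
            data = data[0:var_size]
return data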

>> upload((8192, 2), {})
<-- 602     :  40 00 20 02 00 00 00 00
--> 582     :  41 00 20 02 08 00 00 00
  << read_response() = b'A\x00 \x02\x08\x00\x00\x00'
<-- 602     :  60 00 00 00 00 00 00 00
--> 582     :  00 b2 01 20 02 91 12 00
  << read_response() = b'\x00\xb2\x01 \x02\x91\x12\x00'
<-- 602     :  70 00 00 00 00 00 00 00
--> 582     :  1d 00 00 00 00 00 00 00
  << read_response() = b'\x1d\x00\x00\x00\x00\x00\x00\x00'
     data=b'\xb2\x01 \x02\x91\x12\x00\x00'
     response_size = 8
     var.data_type = 25   # UNSIGNED48 (unsupported)
     var_size = 1   # len(var) falls back to 8 (bits) for unknown datatypes: 8 // 8 = 1
     data = b'\xb2'   # Truncated to var_size
<< upload() = b'\xb2'

Observations:

Would you agree that upload() should return the full data in both of the first use cases? Is the fix that ODVariable.__len__() should return 64 for unknown datatypes?

sveinse commented 2 months ago

> Would you agree that upload() should return the full data in both of the first use cases? Is the fix that ODVariable.__len__() should return 64 for unknown datatypes?

PR #440 implements four test cases (test_unknown_*()) that address this issue. The tests fail in the PR because the response contains longer data and it gets truncated. IMHO canopen shouldn't truncate any message it encounters for unknown datatypes, especially since the data is passed through transparently when there is no OD entry at all.

Why do we have the if var.data_type not in objectdictionary.DATA_TYPES: test, and then proceed to truncate the message?

christiansandberg commented 2 months ago

Maybe using 64 bits instead of 8 is the easiest solution. Hopefully it shouldn't happen too often once 48-bit support is added.
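
Something like this, assuming the current ODVariable.__len__() falls back to 8 bits for types outside STRUCT_TYPES:

def __len__(self) -> int:
    if self.data_type in self.STRUCT_TYPES:
        return self.STRUCT_TYPES[self.data_type].size * 8
    # Fall back to 64 bits instead of 8 for unknown data types
    return 64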

sveinse commented 2 months ago

It depends on never encountering a future datatype larger than 64 bits. If we choose to cap it at 64, I would suggest logging a warning that data of an unknown datatype has been capped. (And the test in PR #440 must be altered to make it pass.)

acolomb commented 2 months ago

I'm not really fond of replacing the fall-back 8 bits value by 64 bits.

It makes sense for any type to have at least 8 bits, since one octet is the smallest unit supported by SDO (disregarding PDO mapping with finer granularity here). Any other fall-back value is arbitrary and will break unexpectedly the moment it no longer fits.

We should rather fix the min(OD, SDO) length assumption to only get applied when the OD size is actually meaningful and not itself a fall-back value.

sveinse commented 2 months ago

> We should rather fix the min(OD, SDO) length assumption to only get applied when the OD size is actually meaningful and not itself a fall-back value.

I think this makes sense. Since the data type is unknown, no assumptions about the data contents can be made, so it should be passed through transparently without truncation -- especially since it would pass transparently if the object had no OD entry.

christiansandberg commented 2 months ago

I agree, better to check if size is known or not.

sveinse commented 2 months ago

I attempted

var = self.od.get_variable(index, subindex)
if var is not None:
    if var.data_type not in objectdictionary.DATA_TYPES:
        # Use max to determine enough space for the unknown data type
        var_size = max(len(var) // 8, response_size)
        if response_size is None or var_size < response_size:
            # This will never happen...
            data = data[0:var_size]
return data

But it turned out a bit nonsensical. So I'm kind of back at: why do we even truncate? Why would we want to reduce data when we don't know what it is?

acolomb commented 2 months ago

Just a quick note: the truncating was introduced to avoid an exception from the struct methods. They are quite strict about the buffer length, but can be satisfied by slicing the buffer down to the expected size.
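
For illustration (plain struct behavior, nothing canopen-specific):

import struct

struct.unpack("<L", b"\x01\x02\x03\x04\x05")      # struct.error: unpack requires a buffer of 4 bytes
struct.unpack("<L", b"\x01\x02\x03\x04\x05"[:4])  # OK: (67305985,)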

Will need to take another good look at the alternatives and code flow...

sveinse commented 2 months ago

I would assume that known data types can or should be truncated in case of excessive data. Here we're talking about unknown data types, with no reference against which to validate their proper size.

acolomb commented 2 months ago

I'm really still uncertain what the actual problem here is. SdoClient.upload() is actually independent of the ODVariable. Before #395 it only consulted the OD when the uploaded data was of unspecified size; #395 changed that to apply even when an upload size was indicated, which I think is not always correct. For broken devices that send e.g. an UNSIGNED8 as UNSIGNED32 with four data bytes, it does make sense to truncate the data instead of failing during unpack(). But for any type not based on struct.Struct, truncating is usually harmful.
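
To make the broken-device case concrete (illustrative values only):

import struct

data = b"\x2a\x00\x00\x00"               # 4 bytes received for an UNSIGNED8 entry
value, = struct.unpack("<B", data[:1])   # truncating lets unpack() succeed: value == 42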

The real culprit here is IMHO the check var.data_type not in objectdictionary.DATA_TYPES. That applies to any unknown OD entry type as well, but should actually be limited to types whose length is known in advance. So better to check for var.data_type in var.STRUCT_TYPES here, assuming that all known-length data types will be unpacked using structs. Even better, encapsulate this in a new property ODVariable.length_known.
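
A minimal sketch of such a property (length_known being the proposed, not yet existing, name):

@property
def length_known(self) -> bool:
    # Only struct-parsed types have a size known in advance;
    # strings, DOMAIN and undefined type constants do not.
    return self.data_type in self.STRUCT_TYPES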

And maybe this whole check should be moved to SdoVariable.get_data(), where we are certain that an associated ODVariable is actually available? For the "OD-independent" (as stated by the docstring) .upload() method, consulting the OD is a bit inconsistent.

sveinse commented 2 months ago

I think we have two independent factors or conditions for any object read by SDO upload:

(PR #440 contains unit tests for both of these cases.)

| Known data type | Has OD entry | Result data |
| --- | --- | --- |
| Yes | Yes | As-is |
| Yes | No | As-is |
| No | No | As-is |
| No | Yes | Data truncated |

So I think the summary of the problem statement is that the current implementation truncates the data in the latter case.

acolomb commented 2 months ago

I think the problem is to more clearly define what is a "known" data type. The check against objectdictionary.DATA_TYPES actually just matches (VISIBLE_STRING, OCTET_STRING, UNICODE_STRING, DOMAIN). Those are the defined ones which are not parsed by a struct, and the check also matches undefined data type entries. We should invert that and apply the truncation only for known data types where a struct gets used for parsing, I think.

Also, your second case is impossible, as the data type is defined in the OD entry, thus without one we cannot know.

sveinse commented 2 months ago

> I think the problem is to more clearly define what is a "known" data type. The check against objectdictionary.DATA_TYPES actually just matches (VISIBLE_STRING, OCTET_STRING, UNICODE_STRING, DOMAIN). Those are the defined ones which are not parsed by a struct, and the check also matches undefined data type entries. We should invert that and apply the truncation only for known data types where a struct gets used for parsing, I think.

Yes, I think that is a good idea. We must exempt the array types, i.e. (VISIBLE_STRING, OCTET_STRING, UNICODE_STRING, DOMAIN), from this truncation.

> Also, your second case is impossible, as the data type is defined in the OD entry, thus without one we cannot know.

Yes, you're right. Without the OD one cannot know the data type; I missed that. It is possible, though, to read an object via SDO that is missing from the OD, so its data type is unknown. Its data should definitely not be truncated.

acolomb commented 2 months ago

Right, so to summarize: We only truncate if the OD contains an entry AND its data type is contained in STRUCT_TYPES. That automatically leaves array types (DOMAIN etc.) untouched, and will not explode for yet undefined type constants.

On to the other questions:

sveinse commented 2 months ago

It must look up the OD to get the data type, so in order for SdoClient.upload() to do any length validation, it must use the OD.

IMHO SdoClient.upload() is just a raw producer of bytes using an SDO upload request. The deserialization step into native python objects happens later. So I think I lean towards not doing any validation in SdoClient.upload().

Today the deserialization happens in ODVariable.decode_raw(), called from the Variable.raw property. This is where the stream from the SDO upload should be validated, I think.

EDIT: I vote for removing any validation from SdoClient.upload() and adding it to ODVariable.decode_raw(). The validation should only be done on known data types (that is, types this library can decode), with the exception of the known array formats.

PPS! Should we validate, e.g. by logging a warning or raising an error, or should we truncate (and warn)?
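
For illustration, here is a rough sketch of the truncate-and-warn variant in ODVariable.decode_raw() (hypothetical; assumes STRUCT_TYPES maps fixed-size types to struct.Struct instances and a module-level logger exists):

def decode_raw(self, data: bytes):
    if self.data_type in self.STRUCT_TYPES:
        fmt = self.STRUCT_TYPES[self.data_type]
        if len(data) != fmt.size:
            # logger: assumed module-level logging.Logger
            logger.warning("Expected %d bytes, got %d; adjusting",
                           fmt.size, len(data))
            # Truncate excess data, or pad short data with zeros on the right
            data = data[:fmt.size].ljust(fmt.size, b"\x00")
        value, = fmt.unpack(data)
        return value
    # Strings, DOMAIN and unknown data types pass through untouched
    return data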

acolomb commented 2 months ago

The only reason for doing validation in .upload() is that we know the indicated data set size there. The check prior to #395 actually only kicked in when that size was None, i.e. the server didn't specify how many data bytes should be interpreted.

When moving the validation to decode_raw() completely, we lose that bit of information. But frankly, I'm not sure we really need it at all. Trying to summarize it into a table again (may need to scroll horizontally):

| No. | Type in OD | SDO size indication (response_size) | SDO payload (len(data)) | Action in decode_raw() | Use case |
| --- | --- | --- | --- | --- | --- |
| 1 | in var.STRUCT_TYPES | unspecified | == len(var) | OK, unpack | normal |
| 2 | in var.STRUCT_TYPES | unspecified | < len(var) | pad zeros right, unpack | partial (compressed?) response |
| 3 | in var.STRUCT_TYPES | unspecified | > len(var) | truncate right, unpack | normal, depends on OD |
| 4 | in var.STRUCT_TYPES | == len(var) | (don't care) | OK, unpack | normal |
| 5 | in var.STRUCT_TYPES | < len(var) | (don't care) | pad zeros right, unpack | partial (compressed?) response |
| 6 | in var.STRUCT_TYPES | > len(var) | (don't care) | truncate right, unpack | quirk, always 4-byte for small vars |
| 7 | in objectdictionary.DATA_TYPES | unspecified | use as length | bytes(data) | normal, streaming until complete |
| 8 | in objectdictionary.DATA_TYPES | specified | (don't care) | bytes(data[:response_size]) | normal, discarding extra (unused) bytes |
| 9 | neither STRUCT nor DATA | unspecified | use as length | bytes(data) | unknown data type |
| 10 | neither STRUCT nor DATA | specified | (don't care) | bytes(data[:response_size]) | unknown data type, discarding extra bytes |
| 11 | no OD entry | unspecified | use as length | bytes(data) | unknown data type |
| 12 | no OD entry | specified | (don't care) | bytes(data[:response_size]) | unknown data type, discarding extra bytes |
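
If the table holds, it collapses to something like this hypothetical helper (names invented for illustration):

def apply_size_rules(var, response_size, data):
    if var is not None and var.data_type in var.STRUCT_TYPES:
        # Cases 1-6: normalize to the OD-defined size
        # (truncate excess, zero-pad short data on the right)
        expected = len(var) // 8
        return data[:expected].ljust(expected, b"\x00")
    if response_size is not None:
        # Cases 8, 10, 12: discard extra bytes beyond the indicated size
        return bytes(data[:response_size])
    # Cases 7, 9, 11: no size indicated, use the payload as-is
    return bytes(data)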

Notes:

Does that all seem correct? Would it be okay for you if we used this table as basis for the implementation and tested each case?

sveinse commented 2 months ago

This is a really good overview. There are two independent usages of length involved here

acolomb commented 2 months ago

I'd really like to get this one fixed soon, but right now we only have ideas and not a complete and well-tested implementation draft of these last suggestions. Therefore I think we should focus on getting a release out the door to avoid problems like #455 and come back to this afterwards.

OK for you @sveinse?

sveinse commented 2 months ago

@acolomb Yes, let's not delay the release because of this. Current behavior is what it is and we're not introducing any regressions by not fixing this issue.

sveinse commented 2 months ago

@friederschueler I'm not sure I'd label this one as an enhancement. It's describing unexpected behavior, so it's a bug.