christiansandberg / canopen

CANopen for Python
http://canopen.readthedocs.io/
MIT License

Inconsistent data truncation of unsupported data types in SDO upload #436

Open · sveinse opened this issue 3 months ago

sveinse commented 3 months ago

I'm having problems reading SDO objects that use UNSIGNED48. It used to work before PR #395 by @ljungholm, but that turns out to have been coincidental.

https://github.com/christiansandberg/canopen/blob/24df6e846a4e71508fd82c378d1385d65f000c94/canopen/sdo/client.py#L123-L133

I believe the root cause is that canopen doesn't support the UNSIGNED48 type (which I'll contribute in another PR). The unfortunate behavior is that, because the type is unknown, the data gets truncated to 1 byte. It didn't do that before #395.
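
For context, here is roughly what the post-#395 logic at the end of SdoClient.upload() looks like (a paraphrased sketch reconstructed from the linked lines and the trace below, not verbatim):

var = self.od.get_variable(index, subindex)
if var is not None:
    if var.data_type not in objectdictionary.DATA_TYPES:
        # len(var) falls back to 8 (bits) for unknown datatypes,
        # so var_size becomes 1 byte for UNSIGNED48
        var_size = len(var) // 8
        if response_size is None or var_size < response_size:
            data = data[0:var_size]
return data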

>> upload((8192, 2), {})
<-- 602     :  40 00 20 02 00 00 00 00
--> 582     :  41 00 20 02 08 00 00 00
  << read_response() = b'A\x00 \x02\x08\x00\x00\x00'
<-- 602     :  60 00 00 00 00 00 00 00
--> 582     :  00 b2 01 20 02 91 12 00
  << read_response() = b'\x00\xb2\x01 \x02\x91\x12\x00'
<-- 602     :  70 00 00 00 00 00 00 00
--> 582     :  1d 00 00 00 00 00 00 00
  << read_response() = b'\x1d\x00\x00\x00\x00\x00\x00\x00'
     data=b'\xb2\x01 \x02\x91\x12\x00\x00'
     response_size = 8
     var.data_type = 25   # UNSIGNED48 (unsupported)
     var_size = 1   # len(var) falls back to 8 (bits) for unknown datatypes: 8 // 8 = 1
     data = b'\xb2'   # Truncated to var_size
<< upload() = b'\xb2'

Observations:

Would you agree that upload() should return the full data in both of the first use cases? Is the fix that ODVariable.__len__() should return 64 for unknown datatypes?

sveinse commented 2 months ago

> Would you agree that upload() should return the full data in both of the first use cases? Is the fix that ODVariable.__len__() should return 64 for unknown datatypes?

PR #440 implements four test cases (test_unknown_*()) that address this issue. The tests fail in the PR because the response contains longer data and it gets truncated. IMHO canopen shouldn't truncate any message it encounters for unknown datatypes, especially since the data is passed through transparently when there is no OD entry at all.

Why do we have the if var.data_type not in objectdictionary.DATA_TYPES: test, and then proceed to truncate the message?

christiansandberg commented 2 months ago

Maybe using 64 bits instead of 8 is the easiest solution. Hopefully it shouldn't happen too often once 48-bit support is added.
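
Something like this, assuming the current ODVariable.__len__() falls back to 8 bits for types outside STRUCT_TYPES:

def __len__(self) -> int:
    if self.data_type in self.STRUCT_TYPES:
        return self.STRUCT_TYPES[self.data_type].size * 8
    # Fall back to 64 bits instead of 8 for unknown data types
    return 64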

sveinse commented 2 months ago

It depends on never encountering a future datatype larger than 64 bits. If we choose to cap it at 64, I would suggest logging a warning that data of an unknown datatype has been capped. (And the test in PR #440 must be altered to make it pass.)

acolomb commented 2 months ago

I'm not really fond of replacing the fall-back 8 bits value by 64 bits.

It makes sense for any type to have at least 8 bits, since one octet is the smallest unit supported by SDO (disregarding PDO mapping with finer granularity here). Any other fall-back value is arbitrary and will break unexpectedly the moment it no longer fits.

We should rather fix the min(OD, SDO) length assumption to only get applied when the OD size is actually meaningful and not itself a fall-back value.

sveinse commented 2 months ago

> We should rather fix the min(OD, SDO) length assumption to only get applied when the OD size is actually meaningful and not itself a fall-back value.

I think this makes sense. Since the data type is unknown, no assumptions about the data contents can be made, so it should be passed through transparently without truncation -- especially since it would pass transparently if the object had no OD entry.

christiansandberg commented 2 months ago

I agree, better to check if size is known or not.

sveinse commented 2 months ago

I attempted

var = self.od.get_variable(index, subindex)
if var is not None:
    if var.data_type not in objectdictionary.DATA_TYPES:
        # Use max to determine enough space for the unknown data type
        var_size = max(len(var) // 8, response_size)
        if response_size is None or var_size < response_size:
            # This will never happen...
            data = data[0:var_size]
return data

But it turned out a bit nonsensical. So I'm kind of back at: why do we even truncate? Why would we want to reduce data when we don't know what it is?

acolomb commented 2 months ago

Just a quick note: the truncating was introduced to avoid an exception from the struct methods. They are quite strict about the buffer length, but can be satisfied by slicing the buffer down to the expected size.
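
For illustration (plain struct behavior, nothing canopen-specific):

import struct

struct.unpack("<L", b"\x01\x02\x03\x04\x05")      # struct.error: unpack requires a buffer of 4 bytes
struct.unpack("<L", b"\x01\x02\x03\x04\x05"[:4])  # OK: (67305985,)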

Will need to take another good look at the alternatives and code flow...

sveinse commented 2 months ago

I would assume that known data types can or should be truncated in case of excessive data. Here we're talking about unknown data types, with no reference against which to validate their proper size.

acolomb commented 2 months ago

I'm really still uncertain what the actual problem here is. SdoClient.upload() is actually independent of the ODVariable. Before #395 it only consulted the OD when the uploaded data was of unspecified size; #395 changed that to apply even when an upload size was indicated, which I think is not always correct. For broken devices that send e.g. an UNSIGNED8 as UNSIGNED32 with four data bytes, it does make sense to truncate the data instead of failing during unpack(). But for any type not based on struct.Struct, truncating is usually harmful.
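
To make the broken-device case concrete (illustrative values only):

import struct

data = b"\x2a\x00\x00\x00"               # 4 bytes received for an UNSIGNED8 entry
value, = struct.unpack("<B", data[:1])   # truncating lets unpack() succeed: value == 42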

The real culprit here is IMHO the check var.data_type not in objectdictionary.DATA_TYPES. That applies to any unknown OD entry type as well, but should actually be limited to types whose length is known in advance. So better to check for var.data_type in var.STRUCT_TYPES here, assuming that all known-length data types will be unpacked using structs. Even better, encapsulate this in a new property ODVariable.length_known.
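
A minimal sketch of such a property (length_known being the proposed, not yet existing, name):

@property
def length_known(self) -> bool:
    # Only struct-parsed types have a size known in advance;
    # strings, DOMAIN and undefined type constants do not.
    return self.data_type in self.STRUCT_TYPES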

And maybe this whole check should be moved to SdoVariable.get_data(), where we are certain that an associated ODVariable is actually available? For the "OD-independent" (as stated by the docstring) .upload() method, consulting the OD is a bit inconsistent.

sveinse commented 2 months ago

I think we have two independent factors or conditions for any object read by SDO upload:

(PR #440 contains unit tests for both of these cases.)

| Known data type | Has OD entry | Result data |
| --- | --- | --- |
| Yes | Yes | As-is |
| Yes | No | As-is |
| No | No | As-is |
| No | Yes | Data truncated |

So I think the summary of the problem statement is that the current implementation truncates the data in the latter case.

acolomb commented 2 months ago

I think the problem is to more clearly define what is a "known" data type. The check against objectdictionary.DATA_TYPES actually just matches (VISIBLE_STRING, OCTET_STRING, UNICODE_STRING, DOMAIN). Those are the defined ones which are not parsed by a struct, and the check also matches undefined data type entries. We should invert that and apply the truncation only for known data types where a struct gets used for parsing, I think.

Also, your second case is impossible, as the data type is defined in the OD entry, thus without one we cannot know.

sveinse commented 2 months ago

> I think the problem is to more clearly define what is a "known" data type. The check against objectdictionary.DATA_TYPES actually just matches (VISIBLE_STRING, OCTET_STRING, UNICODE_STRING, DOMAIN). Those are the defined ones which are not parsed by a struct, and the check also matches undefined data type entries. We should invert that and apply the truncation only for known data types where a struct gets used for parsing, I think.

Yes, I think that is a good idea. We must exempt the array types, i.e. (VISIBLE_STRING, OCTET_STRING, UNICODE_STRING, DOMAIN), from this truncation.

> Also, your second case is impossible, as the data type is defined in the OD entry, thus without one we cannot know.

Yes, you're right. Without the OD one cannot know the data type; I missed that. It is possible, though, to read an object via SDO that is missing from the OD, so its data type is unknown. Its data should definitely not be truncated.

acolomb commented 2 months ago

Right, so to summarize: We only truncate if the OD contains an entry AND its data type is contained in STRUCT_TYPES. That automatically leaves array types (DOMAIN etc.) untouched, and will not explode for yet undefined type constants.

On to the other questions:

sveinse commented 2 months ago

It must look up the OD to get the data type, so in order for SdoClient.upload() to do any length validation, it must use the OD.

IMHO SdoClient.upload() is just a raw producer of bytes using an SDO upload request. The deserialization step into native python objects happens later. So I think I lean towards not doing any validation in SdoClient.upload().

Today the deserialization happens in ODVariable.decode_raw(), called from the Variable.raw property. This is where the stream from the SDO upload should be validated, I think.

EDIT: I vote for removing any validation from SdoClient.upload() and adding it to ODVariable.decode_raw(). The validation should only be done on known data types (that is, types this library can decode), with the exception of the known array formats.

PPS! Should we validate, e.g. by logging a warning or raising an error, or should we truncate (and warn)?
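
For illustration, here is a rough sketch of the truncate-and-warn variant in ODVariable.decode_raw() (hypothetical; assumes STRUCT_TYPES maps fixed-size types to struct.Struct instances and a module-level logger exists):

def decode_raw(self, data: bytes):
    if self.data_type in self.STRUCT_TYPES:
        fmt = self.STRUCT_TYPES[self.data_type]
        if len(data) != fmt.size:
            # logger: assumed module-level logging.Logger
            logger.warning("Expected %d bytes, got %d; adjusting",
                           fmt.size, len(data))
            # Truncate excess data, or pad short data with zeros on the right
            data = data[:fmt.size].ljust(fmt.size, b"\x00")
        value, = fmt.unpack(data)
        return value
    # Strings, DOMAIN and unknown data types pass through untouched
    return data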

acolomb commented 2 months ago

The only reason for doing validation in .upload() is that we know the indicated data set size there. The check prior to #395 actually only kicked in when that size was None, i.e. the server didn't specify how many data bytes should be interpreted.

When moving the validation to decode_raw() completely, we lose that bit of information. But frankly, I'm not sure we really need it at all. Trying to summarize it into a table again (may need to scroll horizontally):

| No. | Type in OD | SDO size indication (response_size) | SDO payload (len(data)) | Action in decode_raw() | Use case |
| --- | --- | --- | --- | --- | --- |
| 1 | in var.STRUCT_TYPES | unspecified | == len(var) | OK, unpack | normal |
| 2 | in var.STRUCT_TYPES | unspecified | < len(var) | pad zeros right, unpack | partial (compressed?) response |
| 3 | in var.STRUCT_TYPES | unspecified | > len(var) | truncate right, unpack | normal, depends on OD |
| 4 | in var.STRUCT_TYPES | == len(var) | (don't care) | OK, unpack | normal |
| 5 | in var.STRUCT_TYPES | < len(var) | (don't care) | pad zeros right, unpack | partial (compressed?) response |
| 6 | in var.STRUCT_TYPES | > len(var) | (don't care) | truncate right, unpack | quirk, always 4-byte for small vars |
| 7 | in objectdictionary.DATA_TYPES | unspecified | use as length | bytes(data) | normal, streaming until complete |
| 8 | in objectdictionary.DATA_TYPES | specified | (don't care) | bytes(data[:response_size]) | normal, discarding extra (unused) bytes |
| 9 | neither STRUCT nor DATA | unspecified | use as length | bytes(data) | unknown data type |
| 10 | neither STRUCT nor DATA | specified | (don't care) | bytes(data[:response_size]) | unknown data type, discarding extra bytes |
| 11 | no OD entry | unspecified | use as length | bytes(data) | unknown data type |
| 12 | no OD entry | specified | (don't care) | bytes(data[:response_size]) | unknown data type, discarding extra bytes |
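
If the table holds, it collapses to something like this hypothetical helper (names invented for illustration):

def apply_size_rules(var, response_size, data):
    if var is not None and var.data_type in var.STRUCT_TYPES:
        # Cases 1-6: normalize to the OD-defined size
        # (truncate excess, zero-pad short data on the right)
        expected = len(var) // 8
        return data[:expected].ljust(expected, b"\x00")
    if response_size is not None:
        # Cases 8, 10, 12: discard extra bytes beyond the indicated size
        return bytes(data[:response_size])
    # Cases 7, 9, 11: no size indicated, use the payload as-is
    return bytes(data)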

Notes:

Does that all seem correct? Would it be okay for you if we used this table as basis for the implementation and tested each case?

sveinse commented 2 months ago

This is a really good overview. There are two independent usages of length involved here

acolomb commented 2 months ago

I'd really like to get this one fixed soon, but right now we only have ideas and not a complete and well-tested implementation draft of these last suggestions. Therefore I think we should focus on getting a release out the door to avoid problems like #455 and come back to this afterwards.

OK for you @sveinse?

sveinse commented 2 months ago

@acolomb Yes, let's not delay the release because of this. Current behavior is what it is and we're not introducing any regressions by not fixing this issue.

sveinse commented 2 months ago

@friederschueler I'm not sure I'd label this one as an enhancement. It's describing unexpected behavior, so it's a bug.