IntersectMBO / plutus

The Plutus language implementation and tools
Apache License 2.0

"Wrong" Flat serialization of CBOR encoded Data #5852

Closed: nau closed this issue 8 months ago

nau commented 8 months ago

Summary

According to the Flat spec, a long bytestring should be stored as a sequence of 255-byte chunks followed by the remainder. I've found a counterexample where the uplc tool flat-encodes a long CBOR-encoded Data constant using chunks of length 0xF7 + 0x69 (247 + 105 bytes) instead of 0xFF + 0x61 (255 + 97 bytes). Serialization and deserialization both work fine, but this encoding is weird and inconsistent with the spec.
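To make the expected layout concrete, here is a minimal Python sketch of the canonical block layout described above. It assumes the bytestring is already byte-aligned (a real Flat stream first emits filler bits, which this ignores) and models the layout as length-prefixed blocks of at most 255 bytes followed by a zero-length terminator block. The function name is hypothetical and not part of any Flat library.

```python
def flat_bytestring_blocks(payload: bytes) -> bytes:
    """Canonical Flat-style block layout for a byte-aligned bytestring.

    Each block is a length byte (1..255) followed by that many bytes;
    a 0x00 length byte terminates the sequence. Filler bits are ignored.
    """
    out = bytearray()
    for start in range(0, len(payload), 255):
        block = payload[start:start + 255]
        out.append(len(block))  # length prefix, at most 0xFF
        out += block
    out.append(0)               # zero-length block ends the bytestring
    return bytes(out)


payload = bytes(352)  # same total length as the bytestring in this issue
encoded = flat_bytestring_blocks(payload)
print(hex(encoded[0]), hex(encoded[1 + 255]))  # 0xff 0x61
```

For 352 bytes this yields one full 0xFF block and a 0x61 (97-byte) remainder, which is the split Scalus produces.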

// 1st bytestring is 86 bytes long, should be split at 64 bytes and 22 bytes
// 2nd bytestring is 54 bytes long, should not be split
// 3rd bytestring is 89 bytes, 64 + 25
// ByteString parts                                                                                                  | 0x40 + 0x16 (22) | 22 bytes chunk  ByteString chunks list ends | I(BigInt("1")) | 2nd Bytestring                                                                          end ByteString   | Arr start Bytes start  len | 3rd Bytestring, 64 bytes chunk                                                                                                | len | 25 bytes chunk               end of Bytestring | break                                                            | next Bytestring    |   |continuation
//                    Flat Bytestring split, but why the hell F7 instead of FF? | here, after F7 bytes there is 69 left. Instead it should be FF and then 61 left
// 7501813E97DA01FFBA F7 29FFDB69377A8062B7B2544AC53E7F7FB20A7F80005C2A287F003E82012A01FFFF01000000A763B2C6370100809E 56                99B83C0D86FFFFE3FF7F80FFBE55E27E8F4614F9709D FF 0158 36        61017F00BA0137896E15C2FFC5700100D400CFA554FF00F4008000BC807F7FF65BFF7F7A0100C8FF007F9780FFF48013657FCB00FFBC 9F        5F           5840 8172012030B5DF7F7F007F7FC3FFB9D3013636017F127F80807F807F8F9CBC807F55EDFF6DFF995BB1EE7F6D807F5CE292C90001005E650001BA66808001FF80 5819 8084970101FFBC00BEA182008022800001C1FF6E9D012BEF7F FF     D905229FD9051780D9050280FFA11A7FFFFFFF1903E8FF5F 69 5840 0101AAACDB2F            ] 8021E15080FF01189607
// Scalus encodes this correctly, 0xFF bytes chunk and then 0x61 left in the end
// 7501813E97DA01FFBA FF 29FFDB69377A8062B7B2544AC53E7F7FB20A7F80005C2A287F003E82012A01FFFF01000000A763B2C6370100809E 56                99B83C0D86FFFFE3FF7F80FFBE55E27E8F4614F9709D FF 0158 36        61017F00BA0137896E15C2FFC5700100D400CFA554FF00F4008000BC807F7FF65BFF7F7A0100C8FF007F9780FFF48013657FCB00FFBC 9F        5F           5840 8172012030B5DF7F7F007F7FC3FFB9D3013636017F127F80807F807F8F9CBC807F55EDFF6DFF995BB1EE7F6D807F5CE292C90001005E650001BA66808001FF80 5819 8084970101FFBC00BEA182008022800001C1FF6E9D012BEF7F FF     D905229FD9051780D9050280FFA11A7FFFFFFF1903E8FF5F    5840 0101AAACDB2F          61] 8021E15080FF01189607
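Both layouts carry the same payload, which is why round-tripping still works. A small Python sketch of this, assuming byte-aligned input and using hypothetical helper names: the same 352-byte payload is laid out with the uplc-style split (0xF7 + 0x69) and the Scalus-style split (0xFF + 0x61), and both are decoded back.

```python
def encode_with_sizes(payload: bytes, sizes: list[int]) -> bytes:
    """Lay out payload as length-prefixed blocks with the given block sizes."""
    if sum(sizes) != len(payload) or any(not 1 <= n <= 255 for n in sizes):
        raise ValueError("block sizes must be 1..255 and cover the payload")
    out = bytearray()
    pos = 0
    for n in sizes:
        out.append(n)                # length prefix
        out += payload[pos:pos + n]  # block body
        pos += n
    out.append(0)                    # terminating zero-length block
    return bytes(out)


def decode_blocks(data: bytes) -> tuple[bytes, list[int]]:
    """Read length-prefixed blocks until a 0x00 length byte.

    Returns the reassembled payload and the block lengths that were read.
    """
    payload = bytearray()
    lengths = []
    pos = 0
    while True:
        n = data[pos]
        pos += 1
        if n == 0:
            return bytes(payload), lengths
        payload += data[pos:pos + n]
        lengths.append(n)
        pos += n


payload = bytes(i % 256 for i in range(352))
uplc_style = encode_with_sizes(payload, [0xF7, 0x69])    # 247 + 105
scalus_style = encode_with_sizes(payload, [0xFF, 0x61])  # 255 + 97

p1, l1 = decode_blocks(uplc_style)
p2, l2 = decode_blocks(scalus_style)
print(p1 == p2 == payload, l1, l2)  # True [247, 105] [255, 97]
print(uplc_style == scalus_style)   # False
```

The decoded values agree even though the encoded bytes differ, so only the block boundaries distinguish the two encodings.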

Steps to reproduce the behavior

Link to Scalus bug reproduction. Here is a hex bytestring of a Flat-encoded UPLC program that has this issue:



Actual Result

Here the uplc tool produces a Flat-encoded program that breaks a long bytestring into chunks of length 0xF7 and 0x69 instead of 0xFF and 0x61.

Expected Result

As described in the Summary, Flat encoding should break long bytestrings into chunks of 255 bytes (with a shorter final chunk for the remainder), not 247 or some other size.
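The expected result can be phrased as a check on the block lengths a decoder reads back: every block except the last must be exactly 255 bytes. A hypothetical Python helper for that check, assuming the list holds the non-terminator block lengths (the final zero-length block is excluded):

```python
def is_canonical(block_lengths: list[int]) -> bool:
    """True if every block except the last is a full 255-byte block.

    block_lengths excludes the terminating zero-length block; every
    listed block must be non-empty and at most 255 bytes.
    """
    if any(n < 1 or n > 255 for n in block_lengths):
        return False
    return all(n == 255 for n in block_lengths[:-1])


print(is_canonical([255, 97]))   # True  (Scalus layout)
print(is_canonical([247, 105]))  # False (uplc layout reported here)
```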

Describe the approach you would take to fix this

No response

System info

macOS, but it doesn't matter

nau commented 8 months ago

I tried this with Plutus v1.15.1.0 and v1.23.0.0, with the same results.

kwxm commented 8 months ago

There does seem to be something strange going on there. We'll look into it and get back to you. Thanks for pointing this out.

kwxm commented 8 months ago

@nau You were right about this. The specification says:

we recommend (but do not demand) [that bytestrings are encoded using the canonical format]

so maybe we could claim that it wasn't really a bug. However, it makes things less confusing if we use the canonical format, so I've added a fix in this PR. The problem was that we encode Data by converting it to CBOR and then encoding that with flat, but the CBOR serialisation function returns a lazy bytestring which is already divided into chunks, so flat just serialises the chunks individually. This is easily fixed by converting the CBOR bytestring to a strict one before encoding it with flat.
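The mechanism described in this comment can be modelled in a few lines. This is a simplified Python model, not the actual Plutus or flat code: a lazy bytestring is represented as a list of strict chunks, the old path splits each chunk into blocks on its own, and the fixed path concatenates the chunks first (in Haskell, the analogous step would be something like Data.ByteString.Lazy.toStrict). The 247 + 105 chunk boundary is a hypothetical choice that reproduces the bytes seen in this issue; the real boundary presumably depends on how the CBOR serialiser buffers its output.

```python
def encode_blocks(chunk: bytes) -> bytes:
    """Split one chunk into length-prefixed blocks of at most 255 bytes."""
    out = bytearray()
    for start in range(0, len(chunk), 255):
        block = chunk[start:start + 255]
        out.append(len(block))
        out += block
    return bytes(out)


def encode_lazy_chunkwise(chunks: list[bytes]) -> bytes:
    """Model of the old behaviour: each lazy chunk is blocked independently."""
    return b"".join(encode_blocks(c) for c in chunks) + b"\x00"


def encode_strict(chunks: list[bytes]) -> bytes:
    """Model of the fix: concatenate the chunks (make strict), then block."""
    return encode_blocks(b"".join(chunks)) + b"\x00"


# Hypothetical chunk boundary: the serialiser hands back 247 + 105 bytes.
cbor_chunks = [bytes(247), bytes(105)]
old = encode_lazy_chunkwise(cbor_chunks)
new = encode_strict(cbor_chunks)
print(hex(old[0]), hex(old[1 + 247]))  # 0xf7 0x69
print(hex(new[0]), hex(new[1 + 255]))  # 0xff 0x61
```

Both outputs decode to the same 352 bytes; only the block boundaries differ, which matches the "works fine but looks weird" behaviour reported in the issue.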

Thanks for reporting this!

nau commented 8 months ago

Thank you, Kenneth!