dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.07k stars 4.69k forks source link

CBOR does not handle canonical NaN correctly #92080

Closed vcsjones closed 12 months ago

vcsjones commented 1 year ago

Description

Since System.Formats.Cbor adheres to RFC 7049, I don't believe NaN is handled correctly.

The RFC states for canonical CBOR:

Also, there are many representations for NaN. If NaN is an allowed value, it must always be represented as 0xf97e00.

CborWriter will encode it as 0xF9FE00.

Similarly, CborReader will not throw for any non-canonically encoded NaN.

Reproduction Steps

To reproduce the CborWriter issue:

using System;
using System.Formats.Cbor;

CborWriter writer = new(CborConformanceMode.Canonical);
writer.WriteSingle(float.NaN);
Console.WriteLine(Convert.ToHexString(writer.Encode())); // Outputs F9FE00

To reproduce the CborReader issue:

byte[] canonical = Convert.FromHexString("F97E00");
byte[] nonCanonical = Convert.FromHexString("F9FE00");
CborReader canonicalReader = new CborReader(canonical, CborConformanceMode.Canonical);
CborReader nonCanonicalReader = new CborReader(nonCanonical, CborConformanceMode.Canonical);
Console.WriteLine(float.IsNaN(canonicalReader.ReadSingle()));
Console.WriteLine(float.IsNaN(nonCanonicalReader.ReadSingle()));

Expected behavior

CborWriter encodes NaN as 0xf97e00.


CborReader should throw when decoding a NaN that is non-canonically encoded.

Actual behavior

CborWriter incorrectly encodes NaN as 0xF9FE00 when in canonical encoding.


CborReader incorrectly permits all encodings of NaN when in canonical encoding.

Regression?

No.

Known Workarounds

No response

Configuration

Host: Version: 8.0.0-rc.1.23419.4 Architecture: arm64 Commit: 92959931a3 RID: osx-arm64

Using 7.0.0 of System.Formats.Cbor package.

Other information

No response

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-formats-cbor, @bartonjs, @vcsjones See info in area-owners.md if you want to be subscribed.

Issue Details
### Description Since System.Formats.Cbor adheres to RFC 7049, I don't believe `NaN` is handled correctly. The [RFC states](https://www.rfc-editor.org/rfc/rfc7049#section-2.3) for canonical CBOR: > Also, there are many representations for NaN. If NaN is an allowed value, it must always be represented as 0xf97e00. `CborWriter` will encode it as 0xF9FE00. Similarly, `CborReader` will not throw for any non-canonically encoded `NaN`. ### Reproduction Steps To reproduce the `CborWriter` issue: ```C# using System; using System.Formats.Cbor; CborWriter writer = new(CborConformanceMode.Canonical); writer.WriteSingle(float.NaN); Console.WriteLine(Convert.ToHexString(writer.Encode())); // Outputs F9FE00 ``` ----- To reproduce the `CborReader` issue: ```C# byte[] canonical = Convert.FromHexString("F97E00"); byte[] nonCanonical = Convert.FromHexString("F9FE00"); CborReader canonicalReader = new CborReader(canonical, CborConformanceMode.Canonical); CborReader nonCanonicalReader = new CborReader(nonCanonical, CborConformanceMode.Canonical); Console.WriteLine(float.IsNaN(canonicalReader.ReadSingle())); Console.WriteLine(float.IsNaN(nonCanonicalReader.ReadSingle())); ``` ### Expected behavior `CborWriter` encodes `NaN` as 0xf97e00. ---- `CborReader` should throw when decoding a `NaN` that is non-canonically encoded. ### Actual behavior `CborWriter` incorrectly encodes `NaN` as 0xF9FE00 when in canonical encoding. ---- `CborReader` incorrectly permits all encodings of `NaN` when in canonical encoding. ### Regression? No. ### Known Workarounds _No response_ ### Configuration Host: Version: 8.0.0-rc.1.23419.4 Architecture: arm64 Commit: 92959931a3 RID: osx-arm64 Using 7.0.0 of System.Formats.Cbor package. ### Other information _No response_
Author: vcsjones
Assignees: -
Labels: `area-System.Formats.Cbor`
Milestone: -
vcsjones commented 1 year ago

The issue with CborWriter may actually be an issue with the definition of Half.NaN.

I would expect this to print the same thing:

Half h = Half.Zero / Half.Zero;
Console.WriteLine(BitConverter.HalfToInt16Bits(h).ToString("X"));
Console.WriteLine(BitConverter.HalfToInt16Bits(Half.NaN).ToString("X"));

But it prints

7E00
FE00

It seems that Half.NaN is negative NaN:

https://github.com/dotnet/runtime/blob/c84ccfa5d1b6a983d319a4fe9807ed39e810f006/src/libraries/System.Private.CoreLib/src/System/Half.cs#L93

vcsjones commented 1 year ago

@tannergooding do you have any thoughts on the NaN definition of Half here? Should that be PositiveQNaNBits?

bartonjs commented 1 year ago

Since I believe this is limited to System.Single/float, I updated the title. Half, of course, might also be impacted... and if it's also double then I've clearly mistitled the issue :)

vcsjones commented 1 year ago

@bartonjs I believe all of them are impacted since all of them get reduced to a 16-bit NaN, or Half.

CborWriter writer = new(CborConformanceMode.Canonical);
writer.WriteHalf(Half.NaN);
Console.WriteLine(Convert.ToHexString(writer.Encode())); // Writes F9FE00
writer.Reset();
writer.WriteSingle(float.NaN);
Console.WriteLine(Convert.ToHexString(writer.Encode())); // Writes F9FE00
writer.Reset();
writer.WriteDouble(double.NaN);
Console.WriteLine(Convert.ToHexString(writer.Encode())); // Writes F9FE00
writer.Reset();
tannergooding commented 1 year ago

@vcsjones, the "canonical NaN" in hardware is implementation dependent.

For xarch, it has historically been -QNaN and so that's what many systems default to and correspondingly what Half.NaN defaults to as well.

But, the represented value of Half.NaN or float.NaN shouldn't matter. CBOR should simply be doing if (float.IsNaN(x)) { WriteRequiredBitPattern_f97e00(); } (or equivalent for Half/double

bartonjs commented 1 year ago

Huh. I don't know of turning all three into Half.NaN is right. Maybe somewhere says it's allowed. But, https://www.rfc-editor.org/rfc/rfc7049#appendix-A says

Diagnostic Encoded
Infinity 0xf97c00
NaN 0xf97e00
-Infinity 0xf9fc00
Infinity 0xfa7f800000
NaN 0xfa7fc00000
-Infinity 0xfaff800000
Infinity 0xfb7ff0000000000000
NaN 0xfb7ff8000000000000
-Infinity 0xfbfff0000000000000

One presumes those are for "when encoded as a half / float / double"

vcsjones commented 1 year ago

Huh. I don't know of turning all three into Half.NaN is right

I think it is for Canonical CBOR:

If a protocol allows for IEEE floats, then additional canonicalization rules might need to be added. One example rule might be to have all floats start as a 64-bit float, then do a test conversion to a 32-bit float; if the result is the same numeric value, use the shorter value and repeat the process with a test conversion to a 16-bit float. (This rule selects 16-bit float for positive and negative Infinity as well.) Also, there are many representations for NaN. If NaN is an allowed value, it must always be represented as 0xf97e00.

tomeksowi commented 1 year ago

I ran into this issue when working on RISC-V CoreFX tests so I fixed it in PR #92934. Reviews welcome.