Zulu-Inuoe / jzon

A correct and safe(er) JSON RFC 8259 reader/writer with sane defaults.
MIT License

Floating Point Printer #15

Closed: Zulu-Inuoe closed 1 year ago

Zulu-Inuoe commented 2 years ago

I've been doing some reading and have settled on implementing the Schubfach method found here:

https://drive.google.com/file/d/1luHhyQF9zKlM8yJ1nebU0OgVYhfC6CBN/view

To be clear, I am doing this because I want to ensure I never produce an invalid JSON number, while also keeping the printed floating-point number short. That is something I have not been confident I can achieve with cl:format.

IAmRasputin commented 1 year ago

Could you elaborate on what format is missing for a feature like this?

Zulu-Inuoe commented 1 year ago

I am not well-versed enough in the CL standard to know what combination of format directives will ensure that I:

  1. Always produce a valid JSON number
  2. Produce the number in the fewest characters, e.g. 1e-10 instead of 0.0000000001

If format can do this, then great. It's just an unknown for me.
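
For reference, the first requirement can be checked mechanically. The following is a minimal sketch of a recognizer for the RFC 8259 number grammar (`[ "-" ] int [ frac ] [ exp ]`). The name `json-number-p` is a hypothetical helper for illustration and is not part of jzon.

```lisp
;; Hypothetical helper (not part of jzon): returns true when STRING matches
;; the RFC 8259 grammar  number = [ "-" ] int [ frac ] [ exp ].
(defun json-number-p (string)
  (let ((i 0)
        (n (length string)))
    (labels ((peek () (and (< i n) (char string i)))
             (digit-p () (let ((c (peek))) (and c (char<= #\0 c #\9))))
             (skip-digits ()
               ;; Consume a run of digits; return how many were consumed.
               (let ((start i))
                 (loop while (digit-p) do (incf i))
                 (- i start))))
      ;; Optional leading minus (a leading plus is not allowed in JSON).
      (when (eql (peek) #\-) (incf i))
      ;; int: a lone 0, or a non-zero digit followed by any digits.
      (cond ((eql (peek) #\0) (incf i))
            ((digit-p) (skip-digits))
            (t (return-from json-number-p nil)))
      ;; frac: "." followed by at least one digit.
      (when (eql (peek) #\.)
        (incf i)
        (when (zerop (skip-digits))
          (return-from json-number-p nil)))
      ;; exp: "e" or "E", optional sign, at least one digit.
      (when (member (peek) '(#\e #\E))
        (incf i)
        (when (member (peek) '(#\+ #\-)) (incf i))
        (when (zerop (skip-digits))
          (return-from json-number-p nil)))
      ;; Valid only if the whole string was consumed.
      (= i n))))
```

A check like this could be run over whatever a candidate format directive produces, for example `(json-number-p (format nil "~A" x))`, to see which inputs yield invalid output.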

IAmRasputin commented 1 year ago

Thanks.

The more I look into this, the more it seems that floating-point precision is the reader's problem, not format. Lisp reads a number of the form, say, 1.23456 from the input stream, and seemingly does what the Schubfach algorithm describes to model the real as an FP.

If I type a number like 0.000000000000000000000001234 into SLIME, the reader crams it into a single-float (or whatever you have *read-default-float-format* set to). Inspecting it gives:

#<SINGLE-FLOAT {17BEF3C700000019}>
--------------------
Scientific: 1.234e-24
Decoded: 1.0 * 0.74590725 * 2^-79
Digits: 24
Precision: 24

The original ugly number I entered isn't hiding in here anywhere; this is just it. The platonic ideal of the real I entered is just that, represented by this FP. format just reports on it.

Any way I try to return, format, write, etc. the original number, it's only the FP now. But it does come back as the same number every time, so that's not nothing.

If it's critical for round-tripping that any horrific decimal be exactly replicated as a string after being read, then I'm not sure about this (saving the string representation feels wrong but I guess it could work). But it seems consistent -- or, at least, consistent with the internal workings of CL.

I really hope I'm not misunderstanding this problem, or your goals with jzon, but TL;DR: format may Just Work here.
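
The observation above can be reproduced outside the inspector. This is a small sketch, assuming a reader that rounds decimal input correctly (SBCL is believed to). `read-single` and `describe-float` are hypothetical helper names introduced only for this illustration.

```lisp
;; Hypothetical helper: read STRING as a single-float, independent of the
;; session's current *read-default-float-format*.
(defun read-single (string)
  (let ((*read-default-float-format* 'single-float)
        (*read-eval* nil))
    (values (read-from-string string))))

;; Hypothetical helper: a float is exactly sign * significand * 2^exponent.
;; That triple is all that survives reading; the decimal spelling is not
;; stored anywhere.
(defun describe-float (x)
  (multiple-value-bind (significand exponent sign) (integer-decode-float x)
    (list :significand significand :exponent exponent :sign sign)))

;; Both spellings denote the same rational, so a correctly rounding reader
;; should map them to the same single-float.
(let ((long  (read-single "0.000000000000000000000001234"))
      (short (read-single "1.234e-24")))
  (print (= long short))
  (print (describe-float long)))
```

If the two strings compare `=` after reading, then nothing distinguishes them once they are floats, which is why no printer can recover the longer spelling.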

Zulu-Inuoe commented 1 year ago

Sorry, I may have given the wrong impression: round-tripping the JSON text is not required.

The round-tripping that matters is the value of the IEEE 754 floating point number: (= x (jzon:parse (jzon:stringify x))) should hold for every real x. This also means that the string MUST always be a valid JSON number. For example, I cannot use the ~A directive of format because of this:

  (format t "~A" 0.000001023910d0) ; => 1.02391d-6

1.02391d-6 is not a valid JSON number because CL uses d here instead of e.

Secondly, I'd like jzon to produce an efficient representation for floats, meaning it prints the minimum number of characters required to represent the value, e.g. switching to scientific notation when appropriate.
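
As a rough illustration of the value round-trip property, here is a sketch that uses the standard printer and reader as stand-ins for `jzon:stringify` and `jzon:parse`. It is not jzon's implementation. It relies on the standard's rule that the printer emits a format-specific exponent marker such as `d` only when the float's format differs from `*read-default-float-format*`, so binding that variable to the float's own format should avoid the marker. All function names below are hypothetical.

```lisp
;; All names below are hypothetical helpers, not jzon's API.

;; Name of X's float format, suitable for *read-default-float-format*.
;; Assumption: only single and double floats are of interest here.
(defun float-format-of (x)
  (etypecase x
    (single-float 'single-float)
    (double-float 'double-float)))

;; Print X while the default float format matches its type, so the printer
;; has no reason to emit a type-specific exponent marker such as d or f.
(defun float-to-json-candidate (x)
  (let ((*read-default-float-format* (float-format-of x)))
    (prin1-to-string x)))

;; Check the value round trip, with the Lisp reader standing in for a JSON
;; parser. Returns two values: whether the value survived, and the text.
(defun round-trips-p (x)
  (let* ((text (float-to-json-candidate x))
         (back (let ((*read-default-float-format* (float-format-of x))
                     (*read-eval* nil))
                 (values (read-from-string text)))))
    (values (= x back) text)))

(print (multiple-value-list (round-trips-p 0.000001023910d0)))
```

This only addresses the exponent marker. It makes no attempt at the second goal, since the standard printer keeps at least one digit after the decimal point and so may emit something like 1.0e-10 where 1e-10 would do, and it does not handle infinities or NaN, which have no JSON representation.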