Add eNotation and Base64 formatting for float* types

jpwhitemn commented 5 years ago

Using the example simpledriver in the device-service-go, I am able to use all the NewXValue methods to submit the various types of data except floats. When I use the two NewFloatXValue methods, the values are somehow being binary encoded to that what gets submitted is a random string (examp;e: "value":"PqRObw==")

cloudxxx8 commented 5 years ago

@jpwhitemn that is not a random string, but a base64 encoding of floating value. We implemented it in #24 It is the decision from the previous F2F meeting for avoiding losing precision. Also, this requirement is in the document: https://docs.google.com/document/d/1aMIQ0kb46VE5eeCpDlaTg8PP29-DBSBTlgeWrv6LuYk/edit

jpwhitemn commented 5 years ago

OK - fine with the decision if it is made consciously and for good reason.

cloudxxx8 commented 5 years ago

Reopened this issue according to the discussion in the WG meeting

jduranf commented 5 years ago

I agree with the idea of using a more user friendly format. It's hard to explain and understand the current format.

anonymouse64 commented 5 years ago

What is hard to explain about the current format? I think that it's fairly easy to explain that the current format is the floating point number represented in base64 and all one has to do to decode it is use a base64 decoder.

I think that we should continue to use the base64 representation for floating point numbers as it's very clear (once it's been documented and one reads the documentation of course), maintains perfect precision and has simple implementations for anything importing or exporting floating point numbers in EdgeX.

cloudxxx8 commented 5 years ago

@anonymouse64 @iain-anderson @jpwhitemn @tonyespy @SteveOss @jduranf I have to admit it's new to me to learn base 64 encoding, and I didn't notice it until @tonyespy explained to me. It looks like it's necessary for industrial IoT software to preserve the precision, but for the case which doesn't need the precision, the simple text format is easier to use.

As @iain-anderson mentioned in the WG meeting this week, there is a compromise way. We have a field called Precisionin https://github.com/edgexfoundry/edgex-go/blob/master/pkg/models/propertyvalue.go#L58, and we are not using it in the code.

May we have a compromised resolution? If the Precision is larger than 4, such as 3.14159, the Device Service SDK handles the value via base 64 encoding/decoding. If the Precision is equal or smaller than 4, such as 3.14, we could handle it as the simple text format.

Is it acceptable?

iain-anderson commented 5 years ago

I would say on the contrary, if the device profile specifies a particular number of digits, we should output with that number. If we made Precision more of a general format specifier, we could implement other behaviours - for example output in hexadecimal floating point, base64 encoded binary or whatever. Edit: but that would have implications on any service working with the data

cloudxxx8 commented 5 years ago

Sounds good to me, identify some enumeration value to indicate the behavior

iain-anderson commented 5 years ago

I just don't see that base64 solves a problem here. Yes, it produces shorter output than the decimal strings required for exact transmission, and it's probably faster to compute. But we are working with JSON which is considerably more bulky and more work to generate and parse than an all-binary format anyway. We are doing this so that you can drive the APIs using generic tooling like Postman and curl, and the results come out in a human-readable format. base64 breaks this.

SteveOss commented 5 years ago

A suggested compromise is to add a new attribute to property values to set the required encoding for floating point types string or base64.

anonymouse64 commented 5 years ago

The problem that base64 solves is a guarantee that you don't lose precision through import/export of floating point numbers. This is important for scientific computing and industrial use cases. Additionally there is a performance gain to be had here in simply converting the binary representation into ASCII values rather than performing all the math ops required to turn a binary representation into a human-readable one (and vice-versa).

I'm not opposed to having a developer option to turn off base64 encoding/decoding to make it easy to write tests and things, but I think having both options of a decimal format and base64 simultaneously will needlessly complicate things and make it too easy to get it wrong and accidentally lose precision.

While it is true that you can achieve lossless encoding of floating point numbers using decimal strings, it requires careful coding and is easy to mess up, which is why base64 is preferable because you don't lose precision no matter what you do.

Additionally, we have already made a release that uses base64. We cannot remove this from the default behavior without also introducing a breaking change. If we want to add another type then that's fine but it shouldn't replace the default without strong reasoning for that.

SteveOss commented 5 years ago

These are a number issues with base 64 encoding:

1 - Means floating point values in a JSON encoding cannot be read by a user or easily set by a user from a REST call. Multiple users have complained about this. These is an assumption that JSON should be man -readable. 2 - Most floating point edge data from devices has pretty low precision in the first place (what is the precision of a temperature reading from a sensor?), so in most real life cases absolute precision in not an actual requirement. 3 - As long as enough digits are used, a string representation will not loose precision. 4 - Base 64 encoding is not endian safe

jpwhitemn commented 5 years ago

OK - so I am seeing a potential compromise in what Ian has suggested. What about having a configuration option (something like floatPrecision=true or false) that allows developers to choose whether floats are binary 64 encoded or sent non-encoded with potential loss of precision? Based on that config option, the SDKs (and DS that come from them) add the reading either encoded or not? Everyone cool with this option??

JamesKButcher commented 5 years ago

Coming from a user rather than developer perspective I echo the requirement for readability of the data when in its JSON format. Agree with Jim's floatPrecision option.

anonymouse64 commented 5 years ago

These is an assumption that JSON should be man -readable.

I don't necessarily agree that using JSON means everything has to be human-readable. We are transporting floating point numbers which are binary in nature. Additionally, it's pretty easy to decode the value if you want to read the value, any number of online converters are available, or command line tools, etc.

2 - Most floating point edge data from devices has pretty low precision in the first place (what is the precision of a temperature reading from a sensor?), so in most real life cases absolute precision in not an actual requirement.

That doesn't mean that we should ignore use cases that do exist for this precision. A counter-example I gave previously was an industrial radiation sensor which does need full precision 32-bit floating point numbers.

3 - As long as enough digits are used, a string representation will not loose precision.

Yes there are ways if you're careful to ensure you don't lose precision by using a proper format specifier, but you have to be careful to ensure you do this everywhere and when you take input from a user you won't know what precision it was at originally and end up introducing drift where there shouldn't be.

SteveOss commented 5 years ago

What about the endian issue? Do we have a solution for this, if not we have a problem.

cloudxxx8 commented 5 years ago

I agree with making it configurable to match different kinds of use case. Since it has been released, we could make base64 as the default setting. It might add some complexity, but get more flexibility.

SteveOss commented 5 years ago

@cloudxxx8 Am fine with this approach as long as we have a solution for the base64 endian issue ... ?

jduranf commented 5 years ago

Just some comments:

In core-data, if ValidateCheck is enabled, checks the values provided by the device services. This check expects that the float value is an string
In core-data, the value descriptor type field for a float is expected to be a "F", but in device-sdk is expected to be a "float32" or "float64". This issue affects all data types.
In my opinion, the default setting should be an string (most users will expect this format)
The base64 endian issue can be resolved if is defined to use big endian (or little endian) for base64 encoding

SteveOss commented 5 years ago

@jduranf WRT to your comments. Setting a fixed endian for base64 encoding is a solution (suggest little endian as is most widely used), but anyone doing a base64 encode/decode using a system library on a big endian system is going to come a cropper. Agree that default encoding should be string not base64. Clearly core data needs aligning with changes to device services.

cloudxxx8 commented 5 years ago

@jduranf if core-data expects the data type of value descriptor "F" for float, other data type might also be not as the expected values, such as "I", "S", or "B", right?

jduranf commented 5 years ago

yes @cloudxxx8, all other data doesn't have the expected values. Also several former device-profile examples use "Float" or "Int" as data type. Is documented which are the valid data types? I couldn't find that...

cloudxxx8 commented 5 years ago

@jduranf I can't find any document about the data type of device resource in device profile and value descriptor, either. In the previous version, we had Java enum to restrict those data type value in source code.

tonyespy commented 5 years ago

@cloudxxx8 Please refer to the device resources requirements document (v8) (posted on the Architecture Issues and Decisions page on our wiki). The type system was totally re-vamped during our California release. The original type system had many issues, and this new design was reviewed and approved.

That said, we re-hashed this once again during today's Core WG meeting. The decision was that we'll add a new attribute (strawman name "floatEncoding" to the ValueProperty & ValueDescriptor, which if specified as "printfFloat" (other suggestions welcome, but this is the most accurate I could come up with), then floats will be encoded using the C printf style encoding (e.g. "3.14159265e+00"). If this new attribute isn't specified, then the default will be base64 as currently implemented.

Also to ensure precision, we can either decide to use a different number of digits for float32 vs. float64 (eg "%1.8e" vs. "1.16e"), or just always use the latter for all floats.

We should re-title this issue and add an Edinburgh tag.

tonyespy commented 5 years ago

Some disagreement still exists on what the default should be. This will be decided at the next WG meeting.

tonyespy commented 5 years ago

Note - the device-sdk-c only implements the C float format, not base64 encoding.

SteveOss commented 5 years ago

Just a heads up that we will be voting today on the device services working group as to what the default encoding should be for floating point values. One vote per contributing company, either "base64" or "string". If you can't make the meeting you can e-mail me your vote.

iain-anderson commented 5 years ago

Suggest "eNotation" to specify conventional float format. https://en.wikipedia.org/wiki/Scientific_notation#E-notation

cloudxxx8 commented 5 years ago

@tonyespy @SteveOss Please help confirm whether we need to add "floatEncoding" to the ValueProperty & ValueDescriptor. In my opinion, an additional field is overkilled because it is useless when the field is not floating type. If yes, we have to open a new issue to go-mod-core-contracts, and we might need some help from @tsconn23. Or, a simple property in configuration can be easily achieved

cloudxxx8 commented 5 years ago

Sorry, I just noticed we had decision in the working group this week. I didn't catch it. I just opened an issue to go-mod-core-contracts https://github.com/edgexfoundry/go-mod-core-contracts/issues/22

tobiasmo1 commented 5 years ago

Usability to be improved by assigning default encoding configuration to DS SDK, influenced by System Management WG.

cloudxxx8 commented 5 years ago

PR https://github.com/edgexfoundry/go-mod-core-contracts/pull/38 opened

edgexfoundry / device-sdk-go

Add eNotation and Base64 formatting for float* types #107