Closed lmazuel closed 1 month ago
Hey @lmazuel , in the past, we encountered similar issue where a property was defined as a string, but its value was in an integer format. We requested the service team to correct the Swagger spec, as this was identified as a definition bug. Why are we now considering this a feature for TypeSpec?
one use case is representing int64
without loosing precision(for JS) or decimal
Below discussion is on the condition that emitter is to support a general feature.
Please let dev know whether we have such requirement, besides that Batch bug (where backend should send an int, but instead send a string and cannot fix that due to backward-compatibility requirement).
For the Batch bug, I'd recommend Batch service write contentLength: string
in typespec, as they do return it as a JSON string (spec is wrong, and same error on their pre-typespec Swagger). Service could do customization for SDK on the getter of the property.
The full typespec would be
model Foo {
@encode("string", string)
x: int64;
}
Question:
string
for this use case? That is, we only allow client conversion from basic scalar type to string type.
Or do we also allow e.g. @encode("float", float64) x: int32;
We probably want a lint rule to avoid service write arbitrary scalar conversion like @encode("string", url) x: int32;
@encode("string", )
? E.g. on date-time, we write @encode("rfc7231")
, which is very clear, that it is date-time encoded as rfc7231 text.A few proposals, of different scope of support
@encode("string", string) x: int64
and @encode("string", string) decimal
(include decimal128
). Possible reason: certain language may not able to process these types (e.g. JS can only process safeint
which is smaller than int64
). Lint rule (in typespec-azure-core) to disallow all other options.string
, and client type be basic scalar. Possible reason: a float32
can always be represented as a string (however there could be multiple ways of such representation, e.g, "1000.0"
vs "+1000.00"
vs 1,000.0
, and backend must accept all of them as equivalent if the property is used as input), but not vice versa. Downside is that there is not really a RFC like rfc3339 on what string representation is acceptable, what is not -- there is no guarantee a language-specific Float.parseFloat
can handle all possible float-in-string-format.PS: by "basic scalar", I include "numeric types", "string", "boolean" from built-in-types
In the general use cases, most often the numbers we are talking about would fit well inside a double's range and precision.
With above assumption, JS decided to use number to represent decimal
(see decision). So if service team really cares about the precision loss, it would be recommanded to use @encode
(typespec issue).
So talking about encode with int epic, do we have similar scenario where we really care about precision or some other reasons? Actually we have lots of definition for "type": "integer", "format": "int64"
in swagger, but JS just generates them as number
in HLC and it works well. My understanding is data range outside JS could represent may not be common cases.
Back to weidong's proposal, I would prefer the option 1 with only allowing encoding this with necessary types e.g int64 or decimal. I haven't see any values with case @encode("string", string) x: int8
and obviously service bug may not be a good reason. Another point is this may more align with what OpenAPI is trying to support because their format int64 has base type with number and string(link and issue).
Yeah this encoding would only be allowed to target string(in the same way that rfc7231
will complain if you don't target a string)
Additionally same comment as before I don't think string
is the right key. We pretty much always encode as a string. I think "decimal"
might be more accurate(as you could imagine an hex
or octal
encoding as well) but it could be confusing with the decimal
type
@weidongxu-microsoft you're on point that this is contentLength
of Batch ;)
Agree with you all, let's keep things simple and accept only the scenario string/int for now, there is no need to get complicated. And if someone one day comes with another conversion scenario, we can investigate at time the validity of that scenario.
@timotheeguerin are you suggesting @encode("decimal", string)
? Meaning the type on the wire is a string, but the encoding is decimal format meaning a floating point number?
I kind of get this because decimal
narrows the domain of the wire type string
, much like rfc3339
does.
Is there a list of known name of an encoding
as referenced by these docs so we know which values are available to choose from for this use case?
I understood @timotheeguerin as:
@encode("decimal", string)
foo: int32
would convert 12
to "12"
, while
@encode("oct", string)
foo: int32
would convert 12
to "0o14"
IOW, still use the string parameter to describe the format you want it to look like, and usually we use base 10 (or decimal)
Might be confusing to use decimal
as this is usually a floating point number in most languages? But yeah that tracks with what I thought the suggestion was, and I agree with the direction.
No not decimal as floating point decimal as decimal notation of numbers(how you write numbers normally)
But yes that was my point that decimal
might be confusing
Could maybe be
decimal-str
decimal-string
decimal-notation
Also on a related note with encoding we are thinking that using string values it not great and it is better to keep the enum values as the only way for new ones
So could have that(name tbd with above discussion as well)
enum NumericEncoding {
decimal,
hexdecimal,
octal,
binary,
}
let's keep things simple and accept only the scenario string/int for now, there is no need to get complicated. And if someone one day comes with another conversion scenario, we can investigate at time the validity of that scenario.
@lmazuel except int not sure we could scope this a little to consider the decimal
type also? I believe this would be useful for service team who cares precision.
Maybe we could limit this to integer (int32, int64, safeint) to string, for now. There could be some nuance of the "decimal" representation, but may not be too bad.
If we allow float to string, there could be more questions, e.g. whether 2e10
a valid "decimal" for float -- looks not?
Missed the meeting notes here, problem isn't anymore about the decimal
naming as much as deciding what to do all together.
THe current proposal is to have
enum NumericEncoding {
base10,
// With option to add the following in the future
// base16,
// base 8
}
That you can then use
@encode(NumericEncoding.base10)
a: int64;
The current question is
@encode
with no argument to mean encode to string with the default string encoding.
@encode
is when we want to change the default encoding because that type already needed to be encoded to a string implicitly(utcDateTime, plainTime, etc.)Proposed changed signature for decorator
extern dec encode(
target: Scalar | ModelProperty,
encoding: string | EnumMember,
encodedAs?: Scalar
);
extern dec encode2(target: Scalar | ModelProperty, encodedAs: Scalar);
extern dec encode2(
target: Scalar | ModelProperty,
encodingOrEncodedAs: string | EnumMember | Scalar,
encodedAs?: Scalar
);
And then you can call
@encode(string) val: int64
restrict this overload format to only numeric scalar types and string
as the encodeAs
IMHO, I'd avoid @encode(string)
directly, if possible. If we had to, limit it only to a subset of scalar that there is a consensus of what is acceptable as string representative.
I think we should be clear about the conversion rule (it need not to be an RFC, but we should be explicit about what is a valid string representation, what is not.
I am not aware that there is consistency between different languages, when convert string to/from a scalar.
Say convert String to Boolean, when de-serialize the JSON).
Java's Boolean.parseBoolean
handles true
case-insensitively, and everything else is false.
https://docs.oracle.com/javase/8/docs/api/java/lang/Boolean.html#parseBoolean-java.lang.String-
,NET's Boolean.TryParse
only handle True
or False
. Exception if other values.
https://learn.microsoft.com/en-us/dotnet/api/system.boolean.tryparse?view=net-8.0#system-boolean-tryparse(system-string-system-boolean@)
Golang also accept t
f
and 1
0
, case-insensitive. Exception if other values.
https://pkg.go.dev/strconv#ParseBool
We are limiting that to numeric types and it means the decimal representation of the number as if you were to serialize the number to a json string
Yeah, that works. Or we may say that JSON numeric wrapped in quote is acceptable.
Maybe irrelevant to the detailed design. But I'd like to clarify one thing - what does the typespec type stand for?
model Foo {
@format("date-time-rfc1123") data1: string;
@encode(string) val: int64;
}
@format
, we have string
as wire type and emitter could leverage format hint to define client type@encode
, we have int64
as client type and emitter could leverage encode to understand wire type@format
is to be seen as a "known pattern" ( a known string validation format) thats it it doesn't affect the type. Here data1 is a string and should always be presented as a string. There is just some decorator that say you should validate this string is in this format(which our client do not do so basically @format
is not something you should ever be concerned with)
Thanks for the explaination and it makes sense considering we have guideline on no client-side validation. So can I understand typespec type stands for client type or the type we want to show to our customers?
Not sure I see the distinction here client type or the type we want to show to our customers
The typespec type represent the logic type. In some client/server/other target you might not be able to represent it as that logical type so you might choose to keep it as the wire type(e.g. @encode(string) a: int64
in JS)
I think the logic type is what I mean here. One more question - with below example we would expect different logic type if emitter could somehow differentiate the utcDateTime or offsetDateTime, but on the wire they are the same format?
model Foo {
data1: utcDateTime; // RFC3339 for json (default)
@encode(DateTimeKnownEncoding.rfc3339) data2: offsetDateTime;
}
Yeah they both use RFC3339
but data1
should always end with Z
and data2
might have the offset (e.g.+8:00
)
Problemm: We have a service team that requires string on the wire, but this is actually always an integer. Given the simplicity of converting string to int (and vice versa), we want to add support for declaring that conversion, and get emitters supporting it, to avoid manual code for service team cross-language.
Edit:
After discussion, it became clear that the
encode
parameter is an encoding, and therefore shouldn't bestring
, but something describing what the encoding will look like. In the usual case, we would want base-10, which would serialize12
as"12"
. As base-10 is technically "decimal", we could use that term, but it might be conflated with the decimal type. The current consensus seems to benumber
, as a fair semantic of base-10:Original proposal at creation: