ballerina-platform / ballerina-library

The Ballerina Library
https://ballerina.io/learn/api-docs/ballerina/
Apache License 2.0

Proposal: Ballerina Constraint Package #2850

Open ldclakmal opened 2 years ago

ldclakmal commented 2 years ago

Summary

The Ballerina Constraint package will provide features to validate the values that have been assigned to Ballerina types. This proposal introduces the new package that supports this validation.

Goals

Introduce a new standard library package which has APIs to validate the values that have been assigned to Ballerina types.

Motivation

Right now, the values assigned to Ballerina types cannot be validated further. As an example, according to the definition of the int type in the Ballerina specification:

The int type consists of integers between -9,223,372,036,854,775,808 and 9,223,372,036,854,775,807 (i.e. signed integers that can fit into 64 bits using a two's complement representation).

It cannot be further constrained as the user wishes. As an example, the age of a Person cannot be validated to be a positive integer. Likewise, there is currently no way to constrain the values assigned to Ballerina types. With the proposed package, this can be done with an annotation that is bound to the type.

Also, this support is available in other specifications such as XML Schema Part 2, JSON Schema validation, the OpenAPI specification and JSR 303.

Description

XML Schema Part 2, JSON Schema validation, the OpenAPI specification and JSR 303 were considered as references for designing this package. The highlighted validation rules/keywords are used in the proposed design for Ballerina.

Constraints of XML Schema

| type | validation rule |
|---|---|
| string | length, minLength, maxLength, pattern, enumeration, whiteSpace |
| boolean | pattern, whiteSpace |
| float, double | pattern, enumeration, whiteSpace, maxInclusive, maxExclusive, minInclusive, minExclusive |
| decimal | totalDigits, fractionDigits, pattern, whiteSpace, enumeration, maxInclusive, maxExclusive, minInclusive, minExclusive |
| duration, dateTime, time, date, gYearMonth, gYear, gMonthDay, gDay, gMonth | pattern, enumeration, whiteSpace, maxInclusive, maxExclusive, minInclusive, minExclusive |
| hexBinary, base64Binary, anyURI, QName, NOTATION | length, minLength, maxLength, pattern, enumeration, whiteSpace |

Example:

<simpleType name='password-string'>
 <restriction base='string'>
   <minLength value='8'/>
   <maxLength value='12'/>
 </restriction>
</simpleType>

Constraints of OpenAPI Specification

| type | validation keyword | values for format keyword |
|---|---|---|
| integer | minimum, maximum, exclusiveMinimum, exclusiveMaximum, multipleOf | int32, int64 |
| number | minimum, maximum, exclusiveMinimum, exclusiveMaximum, multipleOf | float, double |
| string | minLength, maxLength, pattern | byte, binary, date, date-time, password |
| array | minItems, maxItems, uniqueItems | - |
| object | minProperties, maxProperties | - |
| boolean | - | - |

Example:

components:
  schema:
    type: string
    minLength: 8
    maxLength: 12
    format: password

Constraints of JSON Schema

| type | validation keyword | values for format keyword |
|---|---|---|
| integer | minimum, maximum, exclusiveMinimum, exclusiveMaximum, multipleOf | - |
| number | minimum, maximum, exclusiveMinimum, exclusiveMaximum, multipleOf | - |
| string | minLength, maxLength, pattern | date, date-time, time, duration, email, idn-email, hostname, idn-hostname, ipv4, ipv6, uri, uri-reference, iri, iri-reference, uuid, uri-template, json-pointer, relative-json-pointer, regex |
| array | minItems, maxItems, uniqueItems, maxContains, minContains | - |
| object | minProperties, maxProperties, required, dependentRequired | - |
| boolean | - | - |

Example:

{
   "type": "string",
   "minLength": 8,
   "maxLength": 12,
   "pattern": "^(?=.*[A-Za-z])(?=.*\\d)[A-Za-z\\d]{8,12}$"
}

Constraints of Java

NOTE: Whitespace has been added between the @ symbol and the constraint name to avoid tagging GitHub users and organizations.

@ Null @ NotNull @ AssertTrue @ AssertFalse @ Min @ Max @ DecimalMin @ DecimalMax @ Negative @ NegativeOrZero @ Positive @ PositiveOrZero @ Size @ Digits @ Past @ PastOrPresent @ Future @ FutureOrPresent @ Pattern @ NotEmpty @ NotBlank @ Email

public class User {
    private String name;

    @Min(value = 18, message = "Age should not be less than 18")
    private int age;

    @Email(message = "Email should be valid")
    private String email;

    // standard setters and getters 
}

Proposed Constraints for Ballerina

The following constraints are proposed for Ballerina.

In the Semantics column, v is the value being constrained and c is the constraint value.

| Constraint name | Applies to type | Constraint value type | Semantics |
|---|---|---|---|
| minValue | any ordered type T | T | v >= c |
| maxValue | any ordered type T | T | v <= c |
| minValueExclusive | any ordered type T | T | v > c |
| maxValueExclusive | any ordered type T | T | v < c |
| multipleOf | int, decimal | int, decimal | v % c = 0 |
| length | string, xml, table, list, map | int | v.length() == c |
| minLength | string, xml, table, list, map | int | v.length() >= c |
| maxLength | string, xml, table, list, map | int | v.length() <= c |
| uniqueMembers | anydata[], map<anydata> | boolean | for any two distinct keys k and k' in v, v[k] != v[k'] |
| pattern | string | regexp | v matches c (need to decide whether match is anchored or not) |
| schemaValid | xml | SchemaValid record (defined below) | v must be valid according to an XSD schema as described by the SchemaValid record c |
| fractionDigits | decimal | int | v must have no more than c fraction digits |
| oneOf | mapping | string[][] | protobuf oneof semantics; [["a", "b"], ["c", "d"]] allowed when a, b, c, d are optional fields; must have exactly one of a and b, and exactly one of c and d |
| dependentRequired | mapping | map<string[]> | if field k is present in v, then all fields in c[k] must be present |
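
To make the less obvious semantics concrete, here is a minimal sketch in plain Ballerina of how multipleOf, uniqueMembers, dependentRequired and oneOf could be checked. The helper names are hypothetical and are not part of the proposed API; the sketch assumes the int case for multipleOf with a non-zero constraint value, and uses `==` (deep equality) to compare members.

```ballerina
import ballerina/io;

// Hypothetical helpers for illustration only; not part of the proposed package.

// multipleOf: v % c = 0 (int case; c is assumed to be non-zero).
function isMultipleOf(int v, int c) returns boolean {
    return v % c == 0;
}

// uniqueMembers: no two members at distinct positions are equal under `==`.
function hasUniqueMembers(anydata[] v) returns boolean {
    foreach int i in 0 ..< v.length() {
        foreach int j in (i + 1) ..< v.length() {
            if v[i] == v[j] {
                return false;
            }
        }
    }
    return true;
}

// dependentRequired: if field k is present in v, every field named in c[k]
// must be present in v as well.
function satisfiesDependentRequired(map<anydata> v, map<string[]> c) returns boolean {
    foreach string k in c.keys() {
        if v.hasKey(k) {
            string[] required = c.get(k);
            foreach string r in required {
                if !v.hasKey(r) {
                    return false;
                }
            }
        }
    }
    return true;
}

// oneOf: exactly one field from each group must be present in v.
function satisfiesOneOf(map<anydata> v, string[][] groups) returns boolean {
    foreach string[] group in groups {
        int present = 0;
        foreach string fieldName in group {
            if v.hasKey(fieldName) {
                present += 1;
            }
        }
        if present != 1 {
            return false;
        }
    }
    return true;
}

public function main() {
    io:println(isMultipleOf(300, 100));
    io:println(hasUniqueMembers([1, "a", 1]));
    io:println(satisfiesDependentRequired({"card": 1}, {"card": ["billingAddress"]}));
    io:println(satisfiesOneOf({"a": 1, "d": 2}, [["a", "b"], ["c", "d"]]));
}
```

A real implementation would read the constraint values from the annotation rather than take them as parameters; the parameters here only stand in for c.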

The SchemaValid record definition used for the schemaValid constraint in the table above:

type SchemaValid record {|
     // top-level can contain pi,comment, whitespace before/after element
     boolean document = true;
     // "{ns}localName" (works with xmlns declaration)
     string elementName;
     map<string> schemaLocation?;
     string noNamespaceSchemaLocation?;
|};

Proposed APIs

The ballerina/constraint package provides different annotations for different basic types, e.g. @constraint:String for strings, @constraint:Map for maps, etc. Each of these defines a separate associated record type. These annotations are attached to the type or record field attachment points.

Annotation

public annotation IntConstraints Int on type, record field;
public annotation FloatConstraints Float on type, record field;
public annotation NumberConstraints Number on type, record field;
public annotation StringConstraints String on type, record field;
public annotation ArrayConstraints Array on type, record field;
// ... rest of the annotation definitions

Associated Record Types

type IntConstraints record {|
   int minValue?;
   int maxValue?;
   int minValueExclusive?;
   int maxValueExclusive?;
   // ... all the finalized constraints for int type should go here
|};

type FloatConstraints record {|
   float minValue?;
   float maxValue?;
   float minValueExclusive?;
   float maxValueExclusive?;
   // ... all the finalized constraints for float type should go here
|};

type NumberConstraints record {|
   decimal minValue?;
   decimal maxValue?;
   decimal minValueExclusive?;
   decimal maxValueExclusive?;
   // ... all the finalized constraints for decimal type should go here
|};

type StringConstraints record {|
   int length?;
   int minLength?;
   int maxLength?;
   string pattern?;
   // ... all the finalized constraints for string type should go here
|};

type ArrayConstraints record {|
   int length?;
   int minLength?;
   int maxLength?;
   // ... all the finalized constraints for any[] type should go here
|};

// ... rest of the associated record types

Annotation Mappings

type annotation
int @constraint:Int
float @constraint:Float
int|float|decimal @constraint:Number
string @constraint:String
any[] @constraint:Array
... ...

Function

The package has a public function that the developer is expected to call with the value to be validated, along with its type descriptor. It returns the value, typed as td, if validation succeeds, or an error if validation fails or if there is an issue with the constraint value.

public function validate(anydata v, typedesc<anydata> td = <>) returns td|error {
   // ...
}

NOTE: In general the constraint checker code will need to do some checking on the annotation with the attached basic data type. It won't all be done declaratively by the annotation mechanism.

Examples

import ballerina/constraint;
import ballerina/log;

type Person record {|
    string name;
    @constraint:Int {
        minValue: 18
    }
    int age;
    @constraint:String {
        pattern: "^[\\w-\\.]+@([\\w-]+\\.)+[\\w-]{2,4}$"
    }
    string email;
|};

public function main() {
    Person person = {name: "Chanaka", age: 16, email: "chanakal@wso2.com"}; 
    Person|error validation = constraint:validate(person);
    if validation is error {
        log:printError("Failed to validate person details", validation);
    }
    // business logic
}

Related issue: https://github.com/ballerina-platform/ballerina-standard-library/issues/2788

jclark commented 2 years ago

Ordered type is defined here: https://ballerina.io/spec/lang/2022R1/#ordering. It cannot be described by a Ballerina type definition. OrderedType is an approximation: any Ballerina ordered type will be an OrderedType.

jclark commented 2 years ago

uniqueItems only makes sense when the member type is a subtype of anydata, i.e. it is applicable only to a subtype of map<anydata> or anydata[]. Items is not the right word: Ballerina terminology is members.

jclark commented 2 years ago

I don't understand what minContains and maxContains mean.

Is there a real use case for multipleOf?

This syntax is wrong:

@constrain {
        minValue = 18
    }

Needs a colon not an equals.

jclark commented 2 years ago

Constraints defined in the Constraints record are valid only for some types. The validity of constraints for a particular type has to be checked separately. How can it be done?

You might be able to handle some cases with a constraint on the annotation record type.

An annotation is a qualified name. So you can have a constrain module that provides different annotations for different basic types, e.g. @constrain:String for strings, @constrain:Map for maps, etc. Each of these will define a separate associated record type.

But in general the constraint checker code will need to do some checking. It won't all be done declaratively by the annotation mechanism.

jclark commented 2 years ago

Hence, the constraint value type of pattern constraint is extended to regexp or predefined formats. Predefined formats have a regexp assigned internally.

This is a terrible idea. They are totally different things.

format is not a constraint in the way all these other things are.

jclark commented 2 years ago

This is wrong pattern = "^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$": you would need to use \\w not \w.

ldclakmal commented 2 years ago

uniqueItems only makes sense when the member type is a subtype of anydata, i.e. it is applicable only to a subtype of map<anydata> or anydata[]. Items is not the right word: Ballerina terminology is members.

Updated the proposal for the applicable types with the correct terminology.

ldclakmal commented 2 years ago

I don't understand what minContains and maxContains mean.

These two constraints were added with the idea of validating the minimum and the maximum value that an array or map of any ordered type can contain.

Is there a real use case for multipleOf?

// Cash withdrawal from an ATM
type CashWithdrawal record {|
    string currency;
    @constrain {
         multipleOf: 100
    }
    int amount;
|};

This syntax is wrong:

@constrain {
        minValue = 18
    }

Needs a colon not an equals.

Updated the proposal.

ldclakmal commented 2 years ago

This is wrong pattern = "^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$": you would need to use \\w not \w.

Updated the proposal.

ldclakmal commented 2 years ago

Hence, the constraint value type of pattern constraint is extended to regexp or predefined formats. Predefined formats have a regexp assigned internally.

This is a terrible idea. They are totally different things.

format is not a constraint in the way all these other things are.

Agree. It acts as a metadata / hint of the content. So, we will remove this format thing from the proposal and revisit that later. Updated the proposal.

ldclakmal commented 2 years ago

Constraints defined in the Constraints record are valid only for some types. The validity of constraints for a particular type has to be checked separately. How can it be done?

You might be able to handle some cases with a constraint on the annotation record type.

An annotation is a qualified name. So you can have a constrain module that provides different annotations for different basic types, e.g. @constrain:String for strings, @constrain:Map for maps, etc. Each of these will define a separate associated record type.

But in general the constraint checker code will need to do some checking. It won't all be done declaratively by the annotation mechanism.

This is much better than the proposed method IMO. As explained, we can have different annotations for different basic types where each of these will define a separate associated record type.

public annotation StringConstraints String on type, record field;
public annotation IntConstraints Int on type, record field;
...
type StringConstraints record {|
   int length?;
   int minLength?;
   int maxLength?;
   string pattern?;
   // ... all the finalized constraints for string type should go here
|};

type IntConstraints record {|
   int minValue?;
   int maxValue?;
   int minValueExclusive?;
   int maxValueExclusive?;
   // ... all the finalized constraints for int type should go here
|};

@jclark Since package name is ballerina/constraint, shouldn't the annotation-tag be like @constraint:String/@constraint:Map instead of @constrain:String/@constrain:Map?

If so, the example would be as follows:

import ballerina/constraint;
import ballerina/log;

type Person record {|
    string name;
    @constraint:Int {
        minValue: 18
    }
    int age;
    @constraint:String {
        pattern: "^[\\w-\\.]+@([\\w-]+\\.)+[\\w-]{2,4}$"
    }
    string email;
|};

public function main() {
    Person person = {name: "Chanaka", age: 16, email: "chanakal@wso2.com"}; 
    error? validation = constraint:validate(person);
    if validation is error {
        log:printError("Failed to validate person details", validation);
    }
    // business logic
}

shafreenAnfar commented 2 years ago

@sameerajayasoma do let us know if you have any feedback.

ldclakmal commented 2 years ago

We had a meeting today with @jclark to discuss the proposal and decided the following:

Updated the proposal according to these.

jclark commented 2 years ago

I thought we were going to have e.g. constraint:Int and constraint:String. Numbers require some subtlety to deal with the case where a value might allow a union:

decimal is special because it includes the range of the other numeric types, and it represents floating point decimal values exactly.

multipleOf makes sense for only int and decimal.
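
One plausible reading of this point, offered as an illustration rather than as something stated in the thread: binary floating point cannot represent most decimal fractions exactly, so a remainder check on float can fail where the decimal equivalent succeeds. The helper names below are hypothetical.

```ballerina
import ballerina/io;

// Hypothetical helpers for illustration only.

// Remainder check on float: subject to binary rounding error.
function isFloatMultiple(float v, float c) returns boolean {
    return v % c == 0.0;
}

// Remainder check on decimal: decimal fractions are represented exactly.
function isDecimalMultiple(decimal v, decimal c) returns boolean {
    return v % c == 0d;
}

public function main() {
    // 0.3 is three times 0.1 mathematically, but neither value is exact as a float.
    io:println(isFloatMultiple(0.3, 0.1));
    io:println(isDecimalMultiple(0.3d, 0.1d));
}
```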

ldclakmal commented 2 years ago

Yes. That is true. I meant to express the same but sorry for not explaining it properly. Thanks for the clarification. Updated my last comment to be less ambiguous.

ldclakmal commented 2 years ago

@jclark The proposed API works successfully with the record field annotation attachment point, but there is an issue with retrieving the annotations from the type annotation attachment point.

Case 1: record field annotation attachment point

import ballerina/constraint;
import ballerina/log;

type Foo record {
    @constraint:String {
        length: 6
    }
    string value;
};

public function main() {
    Foo foo = {value: "s3cr3t"};
    error? validation = constraint:validate(foo);
    if validation is error {
        log:printError("Failed to validate details", validation);
    }
    // business logic
}

Case 2: type annotation attachment point.

import ballerina/constraint;
import ballerina/log;

@constraint:Int { minValue: 0 }
type PositiveInt int;

public function main() {
    PositiveInt age = 18;
    error? validation = constraint:validate(age);
    if validation is error {
        log:printError("Failed to validate age", validation);
    }
    // business logic
}

In Case 2, there is no way to get the annotations attached to the PositiveInt type at runtime, because only the value (18) is passed to the validate function and its type is resolved to the inherent type (int in this case). Therefore, we might have to come up with a different API to solve this.

jclark commented 2 years ago

I agree the API isn't right.

Does it really make sense to get the constraints from the value being validated? I don't think so. I think you should pass in the typedesc to be used for validation as a parameter, something like:

function validate(anydata value, typedesc<anydata> t = <>) returns t|error;

e.g.

Foo foo = {value: "s3cr3t"};
Foo validFoo = check constraint:validate(foo);

ldclakmal commented 2 years ago

Thanks @jclark. This API is better and solves the problem we have. Also, it works successfully with both cases mentioned above.

sameerajayasoma commented 2 years ago

I would not return the validated value.

sameerajayasoma commented 2 years ago

I understand why we had to design the validate method to return the value it validates.

But I would use the following signature for the validation method. I know that we don't have the syntax to specify some typedesc values.

function validate(anydata value, typedesc<anydata> t) returns error?;

e.g.,

@constraint:Int { minValue: 0 }
type PositiveInt int;

public function main() returns error? {
    PositiveInt age = 18;
    check constraint:validate(age, PositiveInt);
    // ...
}

jclark commented 2 years ago

My experience of writing Ballerina code is that it works better to have a function validFoo that returns Foo|error rather than validateFoo that returns error?. The former works better with an expression-oriented, functional style of programming, which tries to do more work in each expression, rather than a procedural style, which breaks things down into lots of little statements. Think of validFoo as a function that given a possibly invalid Foo gives you a known valid Foo.

Your example is not realistic: you almost always want to do something with the value you have validated (otherwise why would you validate it?).

Apart from that, this approach nicely allows the typedesc value to be defaulted.
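
The difference between the two shapes can be sketched in plain Ballerina. The functions below are hypothetical stand-ins written for this illustration, not the constraint package itself: validateAge returns error?, while validAge returns int|error so that the check can sit inside a larger expression.

```ballerina
import ballerina/io;

// Hypothetical stand-ins for the two API shapes; not the constraint package.

// "validate" shape: reports only success or failure.
function validateAge(int age) returns error? {
    if age < 0 {
        return error("age must not be negative");
    }
}

// "valid" shape: returns the value once it is known to be valid.
function validAge(int age) returns int|error {
    check validateAge(age);
    return age;
}

function yearsUntilRetirement(int age) returns int {
    return 65 - age;
}

public function main() returns error? {
    int input = 40;

    // Procedural style: validate in one statement, use the value in the next.
    check validateAge(input);
    int a = yearsUntilRetirement(input);

    // Expression-oriented style: validation composes inside the expression.
    int b = yearsUntilRetirement(check validAge(input));

    io:println(a == b);
}
```

Both styles reach the same result; the second simply keeps the validation and the use of the value in one expression.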

TharmiganK commented 2 years ago

Regarding the regexp match for string types, we would go with the name pattern for the constraint since the same name is used in the references mentioned in the proposal.

Sample:

string:RegExp regExp = re `([0-9]{10})|(\+[0-9]{11})`;

@constraint:String {pattern: regExp}
type PhoneNumber string;

type User record {|
    string name;
    @constraint:String {pattern: re `male|female`}
    string gender?;
    int age;
    @constraint:String {pattern: re `([0-9]{9}[vV]|[0-9]{12})`}
    string nic;
|};

type UserAdvanced record {|
    *User;
    PhoneNumber contactNumber;
    @constraint:String {pattern: re `([a-zA-Z0-9._%\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,6})*`}
    string email;
|};
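
The constraints table leaves open whether the pattern match is anchored. As a minimal sketch of the two readings, assuming the lang.regexp functions isFullMatch and find and using hypothetical helper names, the difference looks like this:

```ballerina
import ballerina/io;

// Hypothetical helpers for illustration only; not part of the proposed API.

// Anchored reading: the whole string must match the pattern.
function matchesAnchored(string:RegExp pattern, string v) returns boolean {
    return pattern.isFullMatch(v);
}

// Unanchored reading: it is enough that some substring matches the pattern.
function matchesUnanchored(string:RegExp pattern, string v) returns boolean {
    return !(pattern.find(v) is ());
}

public function main() {
    string:RegExp tenDigits = re `[0-9]{10}`;
    string input = "tel: 0771234567";
    io:println(matchesAnchored(tenDigits, input));
    io:println(matchesUnanchored(tenDigits, input));
}
```

Under the anchored reading the samples above need no explicit ^ and $; under the unanchored reading they would accept any string that merely contains a match.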