amazon-ion / ion-java-benchmark-cli

Apache License 2.0

Generated string length is not aligned with the specified codepoint_length. #32

Closed linlin-s closed 2 years ago

linlin-s commented 2 years ago

#31

In this PR, an error is thrown when adding a unit test for the 'codepoint_length' constraint.

java.lang.AssertionError: Violations Validation failed:
- invalid codepoint length 4, expected range::[3,3]
found in value "\u555d\ub8bc\U00027546"
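For reference, the reported value holds three Unicode code points but four UTF-16 code units, because U+27546 lies outside the Basic Multilingual Plane and Java stores it as a surrogate pair. The sketch below only illustrates that difference using standard JDK calls; the class name `CodePointLengthDemo` is hypothetical, and whether this is exactly how the validator arrived at 4 is an assumption, not something confirmed in this thread.

```java
final class CodePointLengthDemo {
    // Rebuilds the value from the error message out of its three code points.
    static String reportedValue() {
        return new StringBuilder()
                .appendCodePoint(0x555D)
                .appendCodePoint(0xB8BC)
                .appendCodePoint(0x27546) // supplementary code point, stored as a surrogate pair
                .toString();
    }

    public static void main(String[] args) {
        String value = reportedValue();
        // Number of UTF-16 code units (Java chars) in the string.
        System.out.println("length()         = " + value.length());
        // Number of Unicode code points in the string.
        System.out.println("codePointCount() = " + value.codePointCount(0, value.length()));
    }
}
```

Running `main` prints 4 for `length()` and 3 for `codePointCount()`, which matches the "4 versus expected 3" mismatch in the message above.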

After an offline discussion with @desaikd, the conclusion is to rework the logic of the string construction method:

public static String constructString(IonStruct constraintStruct) throws Exception {
    String constructedString;
    Integer codePointsLengthBound;
    String regexPattern = IonSchemaUtilities.parseTextConstraints(constraintStruct, IonSchemaUtilities.KEYWORD_REGEX);
    if (regexPattern != null) {
        // A regex constraint is present: generate a string that matches it.
        RgxGen rgxGen = new RgxGen(regexPattern);
        constructedString = rgxGen.generate();
    } else {
        Random random = new Random();
        codePointsLengthBound = IonSchemaUtilities.parseConstraints(constraintStruct, IonSchemaUtilities.KEYWORD_CODE_POINT_LENGTH);
        // No codepoint_length constraint in the Ion schema.
        if (codePointsLengthBound == null) {
            // Fall back to a random length in [0, 20).
            codePointsLengthBound = random.nextInt(20);
        }
        StringBuilder sb = new StringBuilder();
        for (int j = 0; j < codePointsLengthBound; j++) {
            int codePoint;
            int type;
            // Re-draw until the code point is assigned and is neither private-use nor a surrogate.
            do {
                codePoint = random.nextInt(DEFAULT_RANGE.get(1) - DEFAULT_RANGE.get(0) + 1) + DEFAULT_RANGE.get(0);
                type = Character.getType(codePoint);
            } while (type == Character.PRIVATE_USE || type == Character.SURROGATE || type == Character.UNASSIGNED);
            sb.appendCodePoint(codePoint);
        }
        constructedString = sb.toString();
    }
    return constructedString;
}

linlin-s commented 2 years ago

After investigation, the cause of the failed test is tracked in this issue: Incorrect validation with CodepointLength#getIntValue()
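A minimal, self-contained sketch of how the generator side could be checked independently of the validator: it builds a string the same way the loop in `constructString` does and counts code points with `String.codePointCount`. The class `CodePointStringSketch`, its method names, and the use of the full Unicode range in place of `DEFAULT_RANGE` are all assumptions made for illustration; this is not the project's actual test.

```java
import java.util.Random;

final class CodePointStringSketch {
    // Assumption: stand-ins for DEFAULT_RANGE, covering every Unicode code point.
    static final int MIN_CODE_POINT = Character.MIN_CODE_POINT;
    static final int MAX_CODE_POINT = Character.MAX_CODE_POINT;

    // Builds a string with exactly codePointLength code points.
    static String randomString(int codePointLength, Random random) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < codePointLength; i++) {
            int codePoint;
            int type;
            // Re-draw until the code point is assigned and is neither private-use nor a surrogate.
            do {
                codePoint = MIN_CODE_POINT + random.nextInt(MAX_CODE_POINT - MIN_CODE_POINT + 1);
                type = Character.getType(codePoint);
            } while (type == Character.PRIVATE_USE || type == Character.SURROGATE || type == Character.UNASSIGNED);
            sb.appendCodePoint(codePoint);
        }
        return sb.toString();
    }

    // Counts Unicode code points rather than UTF-16 code units.
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        Random random = new Random(42L);
        String s = randomString(3, random);
        System.out.println("code points: " + codePointLength(s) + ", UTF-16 units: " + s.length());
    }
}
```

The code point count always equals the requested length, while `length()` can be larger whenever a supplementary code point is drawn, so a check based on `length()` would report spurious violations.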