amazon-ion / ion-java-benchmark-cli

Apache License 2.0

Adds internal default constraints in Ion Data Generator. #39

Closed: linlin-s closed this issue 2 years ago

linlin-s commented 2 years ago

To generate data that more closely resembles real-world data, users currently have to provide a highly specific schema. It would be better to provide a set of internal default constraints that users can override. For example, given this input type definition:

type::{
  name: NestedList,
  type: list,
  ordered_elements: [
    string,
    int,
    { type: int, occurs: range::[0, 10] },
    { type: struct,
      fields: {
        first_name: string,
        last_name: string,
        last_updated: { type: timestamp, timestamp_precision: year, occurs: range::[2, 5] },
      }
    },
  ]
}

Generated data:

[
  "𢋎썾䛰𡁜𦓱𓂙緻𧕊赀𤀿딟𨡠𤄧𣀜𝝔鸬𢙕ࠟ",
  1257713305652284209,
  -5644910875031391494,
  -7360934801865493629,
  2015045502948171567,
  -3426578700639114343,
  4799305136895312052,
  {
    last_updated: 7238T,
    last_updated: 4437T,
    last_updated: 5424T,
    last_updated: 9091T,
    last_name: "𠯽𨓐𩑩熌𥙨峾鴊𠎖𪡤𦜤ה𠝹ቕ",
    first_name: "浃␄輬썂𣧱븅"
  }
]

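As the example shows, the generated values rarely resemble real-world data: strings are random code points drawn from many scripts, integers span the full 64-bit range, and timestamps fall in years such as 7238 and 9091. Default constraints would narrow the generated values and make the data more practical.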
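A minimal sketch of the proposed behavior, written in Java to match ion-java: built-in defaults are kept per Ion type, and any constraint the user supplies replaces the default of the same name while the remaining defaults are still inherited. The class `DefaultConstraints`, its `resolve` method, and every default value below are hypothetical illustrations, not part of ion-java-benchmark-cli; constraints are held as plain strings rather than parsed Ion Schema.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: per-type default constraints that user-supplied
 * constraints override. Names and values are illustrative only and are
 * not part of ion-java-benchmark-cli.
 */
public class DefaultConstraints {

    // Hypothetical built-in defaults, keyed by Ion type name, then by constraint name.
    private static final Map<String, Map<String, String>> DEFAULTS = new HashMap<>();

    static {
        DEFAULTS.put("string", Map.of("codepoint_length", "range::[1, 20]"));
        DEFAULTS.put("int", Map.of("valid_values", "range::[0, 1000]"));
        DEFAULTS.put("timestamp", Map.of(
                "timestamp_precision", "second",
                "valid_values", "range::[2000T, 2030T]"));
    }

    /** Returns the defaults for {@code type}, with every user constraint taking precedence. */
    public static Map<String, String> resolve(String type, Map<String, String> userConstraints) {
        // Copy so the shared immutable defaults are never modified.
        Map<String, String> merged = new HashMap<>(DEFAULTS.getOrDefault(type, Map.of()));
        merged.putAll(userConstraints); // user-provided values win on conflicts
        return merged;
    }

    public static void main(String[] args) {
        // The issue's schema pins last_updated to year precision. The default
        // date range is still inherited, so a generator honoring it would not
        // emit years such as 7238.
        Map<String, String> lastUpdated = resolve("timestamp", Map.of("timestamp_precision", "year"));
        System.out.println(lastUpdated.get("timestamp_precision")); // year
        System.out.println(lastUpdated.get("valid_values"));        // range::[2000T, 2030T]
    }
}
```

Merging per constraint, rather than discarding all defaults as soon as a user supplies any constraint for a type, lets a schema like the one above override only what it cares about.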