finos / datahelix

The DataHelix generator allows you to quickly create data, based on a JSON profile that defines fields and the relationships between them, for the purpose of testing and validation
https://finos.github.io/datahelix/
Apache License 2.0
142 stars 50 forks source link

Contradictory weightedInSet profiles are valid #1710

Closed Tom-hayden closed 3 years ago

Tom-hayden commented 4 years ago

Bug report

Description of the bug

Currently a field can contain two inset constraints both which specify weightings. However this is a contradiction and should be marked as one.

Related to https://github.com/finos/datahelix/issues/1705

Steps to reproduce:

Using the profile:

{
    "fields": [
        {
            "name": "contradictory",
            "type": "integer"
        },
        {
            "name": "nonContradictory",
            "type": "integer"
        }
    ],
    "constraints": [
        {
            "field": "contradictory",
            "inSet": "weightedSet1.csv"
        },
        {
            "field": "contradictory",
            "inSet": "weightedSet2.csv"
        },
        {
            "field": "nonContradictory",
            "inSet": "weightedSet.csv"
        },
        {
            "field": "nonContradictory",
            "inSet": "nonWeightedSet.csv"
        }
    ]
}

and the csv files

weightedSet1.csv:

1,200
2,1
3,1
4,10

weightedSet2.csv:

0,50
1,1
2,1
3,1

nonWeightedSet.csv

2
3
4

Run the profile with the command generate --max-rows=1000 --replace --profile-file=D:\path\to\datahelix\profile.json --output-path=D:\path\to\datahelix\out.csv --output-format=csv --generation-type=RANDOM --set-from-file-directory=D:\path\to\datahelix\

Expected result:

What did you expect to happen? The profile is marked as fully contradictory and no data is produced.

Actual result:

Data is produced.

Additional Context

As part of this fix, it would be useful to separate out InSetConstraint into InWeightedSetConstraint or something similar. They are two distinct constraints and should be treated as such in the code base

Tom-hayden commented 4 years ago

I had a look at solving this but it looks like I will not have time to finish this bug at this point in time. My approach consisted of updating inSetConstraintDto to include the isWeightedSet field and to populate it when the set was being read in the CsvFileInputReader.

Once this is done, it is possible to create either a inSetConstraint or an inWeightedSetConstraint as necessary

Adding an additional UniformList that extends DistributedList might also be a good idea.