Issue Description
Currently, every local database record of a domain object has an equal chance of being selected as a dependency, provided it meets any additional criteria. For example, every Account record with an "account_type" of "depot" in the database accounts table has an equal chance of being randomly chosen to have its account_id attribute referenced in the depot_id attribute of a Depot Position domain object record.
It would be desirable to let users weight the distribution of dependencies so that, in the example above, some accounts are referenced by significantly more depot positions than others. For example, one depot-type account may be referenced by 500 depot positions, while another may be referenced by none at all.
Design
The user-facing config.json should be amended to allow users to specify the distribution of dependencies in some way.
A simple way of doing this might be to have a rule that applies to all dependencies across the board of the format:
A specific X% of dependent objects will be used Y% of the time, and the remaining (100-X)% will be used the other (100-Y)% of the time, where X and Y are both integers.
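As an illustration of how such a rule might be read from config.json, here is a minimal sketch. The key names "dependency_distribution", "object_percent" (X) and "usage_percent" (Y) are hypothetical and not part of any existing schema, and limiting both values to 1..99 is an assumption made so that neither group is empty.

```python
import json

# Hypothetical config.json fragment; the key names are assumptions for illustration only.
EXAMPLE_CONFIG = """
{
  "dependency_distribution": {
    "object_percent": 20,
    "usage_percent": 70
  }
}
"""


def load_distribution_rule(raw_json):
    """Return (X, Y) from the config text, or None when no rule is configured."""
    rule = json.loads(raw_json).get("dependency_distribution")
    if rule is None:
        return None  # caller keeps the current uniform selection
    x, y = rule["object_percent"], rule["usage_percent"]
    for name, value in (("object_percent", x), ("usage_percent", y)):
        # bool is a subclass of int, so it is rejected explicitly
        if isinstance(value, bool) or not isinstance(value, int) or not 0 < value < 100:
            raise ValueError(f"{name} must be an integer between 1 and 99, got {value!r}")
    return x, y


print(load_distribution_rule(EXAMPLE_CONFIG))  # (20, 70)
```

Returning None when the key is absent would keep existing configs working unchanged, which may or may not be the desired default.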
Some pseudocode to demonstrate a rough implementation:
Rule: 20% of dependent objects will be used 70% of the time, and the other 80% will be used the other 30% of the time.
if random.randint(1, 100) <= 70:
    # 70% of the time: randomly select from the first 20% of suitable domain objects
    # this can be done by constructing an appropriate query
    # for example, a subquery nested in the main query could calculate 20% of the total number of rows in the table of appropriate records, so that only that number of rows is retrieved to select from
else:
    # the other 30% of the time: randomly select from the remaining 80% of suitable domain objects
    # construct an appropriate query as described above
Note that the above is just one way of quantifying and implementing this idea; any other reasonable solution would be acceptable.
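The query-based approach could be sketched roughly as follows, assuming a SQLite database and the accounts table from the example. The function name pick_depot_account_id, ordering by account_id to define the "first" X%, and the in-memory demo data (including the "cash" account type) are assumptions for illustration, not the project's actual code.

```python
import random
import sqlite3


def pick_depot_account_id(conn, object_percent, usage_percent, rng=random):
    """Pick one depot account_id, favouring the first object_percent% of candidates."""
    (total,) = conn.execute(
        "SELECT COUNT(*) FROM accounts WHERE account_type = 'depot'"
    ).fetchone()
    if total == 0:
        return None
    # Size of the favoured group, clamped so neither group is empty when total >= 2.
    favoured = min(max(1, round(total * object_percent / 100)), max(1, total - 1))
    if total == 1 or rng.randint(1, 100) <= usage_percent:
        limit, offset = favoured, 0                 # favoured group
    else:
        limit, offset = total - favoured, favoured  # remaining records
    row = conn.execute(
        "SELECT account_id FROM ("
        "  SELECT account_id FROM accounts WHERE account_type = 'depot'"
        "  ORDER BY account_id LIMIT ? OFFSET ?"
        ") ORDER BY RANDOM() LIMIT 1",
        (limit, offset),
    ).fetchone()
    return row[0]


# Demo data: ten depot accounts plus one account of another (hypothetical) type.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, account_type TEXT)")
conn.executemany(
    "INSERT INTO accounts (account_id, account_type) VALUES (?, ?)",
    [(i, "depot") for i in range(1, 11)] + [(11, "cash")],
)
print(pick_depot_account_id(conn, object_percent=20, usage_percent=70))
```

Ordering by account_id makes the favoured group stable between calls, so the same accounts accumulate references. A different ordering column would work equally well as long as it is deterministic.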
Testing
The nature of the testing will depend on the form of implementation, but a test should be added for each dependent attribute of a domain object to check that the distribution is as expected. Rather than asserting a single True/False statement, it might make more sense to count how often each dependency is referenced and check whether the counts match what would be expected, within a tolerance.
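One possible shape for such a test is sketched below. pick_weighted is a stand-in for whatever selector is eventually implemented, and the sample size, seed and 3-percentage-point tolerance are illustrative assumptions rather than agreed values.

```python
import random
from collections import Counter


def pick_weighted(candidates, object_percent, usage_percent, rng):
    """Stand-in selector: the first object_percent% of candidates get usage_percent% of picks."""
    favoured = max(1, round(len(candidates) * object_percent / 100))
    if rng.randint(1, 100) <= usage_percent or favoured == len(candidates):
        return rng.choice(candidates[:favoured])
    return rng.choice(candidates[favoured:])


def favoured_share(reference_counts, favoured_ids):
    """Fraction of all references that point at the favoured records."""
    total = sum(reference_counts.values())
    return sum(reference_counts[i] for i in favoured_ids) / total


def test_depot_id_distribution():
    rng = random.Random(12345)        # fixed seed keeps the test repeatable
    account_ids = list(range(1, 11))  # ten hypothetical depot accounts
    counts = Counter(pick_weighted(account_ids, 20, 70, rng) for _ in range(10_000))
    share = favoured_share(counts, account_ids[:2])
    # Expect roughly 70%; the tolerance absorbs sampling noise.
    assert abs(share - 0.70) < 0.03, share


test_depot_id_distribution()
```

Comparing an aggregate share against a tolerance avoids the flakiness of asserting exact counts on random output.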
Documentation Changes
Method docstrings and the project readme should be updated to explain the new functionality and guide users in setting the config parameters.
Test Evidence
Testing methodology should be implemented and should indicate that distribution weighting works as expected. All existing tests should still pass as expected.
Validation in Develop
Output from running python src/app.py should be as expected.