artemis-analytics / artemis

Apache License 2.0

Provider classes for statistical distributions with parameters #16

Open ryanmwhitephd opened 4 years ago

ryanmwhitephd commented 4 years ago

@russellgill This issue is to discuss the recent branch feature-generators-distributions.

Does the branch include fixes to the handling of parameters in the synthesizer class initialization? If so, have these changes been validated? Integration tests with SimuTable and Artemis are in tests/test_simutable.py. There is currently no dedicated test module for the provider classes.


ryanmwhitephd commented 4 years ago

I just tried to test the branch. I get an import error: cannot import name 'binom' from 'numpy.random'

I get a similar error for chi2. I've made a few fixes for these errors.
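For reference, a minimal sketch of where these names live. binom and chi2 are distribution objects in scipy.stats, while numpy.random exposes the sampling functions binomial and chisquare instead. The parameter values and variable names below are illustrative assumptions, not taken from the branch.

```python
import numpy as np
from scipy.stats import binom, chi2  # these names are in scipy.stats, not numpy.random

# Illustrative parameters only (assumed, not from the branch).
rng = np.random.RandomState(42)

# SciPy: sample through the distribution object's rvs() method.
binom_scipy = binom.rvs(n=10, p=0.5, size=5, random_state=rng)
chi2_scipy = chi2.rvs(df=3, size=5, random_state=rng)

# NumPy: the equivalent samplers are methods named binomial and chisquare.
binom_numpy = rng.binomial(n=10, p=0.5, size=5)
chi2_numpy = rng.chisquare(df=3, size=5)
```

Either route works for sampling; the import error only arises when the SciPy names are imported from numpy.random.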

Testing still fails for the binomial function, but we can now build a test suite for these providers. Once all the custom providers are tested, we can integrate them into test_simutable, which will exercise storing and loading parameters from the metadata via the synthesizer class.

I'll see if I can make a few more patches later today.

ryanmwhitephd commented 4 years ago

@russellgill I made a few more fixes to get the tests running. However ...

All the data must be generated from the same seed, and hence from the same RNG. This may be an old issue that I did not deal with. The RNG comes from Faker, so we need to be able to swap out the RNG in Faker.

I think Faker just uses the Python random module, which offers far fewer distributions than SciPy.

self.generator.random is an instance of python Random: https://docs.python.org/2/library/random.html

This RNG needs to be used in the NumPy or SciPy calls, or vice versa.
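One possible way to tie the two together, sketched under assumptions: draw a 32-bit seed from the Python Random instance and use it to build a NumPy RandomState, so the NumPy stream is fully determined by the original seed. Here a plain random.Random stands in for self.generator.random, and numpy_rng_from is a hypothetical helper name, not something in Artemis or Faker.

```python
import random

import numpy as np


def numpy_rng_from(py_rng):
    """Build a NumPy RandomState seeded from a Python Random instance.

    Hypothetical helper: the 32-bit value drawn here advances py_rng once,
    so the NumPy stream depends only on py_rng's seed and prior usage.
    """
    return np.random.RandomState(py_rng.getrandbits(32))


# Stand-in for self.generator.random (assumed to be a random.Random).
py_rng_a = random.Random(1234)
py_rng_b = random.Random(1234)

draws_a = numpy_rng_from(py_rng_a).normal(loc=0.0, scale=1.0, size=4)
draws_b = numpy_rng_from(py_rng_b).normal(loc=0.0, scale=1.0, size=4)
```

With this arrangement only one seed has to be stored in the metadata. The trade-off is that the two generators are separate streams after the hand-off, so the order of the derivation step matters for reproducibility.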

ryanmwhitephd commented 4 years ago

Looks like we can set the RandomState: https://stackoverflow.com/questions/16016959/scipy-stats-seed

I may have done this elsewhere... https://github.com/ryanmwhitephd/artemis/blob/05688f3b2bf81be511eb1f6d7c2d1173d77450c7/artemis/generators/common.py#L68

The way to verify this is done correctly: with the seed set, everything generated should be identical on every run. If even one value differs, a different RNG or seed was used somewhere.
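A rough sketch of that check, assuming the SciPy calls accept a shared RandomState through the random_state argument of rvs(). The generate function and its parameters are hypothetical stand-ins for the provider calls, not the actual Artemis code.

```python
import numpy as np
from scipy.stats import binom, norm


def generate(seed):
    """Hypothetical stand-in for a set of provider calls sharing one RNG."""
    rng = np.random.RandomState(seed)
    return np.concatenate([
        norm.rvs(loc=0.0, scale=1.0, size=3, random_state=rng),
        binom.rvs(n=10, p=0.5, size=3, random_state=rng),
    ])


first = generate(1234)
second = generate(1234)
other = generate(4321)

# Same seed: every value must match. One mismatch means some call
# drew from a different RNG or seed.
same_seed_matches = np.array_equal(first, second)
different_seed_matches = np.array_equal(first, other)
```

A test for the providers could assert that same_seed_matches is true for every distribution, which would catch any call that falls back to the global NumPy state.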

russellgill commented 4 years ago

The normal function is an unmodified version of the one in the initial program's files. I just moved it into a new folder.