Closed Eh2406 closed 7 years ago
Thanks for catching this @Eh2406! Since you guys are probably most recently immersed in this code, do you have a sense for which of the two solution routes you mention seems most promising? If you think you're in a position to issue a pull request for this issue, go for it. I'll probably have capacity to dig into this too on Friday or early next week
I'd lean toward removing h_constraint.index = h_jd.cat_id
and p_constraint.index = p_jd.cat_id
I could not figure out why it is necessary, but have not yet tried. @fscottfoti added it and probably had a good reason. alsw we need to check whether this is a user facing change.
I'd definitely add a assert self.ncols == len(person_freq.columns) + len(household_freq.columns)
to and a test to ensure it does not sneak back.
I'd guess it is necessary but I sure can't remember at this point. Also, why do you say it requires the index to be sequential integers? Aren't these lines doing the opposite - i.e. using the index that's passed in so the index doesn't have to be sequential integers?
*_constraint.index
is a descriptive name per row. *_jd.cat_id
is sequential integers. So the code replaces names with integers. The bug is that names work, but integers don't. I will try removing those lines and see if it works this afternoon.
Definitely seems like it's worth a try to remove and see if it still works for you.
I have started work on the bug fix branch https://github.com/SEMCOG/synthpop/tree/bug_fix
So It is not so simple. Removing that h_constraint.index = h_jd.cat_id
and p_constraint.index = p_jd.cat_id
did not cause the the tests to pass, the sequential integers come from https://github.com/SEMCOG/synthpop/blob/semcog/synthpop/synthesizer.py#L85 i.e. https://github.com/SEMCOG/synthpop/blob/semcog/synthpop/categorizer.py#L88
@janowicz How is it going? Have you made progres?
I have not gotten to this yet. Wasn't sure if you were already taking it on with the bug_fix branch above. I have time tomorrow to do some work on this.
I added an assert to demonstrate the bug. But was unable to get the test to pass with the assert in place. and that is as far as I got.
Cool, thanks! I'll work on this today
Closing this issue for now thanks to #34. We can re-open as needed.
Some constraints given to ipu are ignored. I just noted that the number of columns in _FrequencyAndConstraints.ncols is not equal to
len(person_freq.columns) + len(household_freq.columns)
This means that some of the constraints are not being used.Now trying the 5 whys method:
Why 1?
Because the keys of OrderedDict must be unique, and the column names of
person_freq
andhousehold_freq
are sequential numbers. So the person constraints over write the household constraints.Why 2?
Why are they sequential numbers, they are not in the tests? In the unit test the columns have unique meaningful names. Because we set them to sequential numbers. That is how it has been since @fscottfoti added it to git.
And that is as far as I go.
How do we fix?
Do we stop using a Dict as a backing for _FrequencyAndConstraints, what consequences does that have? Do we stop changing the index we send, what consequences does that have?
We noted this while we are finalizing our synthesized population so we could use a prompt fix.