UDST / synthpop

Synthetic populations from census data
BSD 3-Clause "New" or "Revised" License
99 stars 47 forks source link

Parallel py3 #43

Open mxndrwgrdnr opened 6 years ago

mxndrwgrdnr commented 6 years ago

Could use an extra set of eyes at L264-265 in synthesizer.py. These two lines account for the fact that when processing geographies in parallel, draw.draw_households() cannot rely on the hh_index_start argument to set the household_id either as the index of the synthetic households table or the hh_id field of the synthetic persons table. Thus we have to adjust the households index and persons hh_id field AFTER all the tables have been generated, which is what's hapenning at L264-265.

coveralls commented 6 years ago

Coverage Status

Coverage decreased (-1.2%) to 81.643% when pulling c18011629a67aa14ac525c4eea22fd6a74171b70 on parallel_py3 into d1de14a2d70a83d366294579f792e8b2ee1c9b9a on master.

janowicz commented 6 years ago

Thanks very much for this great contribution @mxndrwgrdnr! We'll review this soon, sorry for the delay. There is an existing earlier open PR that was reveiewed, which we kept open for a bit in the case anyone from the community wanted to review, that we'll merge in first. And then we can help reconcile with this PR.

mxndrwgrdnr commented 6 years ago

@cvanoli @janowicz this should be good to go now folks.

janowicz commented 6 years ago

Awesome, thanks @mxndrwgrdnr! I'll check it out today

cvanoli commented 6 years ago

thanks !! @mxndrwgrdnr

cvanoli commented 6 years ago

I tested the branch with a local environment in my computer and two Future Warnings keep coming up: 1 -synthpop\ipu\ipu.py: Line191

RuntimeWarning: divide by zero encountered in double_scalars adj = constraint / float((column * weights).sum())

2 - synthpop\census_helpers.py: Line 180:

FutureWarning: Sorting because non-concatenation axis is not aligned. A future version of pandas will change to not sort by default. To accept the future behavior, pass 'sort=False'. To retain the current behavior and silence the warning, pass 'sort=True'. pums = pd.concat([pums, pums00], ignore_index=True)

mxndrwgrdnr commented 6 years ago

@janowicz the second FutureWarning that @cvanoli mentions isn't a big deal and should resolve itself in the future. The first FutureWarning, however, is happening during IPU when household weights that are very nearly zero are not close enough to zero to be caught by ipu._drop_zeros but close enough to trigger a divide by zero RuntimeWarning. It's just a warning and not an error so I don't think it is effecting performance but I wanted to bring it to attention and see if you had an opinion.