UDST / synthpop

Synthetic populations from census data
BSD 3-Clause "New" or "Revised" License

full scale test #19

Open fscottfoti opened 10 years ago

fscottfoti commented 10 years ago

We should probably synthesize the population of the Bay Area and solve any issues that come up. If it's fast enough, we should go for the whole country (why not?).

fscottfoti commented 10 years ago

@jiffyclub I gave this a shot. At least one block group needed more than 15K iterations in the IPU. When I raised the limit to 20K iterations, the Bay Area completed successfully. Right now it runs in about 40 minutes. Checking the results seems like the next order of business.
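For readers unfamiliar with why an iteration cap matters here, the following is a minimal sketch of iterative proportional updating (IPU) with a maximum iteration count. It is not synthpop's implementation: the function name `ipu_sketch`, its parameters, the stopping rule, and the toy frequency table are all assumptions made for illustration. Only the 20K default mirrors the limit mentioned in the comment above.

```python
import numpy as np


def ipu_sketch(freq, targets, max_iterations=20000, tol=1e-10):
    """Toy iterative proportional updating (hypothetical helper).

    freq[i, j] : how often sample household i contributes to constraint j
    targets[j] : marginal total that constraint j should reach
    Returns (weights, fit, iterations).
    """
    freq = np.asarray(freq, dtype=float)
    targets = np.asarray(targets, dtype=float)
    weights = np.ones(freq.shape[0])

    def fit_quality(w):
        # Mean absolute relative error across all constraints.
        return float(np.mean(np.abs(freq.T @ w - targets) / targets))

    previous = fit_quality(weights)
    for iteration in range(1, max_iterations + 1):
        # Rescale, one constraint at a time, the weights of every
        # household that contributes to that constraint.
        for j in range(freq.shape[1]):
            column = freq[:, j]
            total = weights @ column
            if total > 0:
                weights[column > 0] *= targets[j] / total
        current = fit_quality(weights)
        # Assumed stopping rule: the fit has stopped changing.
        if abs(previous - current) < tol:
            return weights, current, iteration
        previous = current
    raise RuntimeError(f"no convergence after {max_iterations} iterations")


# Toy data: columns are [household type A, household type B, persons].
freq = [[1, 0, 1], [1, 0, 3], [0, 1, 2]]
targets = [10, 5, 30]
weights, fit, iterations = ipu_sketch(freq, targets)
print(weights, fit, iterations)
```

The stopping rule in this sketch watches the change in fit, not the fit itself, so a run can stop with a poor fit if the weights settle into a steady state. A geography whose constraints are hard to satisfy together can also take many passes to settle, which is one plausible reason a single block group would need a higher cap.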

jiffyclub commented 10 years ago

Nice! A couple thoughts:

As we work on the validation we'll want to track everything we do so it can be publicized. I dunno if maybe a separate repo would be good for that, or if we keep it in this one somewhere.

fscottfoti commented 10 years ago

I wonder these things too - we can definitely try it and see.

I agree on publicized validation - I vote for keeping it in this repo - maybe with a notebook (or more than one) that's well annotated, I would guess in a separate directory.

waddell commented 10 years ago

Nice progress. Does the 40 minutes include sampling the household and person records and writing the resulting synthetic population out, or does it only cover the run through the IPU step?

I also like the publicized validation approach, and keeping that on the same repo sounds good.


darebrawley commented 5 years ago

Hi -- I'm trying to use SynthPop as part of a research project and am running into runtime issues. I'm applying the synthesizer to Mecklenburg County, NC, and am getting the following runtime for a single block group. Any suggestions?

I was super encouraged to see that @fscottfoti was able to do the full Bay Area in 40 minutes.

Time to run ipu: 390.129s
IPU weights:
count    3.687000e+03
mean     1.933344e-01
std      4.484030e-01
min      3.711018e-11
25%      4.032434e-06
50%      7.556055e-05
75%      1.988441e-01
max      7.685979e+00
dtype: float64
Fit quality: 4.872272957062106
Number of iterations: 234
Drawing 620 households

The output above was produced with:

from synthpop.recipes.starter2 import Starter
from synthpop.synthesizer import synthesize_all, enable_logging
import os
import pandas as pd
enable_logging()

# Census API key: placeholder shown here; supply your own key and avoid posting it publicly
os.environ["CENSUS"] = "YOUR_CENSUS_API_KEY"
starter = Starter(os.environ["CENSUS"], "NC", "Mecklenburg County")
ind = pd.Series(["37", "119", "005706", "4"], index=["state", "county", "tract", "block group"])
output = synthesize_all(starter, indexes=[ind])
output.to_csv("data/test_synth_output.csv")
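One way to narrow down where the 390 seconds go is to run the call under Python's built-in profiler. The sketch below uses only the standard library. The helper `profile_call` and the stand-in workload `slow_sum` are hypothetical names introduced for illustration, so the example runs without synthpop or a Census API key.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return (result, text report).

    Hypothetical helper: `top` limits how many entries are reported.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def slow_sum(n):
    # Stand-in workload used only to demonstrate the helper.
    return sum(i * i for i in range(n))


result, report = profile_call(slow_sum, 10000)
print(report)
```

For the snippet in this comment, the equivalent call would presumably be `profile_call(synthesize_all, starter, indexes=[ind])`. The report should then indicate whether the time is spent in the IPU loop, in fetching Census data, or in drawing households.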