dfm / tess-atlas

MIT License

executions for some cells (before sampling) take ~ min #149

Closed: avivajpeyi closed this issue 2 years ago

avivajpeyi commented 2 years ago

Although building the model is quick (~1 s), running the test_model function takes ~1 min.

[Screenshot (2021-11-19, 3:17 pm): execution time of the test_model cell]

For reference, here is the test_model function:


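```python
import numpy as np
import pandas as pd


def test_model(model):
    """Evaluate the model at its test point and ensure there are no NaNs."""
    with model:
        # Log-probability of each variable evaluated at the model's test point
        test_prob = model.check_test_point()
        test_prob.name = "log P(test-point)"
        if test_prob.isnull().values.any():
            raise ValueError(f"The model(testval) has a nan:\n{test_prob}")
        # First element of each test value, rounded, for a compact summary table
        test_pt = pd.Series(
            {
                k: str(round(np.array(v).flatten()[0], 2))
                for k, v in model.test_point.items()
            },
            name="Test Point",
        )
        return pd.concat([test_pt, test_prob], axis=1)
```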

Similarly, optimizing the initial parameters takes ~3 min:

[Screenshot (2021-11-19, 3:23 pm): execution time of the initial-parameter optimization cell]

I understand that the optimization can take some time, but I don't understand why test_model takes ~1 min.
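One way to check whether a slow first call is one-time overhead rather than slow evaluation is to time the same call twice. The sketch below is a hypothetical helper, not part of tess-atlas: `time_first_and_repeat` and the `fake_check` stand-in (which sleeps on its first call to mimic compilation) are illustrative names. With a real model you would pass `test_model, model` instead; a first call that is much slower than the repeats would point at one-time overhead.

```python
import time
from typing import Any, Callable, Tuple


def time_first_and_repeat(
    fn: Callable[..., Any], *args: Any, repeats: int = 3, **kwargs: Any
) -> Tuple[float, float]:
    """Return (first_call_seconds, fastest_repeat_seconds) for fn(*args, **kwargs)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")

    # Time the very first call, which includes any one-time setup cost
    start = time.perf_counter()
    fn(*args, **kwargs)
    first = time.perf_counter() - start

    # Time later calls; keep the fastest to reduce noise
    repeat_times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        repeat_times.append(time.perf_counter() - start)
    return first, min(repeat_times)


# Hypothetical stand-in: the first call "compiles" (sleeps), later calls reuse the result
_compiled = {}


def fake_check(x):
    if "fn" not in _compiled:
        time.sleep(0.2)  # pretend this is the expensive compilation step
        _compiled["fn"] = lambda v: v * 2
    return _compiled["fn"](x)


first, repeat = time_first_and_repeat(fake_check, 3)
print(f"first call: {first:.3f} s, fastest repeat: {repeat:.6f} s")
```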

dfm commented 2 years ago

This happens because Theano needs to compile the model the first time it is used. Compiled models are (generally) cached and reused, but since we're setting our own "compiledir", each process pays the compilation overhead the first time it runs. I'm not sure there's really anything to do about this. I've tried some "clever" things in the past to share the cache between processes, but this always ends up being more trouble than it's worth!
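A toy model of the caching behaviour described above: compiled artifacts are keyed by a hash of the source and stored in a cache directory, so a process only pays the compile cost on a cache miss. This is a hypothetical sketch, not Theano's implementation; `compile_with_cache`, `count_compilations` and the placeholder model string are illustrative names, and `time.sleep` stands in for the compiler. It compares one compile directory per process against a single shared one.

```python
import hashlib
import tempfile
import time
from pathlib import Path


def compile_with_cache(source: str, compiledir: Path, compile_seconds: float = 0.05) -> bool:
    """Return True on a cache hit, False if the source had to be 'compiled'."""
    compiledir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(source.encode()).hexdigest()
    artifact = compiledir / f"{key}.bin"
    if artifact.exists():
        return True  # reuse the previously compiled artifact
    time.sleep(compile_seconds)  # stand-in for the expensive compilation step
    artifact.write_text(source)
    return False


def count_compilations(n_processes: int, shared: bool) -> int:
    """Simulate n processes (run one after another) compiling the same model graph."""
    model_graph = "logp graph of the model"  # hypothetical placeholder source
    with tempfile.TemporaryDirectory() as root:
        compilations = 0
        for pid in range(n_processes):
            # Either every process gets its own directory, or they all share one
            subdir = "shared" if shared else f"process-{pid}"
            if not compile_with_cache(model_graph, Path(root) / subdir):
                compilations += 1
        return compilations


print(count_compilations(4, shared=False))  # → 4 (each process compiles)
print(count_compilations(4, shared=True))   # → 1 (later processes hit the cache)
```

The simulated processes run sequentially, so the sketch ignores what happens when real processes compile concurrently into one shared directory; Theano guards its compile directory with a lock, which is one plausible source of the trouble mentioned above. In Theano itself, the cache location is controlled by the `base_compiledir`/`compiledir` config flags, which can be set through the `THEANO_FLAGS` environment variable.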

avivajpeyi commented 2 years ago

Ah, OK! Thanks, Dan.