monitors --> nwstats.formula

smjenness commented 3 years ago

control.net argument for tergmLite should match the same argument name for tergm

chad-klumb commented 3 years ago

The argument names can be changed, but there is a difference in behavior.

Setting extract.summary.stats = TRUE gives generative model statistics, and the option to obtain additional statistics via monitors.

This does not appear to match the behavior of save.nwstats and nwstats.formula.

We can abandon separate handling of generative model statistics in tergmLite; this will simplify things but cost efficiency when you want those statistics.

Or we can have save.nwstats behave as extract.summary.stats and nwstats.formula default to ~. when tergmLite = TRUE in control.net. This difference is not that complicated, but will need to be understood by the user.

Preference?

Currently, we also save tergmLite stats to summstats rather than nwstats, but it should be possible to change that as well.

smjenness commented 3 years ago

Thanks for the clarification. We should have the same approach for both tergm and tergmLite, with the use of save.nwstats to specify whether we want to calculate the stats, and nwstats.formula to specify the ergm formula used for the stats we want.

The default, for both netdx and netsim, is "formation", which means the generative model statistics as I think you are using the term (right?). That should still be the case. Whether using tergm or tergmLite, when a user specifies save.nwstats = TRUE it should default to the ergm formula used in netest. One can specify an alternative formula that may include some or all of the netest model formula.

I would like everything saved to dat$stats$nwstats so that we can use get_nwstats and plot.netsim in the same way for tergm and tergmLite. I just implemented that change.

One wrinkle is that our current use of nwstats.formula is the direct formula object, whereas you have monitors as a nested list of formulas. the latter makes more sense going forward for nwstats.formula as we move towards supporting multi-layer networks in core EpiModel.

chad-klumb commented 3 years ago

Neither "formation" nor the formula used in netest are, in general, the same as the generative model formula used in tergmLite simulation.

That being the case, I will eliminate separate handling of the generative model statistics. This will make obtaining those statistics slower for tergmLite, but it sounds like we are willing to pay that price for greater consistency with the tergm case.

The list() for monitors (and, earlier, MCMC_control) was for the general EpiModelHIV case of multiple networks; it shouldn't be necessary for current core EpiModel, and I would just as soon use a single formula given the other changes that are being made now.

smjenness commented 3 years ago

I do not know what you mean by "generative model statistics" then. Could you explain, perhaps with a specific example?

Sounds fine for using a single formula versus a list for nwstats.formula. Eventually, we hope to support multi-layer networks in core EpiModel, but that will be for later...

smjenness commented 3 years ago

By the way, here is a good test case for the summary stats with tergmLite: http://statnet.org/nme/d4-s5-Serosort.html. Currently this uses full tergm to generate the epi sims because we want to track the nwstats after exogenous processes (epi dynamics) are imposed. It would be ideal to be able to just flip the tergmLite switch to TRUE in the control settings and get the same outputs as in the tutorial without any other changes to the syntax. Here is the R script..

chad-klumb commented 3 years ago

The generative model statistics for a tergm are those of the generative model formula, something like ~Form(~edges + nodefactor("age.grp") + nodefactor("race")) + Diss(~edges). The Form statistics are evaluated on the union (formation) network and the Diss statistics are evaluated on the intersection (dissolution) network. In general these are not the same as statistics evaluated on the instantaneous network (which is what you get with "formation" or the cross-sectional ergm formula used in netest).

The generative model statistics are the ones neatly available from the ergm_model we already have to prepare for simulation. If we want something else, it will mean another ergm_model call; if that's what we want then that's what we'll get.

smjenness commented 3 years ago

I see, so ~Form(~edges) would be the number of new edges as at a time step, and so on?

Yes, we typically want to diagnose the model with its cross-sectional network representation, so we would need the stats evaluated on the instantaneous network. That's what netdx does and what netsim does for a full tergm approach.

cc: @martinamorris in case you have any opinions about this...

chad-klumb commented 3 years ago

~Form(~edges) would be the number of edges in either the current network or the previous timestep's network

For long duration models, differences between instantaneous and union/intersection statistics tend to be small (like 1/duration)

martinamorris commented 3 years ago

@smjenness if this is for netdx, then yes, we want the cross-sectional stats for each network in the timeseries, not the Form() stats. @chad-klumb am i right that these could be obtained from the Cross() operator?

FYI, we're putting together the TERGM tutorial now, and using this as an opportunity to write up a guide to migration from stergm to tergm. For now just know that the big move is from the two formation and dissolution equations to a set of 4 "operators" that can represent the dynamics in different ways, and can be used alone or in combination: Form(), Diss(), Cross(), & Change(). The comparable pair in new tergm are Form() & Diss(); these produce a stergm if all terms are inside the Form and Diss operators and there are no dyad-dependent sample space constraints.. Once you understand the other two operators, the whole TERGM modeling class makes much more sense.

smjenness commented 3 years ago

no, this is for netsim not netdx. We want the tergmLite summary stats to match the full tergm summary stats (which have been implemented since the beginning of EpiModel)

martinamorris commented 3 years ago

Ok, sorry.

Looking at the script http://statnet.org/nme/d4-s5-Serosort.R what I see is you pulling out the cross-sectional stats you are monitoring from the simulation.

These are not the "generative" stats, or even the cross-sectional stats from the terms in the model, b/c you specified a different formula to monitor (nwstats.formula = ~edges + nodematch("status")) than you did for the model in the fit/sim steps (formation = ~edges + nodefactor("status"))

So we need to be more precise with terminology. "Summary stats" can mean different things in different contexts.

Is the goal to save the stats from this call: nwstats.formula = ~edges + nodematch("status"), using the same syntax for both tergm and tergmLite?

smjenness commented 3 years ago

Correct -- summary nwstats means cross-sectional network statistics that may be in the same form as the model, or may be different (usually model formula + some extra terms). We currently have that implemented for tergm, and want it implemented for tergmLite using the same syntax (please and thank you @chad-klumb!).

chad-klumb commented 3 years ago

I think this and #540 are done. They are tested in tergmLite/tests/testthat/test-dynamic-simulation.R. If you have additional tests, now would be a good time to try them.

chad-klumb commented 3 years ago

Just a note that I'll be adding support for nwstats.formula = NULL (converting it to ~., which produces no statistics) shortly (at the moment, NULL is an error).

chad-klumb commented 3 years ago

NULL should work now

missed a summary call in net.mod.init, which I'm fixing now

chad-klumb commented 3 years ago

actually, for consistency with the tergm case, I think that summary call shouldn't be there at all

will probably be removing it momentarily

chad-klumb commented 3 years ago

actually, it's possible it should be there, since the tergm method sets up stats in resim_nets as a special case when at == 2

will test a bit and compare results

smjenness commented 3 years ago

I can report that this example case now works perfectly for both tergm and tergmLite: http://statnet.org/nme/d4-s5-Serosort.html. The only difference is simply toggling tergmLite TRUE/FALSE. Thanks!

smjenness commented 3 years ago

I have tested this with a bunch of different scenarios and can't find anything wrong with the output. I will add some unit tests on this in EpiModel now, and then I think we are OK to merge into master. Anything else to address on your side @chad-klumb ?

chad-klumb commented 3 years ago

I converted storage to a data.frame from a matrix (since the tergm method uses a data.frame).

Other than that there's nothing I'm aware of, but I will continue to test things between now and the release and open an issue on anything worth discussing.

smjenness commented 3 years ago

It looks like there are a bunch of failing tests on tergmLite under test-updateModelTermInputs.R. Are you addressing those?

https://github.com/statnet/tergmLite/runs/2775727596?check_suite_focus=true

I will wait until tergmLite passes until I merge EpiModel.

chad-klumb commented 3 years ago

Those aren't failing on either of my windows laptops with the usual install stack

remotes::install_github(c("statnet/rle@master",
                          "statnet/statnet.common@master",
                          "statnet/network@master",
                          "statnet/networkDynamic@master",
                          "statnet/ergm@master",
                          "statnet/tergm@master",
                          "statnet/tergmLite@dev",
                          "statnet/EpiModel@dev"), dependencies = FALSE, force = TRUE)

I'll see if I can figure out what's going on...

chad-klumb commented 3 years ago

Debugged it on the Rstudio servers. Should be fixed now.

smjenness commented 3 years ago

great! I am starting the merge now...

EpiModel / EpiModel

monitors --> nwstats.formula #541