Dynamic Riskset - create a vignette

jupepis commented 1 year ago

When one or more dyads are to be omitted from the set of dyads at risk at a specific time point or in a time window, this can be done by supplying the input omit_dyad to remify::reh() according to the documentation ?remify::reh(). There is also an example available via vignette(topic = "reh", package = "remify").

However, how this input is processed internally is not yet fully explained to the user, even though the function works fine and provides the output that remstimate needs for estimation. We still want to explain what happens internally so that users can work with the processed omit_dyad. This can be addressed by creating a small vignette that introduces an artificial case study in which we visually explain that different sets of actors or event types, or specific dyads, cannot occur during different time intervals (each defined by a start and a stop value). Such time windows may overlap partially, totally, or not at all, and the internal routines of remify::reh() will minimize both the processing and the memory usage of the resulting omit_dyad object in the output which will describe the dynamic risk set via two objects:

a vector time of length M (number of events in the sequence)
a matrix riskset of dimensions [number of risk set modifications that occurred in the sequence x number of dyads]

The vector time indicates, for each time point, which row of the matrix riskset describes the risk set modification to apply. If the value of the vector is (-1), no dynamic risk set is applied for that time point and the full risk set is considered (all dyads).
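A toy illustration of how the two objects could be read together. This is only a sketch: the dyad labels, the 0/1 encoding (1 = at risk, 0 = omitted), the 0-based row indices, and the helper dyads_at_risk() are all assumptions made for the example, not documented remify behaviour.

```r
# Toy illustration only: object names mirror the description above,
# but the values, the 0/1 encoding and the indexing are assumptions.
dyads <- c("A->B", "A->C", "B->A", "B->C", "C->A", "C->B")

# Assumed encoding: 1 = dyad at risk, 0 = dyad omitted.
riskset <- rbind(
  c(0, 0, 1, 1, 1, 1),  # modification 0: actor A cannot send
  c(0, 0, 1, 1, 0, 0)   # modification 1: actors A and C cannot send
)
colnames(riskset) <- dyads

# One entry per event (M = 5); -1 means the full risk set applies.
# Assumed 0-based row indices, hence the "+ 1" in the lookup below.
time <- c(-1, 0, 0, 1, -1)

# Hypothetical helper: names of the dyads at risk at event m.
dyads_at_risk <- function(m) {
  if (time[m] == -1) return(dyads)
  dyads[riskset[time[m] + 1, ] == 1]
}

at_risk_1 <- dyads_at_risk(1)  # full risk set
at_risk_2 <- dyads_at_risk(2)  # first modification applies
at_risk_4 <- dyads_at_risk(4)  # second modification applies
```

Storing one row per modification and an index per event means a modification that spans many events is stored only once.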

The matrix riskset is the actual object that minimizes the memory usage. It is the product of processing all the modifications defined in the input omit_dyad. Some modifications are defined on fully overlapping time intervals, others on separate time intervals, and others on partially overlapping time windows. We integrate this information and reduce the number of rows of the matrix by keeping only the unique, integrated modifications that are observed in the event sequence. The vector time then refers to the modification that applies at each time point via the row index of the matrix riskset.

The integration of the several risk set modifications defined in the input omit_dyad is based on the time intervals in which they occur, and it can be explained by plotting the time intervals vertically, one on top of the other. Then we can point out the intervals that overlap fully, partially, or not at all. Afterward, we internally define brand new time intervals based on a careful intersection of the input intervals. For each new time window we integrate the risk set modifications, which might merge some of the modifications described in the input omit_dyad. For instance, if we supply omit_dyad as a list of two modifications of the risk set where the time windows are defined as

omit_dyad[[1]]$time <- c(243,560)
omit_dyad[[2]]$time <- c(300,400)

the output matrix riskset will have three rows, describing the following risk set changes:

the first row will describe the changes occurring at time points c(243,299), where only dyads specified in omit_dyad[[1]]$dyad are omitted from the risk set
the second row will describe the changes occurring at time points c(300,400), where dyads specified both in omit_dyad[[1]]$dyad and in omit_dyad[[2]]$dyad are omitted from the risk set
the third row will describe the changes occurring at time points c(401,560), where only dyads specified in omit_dyad[[1]]$dyad are omitted from the risk set
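The three intervals above can be reproduced with a few lines of base R. This is a simplified sketch, not the routine inside remify::reh(): it assumes integer-valued time points (as in the example) and the object names windows, starts, and intervals are made up for the illustration.

```r
# Sketch only: a simplified re-derivation of the intervals listed above.
windows <- list(c(243, 560), c(300, 400))  # omit_dyad[[1]]$time, omit_dyad[[2]]$time

# A new interval opens at every window start and right after every window end.
# Assumes integer-valued time points, so "right after" means stop + 1.
starts <- sort(unique(unlist(lapply(windows, function(w) c(w[1], w[2] + 1)))))

# Consecutive breakpoints delimit the new, non-overlapping intervals.
intervals <- data.frame(start = head(starts, -1), stop = tail(starts, -1) - 1)

# For each new interval, record which input modifications cover it entirely.
intervals$active <- sapply(seq_len(nrow(intervals)), function(k) {
  covers <- sapply(windows, function(w) {
    w[1] <= intervals$start[k] && w[2] >= intervals$stop[k]
  })
  paste(which(covers), collapse = ",")
})

# Drop gaps between windows, where no modification applies.
intervals <- intervals[intervals$active != "", ]
print(intervals)
```

The active column lists the indices of the input modifications that apply in each derived interval: only the first modification in 243-299 and 401-560, and both in 300-400. The sketch keeps one entry per derived interval and does not try to reproduce how remify stores them internally.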

@mlmeijerink will give me feedback on making the experience with the input omit_dyad, as well as with the vignette, as user-friendly as possible.

mlmeijerink commented 1 year ago

As we discussed today in person, I think it is useful for the user to know a little bit about the output of omit_dyad. For me, the information between "the output which will describe the dynamic risk set via two objects" and "integrated modification that are observed in the event sequence" is especially useful. Maybe the last part about how the integration exactly occurs internally can be omitted, just the part that explains the output is already very helpful.

Further, I looked into the description of the function again. It really helped me to understand that there is one list per risk set modification. Moreover, I am missing the information about the use of the NA values to remove multiple entries at once. So maybe we can add this to the description in the help file:

[omit_dyad] list of lists. Each list refers to one risk set modification and must have two objects: a first object named 'time', which is a vector of two values defining the first and last time point of the time window in which the change to the risk set applies, and a second object named 'dyad', which is a "[data.frame]" where the dyads to be removed are supplied in the format actor1,actor2,type (by row). The NA value can be used to remove multiple objects from the risk set at once with one risk set modification list (see Details).

And add this to the details: In omit_dyad, the NA value can be used to remove multiple objects from the risk set at once with one risk set modification list. For example, to remove all events with sender equal to actor "A", add a list with two objects 'time' = c(NA, NA) and 'dyad' = data.frame(actor1 = "A", actor2 = NA, type = NA) to the omit_dyad list. For more details about the omit_dyad argument, see vignette(topic = "reh", package = "remify").
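Written out in R, the example from the proposed Details text could look like the sketch below. The first object follows the format described above. The second part is purely illustrative: the actor and type names are invented, and the expansion (including the exclusion of self-loops) is an assumption about what the NA values stand for, not a description of what remify does internally.

```r
# One risk set modification: remove every dyad in which "A" is the sender,
# with NA for both time points.
omit_dyad <- list(
  list(
    time = c(NA, NA),
    dyad = data.frame(actor1 = "A", actor2 = NA, type = NA)
  )
)

# Hypothetical illustration of what the NA wildcards stand for.
# Actor and type names are made up; excluding self-loops is an assumption.
actors <- c("A", "B", "C")
types <- c("email", "call")
expanded <- expand.grid(
  actor1 = "A",
  actor2 = setdiff(actors, "A"),
  type = types,
  stringsAsFactors = FALSE
)
print(expanded)
```

With three actors and two event types, the single row with NA values corresponds to four concrete dyads, which is what makes the NA shorthand convenient.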

Is it true that NA can also be used for the time object or is it only for the dyad object, or does this object require time values?

jupepis commented 1 year ago

Is it true that NA can also be used for the time object or is it only for the dyad object, or does this object require time values?

Yes, NA can also be used for the time object. If the starting time (first element in omit_dyad[[i]]$time) is NA, the function takes the time of the first observed event. If the ending time (second element in omit_dyad[[i]]$time) is NA, the function takes the time of the last observed event.
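The rule can be sketched in base R as follows. The helper resolve_window() and the event times are hypothetical and only mimic the behaviour described above; they are not part of remify.

```r
# Hypothetical helper: replace NA bounds of a time window with the time of
# the first / last observed event.
resolve_window <- function(window, event_times) {
  if (is.na(window[1])) window[1] <- min(event_times)  # first observed event
  if (is.na(window[2])) window[2] <- max(event_times)  # last observed event
  window
}

# Made-up event times for the illustration.
event_times <- c(12, 40, 97, 243, 560, 610)

w1 <- resolve_window(c(NA, 400), event_times)  # start taken from first event
w2 <- resolve_window(c(300, NA), event_times)  # stop taken from last event
w3 <- resolve_window(c(NA, NA), event_times)   # whole observed sequence
```

So time = c(NA, NA) applies a modification over the entire observed sequence.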

jupepis commented 1 year ago

This improvement will be part of remify 3.1.0.

TilburgNetworkGroup / remify

Dynamic Riskset - create a vignette #9