Closed jupepis closed 1 year ago
As we discussed today in person, I think it is useful for the user to know a little bit about the output of omit_dyad. For me, the information between "the output which will describe the dynamic risk set via two objects" and "integrated modification that are observed in the event sequence" is especially useful. Maybe the last part, about exactly how the integration occurs internally, can be omitted; the part that explains the output is already very helpful.
Further, I looked into the description of the function again. It really helped me to understand that there is one list per risk set modification. Moreover, I am missing the information about the use of the NA values to remove multiple entries at once. So maybe we can add this to the description in the help file:
[omit_dyad] list of lists. Each list refers to one risk set modification and must have two objects: a first object named 'time', that is a vector of two values defining the first and last time point of the time window where to apply the change to the risk set and a second object, named 'dyad', which is a "[data.frame]" where dyads to be removed are supplied in the format actor1,actor2,type (by row). The NA value can be used to remove multiple objects from the risk set at once with one risk set modification list (see Details).
And add this to the details: In omit_dyad, the NA value can be used to remove multiple objects from the risk set at once with one risk set modification list. For example, to remove all events with sender equal to actor "A", add a list with two objects 'time' = c(NA, NA) and 'dyad' = data.frame(actor1 = "A", actor2 = NA, type = NA) to the omit_dyad list. For more details about the omit_dyad argument, see vignette("reh").
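As a sketch of what such an input could look like in R: the actor names "B" and "C" and the window from 300 to 400 are made up for illustration, and the list is only constructed here, not passed to remify.

```r
# A list of lists: one inner list per risk set modification.
omit_dyad <- list(
  # Modification 1: remove every dyad whose sender is actor "A".
  # NA in actor2 and type acts as a wildcard; NA in both time bounds
  # means the whole observed sequence.
  list(
    time = c(NA, NA),
    dyad = data.frame(actor1 = "A", actor2 = NA, type = NA)
  ),
  # Modification 2 (hypothetical): remove two specific dyads, for any
  # event type, between time 300 and time 400.
  list(
    time = c(300, 400),
    dyad = data.frame(
      actor1 = c("B", "C"),
      actor2 = c("C", "B"),
      type   = c(NA, NA)
    )
  )
)

str(omit_dyad)
```

Each row of a 'dyad' data.frame describes one removal pattern, so a single modification can combine specific dyads and NA wildcards.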
Is it true that NA can also be used for the time object or is it only for the dyad object, or does this object require time values?
Yes, if NA is left on the starting time (first element in omit_dyad[[i]]$time) the function takes the time of the first event. If the ending time (second element in omit_dyad[[i]]$time) is NA, then the function takes the time of the last observed event.
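The rule in this answer can be sketched as follows. This is only an illustration of the described behaviour, not remify's code: resolve_window is a hypothetical helper, and the event times are invented and assumed to be sorted.

```r
# Hypothetical helper: fill NA bounds of a time window with the time of
# the first / last observed event.
resolve_window <- function(window, event_times) {
  if (is.na(window[1])) window[1] <- event_times[1]
  if (is.na(window[2])) window[2] <- event_times[length(event_times)]
  window
}

event_times <- c(10, 25, 40, 90)  # made-up, sorted event times

resolve_window(c(NA, 40), event_times)  # 10 40
resolve_window(c(NA, NA), event_times)  # 10 90
```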
This improvement will be part of remify 3.1.0.
When one or more dyads are to be omitted from the set of dyads at risk at a specific time point or in a time window, this can be done by supplying the input `omit_dyad` to `remify::reh()` according to the documentation `?remify::reh()`. There is also an example available via `vignette(topic = "reh", package = "remify")`.

However, what is processed internally, and how, is not yet fully explained to the user, although the function works fine and provides the output needed by `remstimate` for estimation. We still want to explain to the user what happens internally so that they can use the processed `omit_dyad`. This can be addressed by creating a small vignette where we introduce an artificial case study in which we visually explain that different sets of actors or event types, or specific dyads, cannot occur at different time points (time intervals defined by a start and a stop value). Such time windows can be partially, totally, or not overlapping, and the internal routines of `remify::reh()` will minimize both the processing and the memory usage of the resulting `omit_dyad` object in the output which will describe the dynamic risk set via two objects:

- `time` of length M (number of events in the sequence)
- `riskset` of dimensions [number of risk set modifications that occurred in the sequence x number of dyads]

The vector `time` will indicate at each time point which row of the matrix `riskset` to consider when applying the risk set modification. If the value of the vector is -1, no dynamic risk set is applied for that time point and the full risk set is considered (all dyads).

The matrix `riskset` is the actual object that minimizes the memory usage. It is the product of the processing of all the modifications defined in the input `omit_dyad`. Some modifications are defined on overlapping time intervals, others on separate time intervals, others on partially overlapping time windows. We integrate this information and reduce the number of rows of the matrix by considering the unique and integrated modifications that are observed in the event sequence. Therefore, we make the vector `time` indicate which modification occurred via the row index of the matrix `riskset`.

The integration of the several risk set modifications defined in the input `omit_dyad` is based on the time intervals where they occur, and it can be explained by plotting the time intervals vertically, one on top of the other. Then, we can point out the intervals that overlap fully, partially, or not at all. Afterward, we internally define brand new time intervals based on a careful intersection of such input intervals. For each new time window we integrate the risk set modifications, which might merge some of the modifications described in the input `omit_dyad`. For instance, if we supply `omit_dyad` as a list of two modifications of the risk set where the time windows are defined as `omit_dyad[[1]]$time <- c(243,560)` and `omit_dyad[[2]]$time <- c(300,400)`, the output matrix `riskset` will have three rows describing the risk set changes at:

- `c(243,299)`, where only dyads specified in `omit_dyad[[1]]$dyad` are omitted from the risk set
- `c(300,400)`, where dyads specified both in `omit_dyad[[1]]$dyad` and in `omit_dyad[[2]]$dyad` are omitted from the risk set
- `c(401,560)`, where only dyads specified in `omit_dyad[[1]]$dyad` are omitted from the risk set

@mlmeijerink will give me feedback about making the experience with the input `omit_dyad`, as well as with the vignette, as user-friendly as possible.
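
The splitting of overlapping windows described in this issue can be sketched in base R. This is a minimal illustration under stated assumptions, not remify's internal routine: `integrate_windows` is a hypothetical helper, time is assumed to advance in integer steps (as the bounds 299 and 401 in the example suggest), and the event times are invented.

```r
# Hypothetical helper: split possibly overlapping windows into
# non-overlapping ones and record which input modifications are active
# in each new window.
integrate_windows <- function(windows) {
  starts <- vapply(windows, function(w) w[1], numeric(1))
  stops  <- vapply(windows, function(w) w[2], numeric(1))
  # A new window can begin at any start, or one time unit after any stop.
  cuts <- sort(unique(c(starts, stops + 1)))
  out <- data.frame(start = head(cuts, -1), stop = tail(cuts, -1) - 1)
  # A modification is active when its window covers the whole new window.
  out$active <- vapply(seq_len(nrow(out)), function(k) {
    paste(which(starts <= out$start[k] & stops >= out$stop[k]), collapse = ",")
  }, character(1))
  # Drop gaps in which no modification applies.
  out[out$active != "", ]
}

# The two windows used in the example of this issue.
new_windows <- integrate_windows(list(c(243, 560), c(300, 400)))
print(new_windows)

# For each event, the row of new_windows that applies, or -1 when the
# full risk set is used. Event times are made up for illustration.
event_times <- c(100, 250, 300, 350, 401, 560, 600)
row_index <- vapply(event_times, function(t) {
  hit <- which(new_windows$start <= t & new_windows$stop >= t)
  if (length(hit) == 0) -1L else hit[1]
}, integer(1))
print(row_index)
```

In this sketch the window from 401 to 560 is covered only by the first input window, and events outside every window get the index -1. Turning each row into the actual `riskset` row (the union of the dyads omitted by the active modifications) is left out to keep the example short.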