'start.time' and 'end.time' argument in reh()

jomulder commented 1 year ago

Hi, maybe it would be useful to have a 'start.time' argument for reh() that sets the start time of the observational period. Currently 0 is the default, but this may result in an unrealistic gap before the first event, which may bias the intercept, for instance. Relatedly, an 'end.time' argument could also be added. Together, these would allow researchers to remify only the events of a specific period, which could be useful for extremely long edge lists when the interest is only in a subset.

jupepis commented 1 year ago

The proposed start.time corresponds to the origin argument already present in remify::reh().

The origin of the sequence is the starting time of the observational period of a relational event sequence. It is used to calculate the waiting time to the first event in the sequence, t[1] - t[0].

In the current version of remify,

if origin = NULL (not provided by the user), then the origin (t[0]) is set to one time unit earlier than the first time point. One time unit depends on the scale of the time variable and can be 1 month, 1 week, 1 day, 1 hour, 1 minute, or 1 second. However, if the time of occurrence of the first event satisfies 0 <= t[1] <= 1, then the origin t[0] is set to 0. The problem here is that if t[1] = 0.0, then t[0] will be 0 as well, and we would need more information from the user on which waiting time to use for the first event observed at t[1].
if origin (t[0]) is provided by the user, the function first checks that the origin is not greater than or equal to t[1] (the time of occurrence of the first event). If t[0] >= t[1], the function recalculates t[0] as t[1] - 1, that is, one time unit earlier than t[1], and throws a warning once processing has finished. Here too, if the first event is observed within the first time unit, that is 0 <= t[1] <= 1, then t[1] - 1 <= 0 and t[0] is set to 0.
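
The two rules above can be sketched in a few lines. This is only an illustration of the logic as described in this comment, written in Python for brevity; it is not remify's actual code, and the function name resolve_origin and its parameters are hypothetical. It assumes numeric, non-negative event times and a time unit of 1.

```python
import warnings


def resolve_origin(first_time, origin=None, time_unit=1.0):
    """Return t[0] given t[1] (first_time) and an optional user origin.

    Hypothetical helper mirroring the rules described in this thread;
    it is not part of remify.
    """
    # A user-supplied origin is kept only if it lies strictly before t[1].
    if origin is not None:
        if origin < first_time:
            return origin
        warnings.warn("origin >= t[1]; recomputing origin from t[1]")

    # Default (or fallback): one time unit earlier than the first event...
    t0 = first_time - time_unit
    # ...unless the first event falls within the first time unit,
    # in which case the origin is set to 0.
    if 0 <= first_time <= time_unit:
        t0 = 0.0
    return t0


# Waiting time to the first event is t[1] - t[0].
first = 5.0
waiting_time = first - resolve_origin(first)  # origin is 4.0 here
```

Note that resolve_origin(0.0) returns 0.0, so the first waiting time is zero. That is the ambiguous case mentioned above, where more information from the user would be needed.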

So t[0] = 0 is not the default; it is used only when specific conditions on the input origin and the input edgelist are met.

A possible solution to avoid the bias introduced by the (artificial/unrealistic) waiting time to the first event would be to remove the first event from the likelihood (but not from the computation of the statistics).
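
To see why the first waiting time matters, here is a toy sketch. It assumes a deliberately simplified constant-rate (intercept-only) model in which the maximum likelihood estimate of the total event rate is the number of events divided by the summed waiting times. It is written in Python for illustration only, is not how remify or any estimation package computes anything, and the function name constant_rate_mle is hypothetical. The model has no statistics, so it only illustrates the likelihood side of the suggestion.

```python
def constant_rate_mle(times, origin, drop_first=False):
    """MLE of a constant event rate from sorted event times.

    Hypothetical helper: the rate estimate is (number of events) divided
    by (sum of waiting times). With drop_first=True the first event and
    its waiting time t[1] - t[0] are left out of the likelihood.
    """
    previous = [origin] + times[:-1]
    waits = [t - p for p, t in zip(previous, times)]
    if drop_first:
        waits = waits[1:]
    return len(waits) / sum(waits)


# Events one time unit apart, but the clock nominally starts at 0.
times = [100.0, 101.0, 102.0, 103.0]
biased = constant_rate_mle(times, origin=0.0)                    # 4 / 103
adjusted = constant_rate_mle(times, origin=0.0, drop_first=True)  # 3 / 3
```

With an origin of 0, the artificial waiting time of 100 dominates the estimate and pulls the rate (and hence the intercept) down. Dropping the first event from the likelihood recovers a rate of one event per time unit.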

The proposed end.time would be a new argument in remify::reh() that selects rows of the input edgelist before processing the resulting sub-sequence of events.

This can already be done by supplying the sub-sequence of events as the input edgelist. I think that it is better not to make remify load large data into memory. The most efficient way is for the user to load only the sub-sequence of interest into the environment and have remify::reh() process that subset of data.
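
As a rough sketch of that workflow, the selection can be done while reading the file, so that only the rows of interest are ever held in memory. The example is in Python for illustration; the column names time, actor1 and actor2, the helper select_period, and the inline sample data are all assumptions. The filtered rows would then be saved or passed on as the edgelist input.

```python
import csv
import io


def select_period(rows, start_time, end_time):
    """Yield only the rows whose time lies in [start_time, end_time].

    Hypothetical helper; rows is any iterable of dicts with a 'time' key,
    so a large file can be streamed instead of loaded in full.
    """
    for row in rows:
        if start_time <= float(row["time"]) <= end_time:
            yield row


# Small in-memory stand-in for a large edgelist file.
raw = io.StringIO("time,actor1,actor2\n1,a,b\n5,b,c\n9,c,a\n12,a,c\n")
subset = list(select_period(csv.DictReader(raw), start_time=4, end_time=10))
# subset now holds only the events observed at times 5 and 9.
```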

jupepis commented 1 year ago

I think that it is better not to make remify load large data into memory. The most efficient way is for the user to load only the sub-sequence of interest into the environment and have remify::reh() process that subset of data.

Quoting the sentence above, I am closing this issue.

TilburgNetworkGroup / remify

'start.time' and 'end.time' argument in reh() #15