TilburgNetworkGroup / remify

Processing and Transforming Relational Event History Data
https://tilburgnetworkgroup.github.io/remify/

'start.time' and 'end.time' argument in reh() #15

Closed jomulder closed 1 year ago

jomulder commented 1 year ago

Hi, maybe it would be useful to have a 'start.time' argument for reh() that sets the start time of the observational period. Currently 0 is the default, but this may result in an unrealistically long gap before the first event, which may bias the intercept, for instance. Relatedly, an 'end.time' argument could also be added. Together these would let researchers remify only the events of a specific period, which could be useful for very long edgelists when the interest is only in a subset.

jupepis commented 1 year ago

The proposed start.time corresponds to the origin argument already present in remify::reh().

The origin of the sequence is the starting time of the observational period in a relational event sequence, and it is used to calculate the waiting time to the first event in the sequence, as t[1]-t[0].
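To make the role of the origin concrete, here is a small base R sketch. It is not remify's internal code; the event times and the helper `waiting_times()` are hypothetical and only illustrate how the choice of t[0] changes the first waiting time.

```r
# Hypothetical event times (numeric, arbitrary time units)
event_times <- c(100, 102, 105, 107)

# Illustrative helper (not part of remify): waiting times given an origin t[0].
# The first element is t[1] - t[0]; the rest are the usual interevent times.
waiting_times <- function(times, origin) {
  diff(c(origin, times))
}

# Origin far from the first event: the first waiting time dominates
wt_origin_zero <- waiting_times(event_times, origin = 0)

# Origin close to the first event: the first waiting time is comparable to the others
wt_origin_close <- waiting_times(event_times, origin = 99)

print(wt_origin_zero)
print(wt_origin_close)
```

Only the first waiting time depends on the origin; all later interevent times are unchanged.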

In the current version of remify,

So t[0] is not the default unless specific conditions between input origin and input edgelist are met.

A possible solution to avoid the bias caused by the (artificial/unrealistic) waiting time to the first event would be to remove the first event from the likelihood (but not from the computation of the statistics).
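A rough illustration of why dropping the first event from the likelihood can matter, under a strong simplifying assumption: a constant event rate with exponential waiting times, where the maximum likelihood estimate of the rate is the number of events divided by the total waiting time. This is not the relational event model likelihood itself, and the waiting times and the helper `rate_mle()` are hypothetical.

```r
# Hypothetical waiting times; the first one (100) is the artificial gap
# between an origin of 0 and the first observed event
waiting <- c(100, 2, 3, 2)

# MLE of a constant rate under exponential waiting times:
# number of events divided by total waiting time
rate_mle <- function(w) {
  length(w) / sum(w)
}

# Using all events, the artificial first gap pulls the rate down
rate_all <- rate_mle(waiting)

# Dropping only the first event's contribution to the likelihood
rate_drop_first <- rate_mle(waiting[-1])

print(c(all_events = rate_all, without_first = rate_drop_first))
```

In this toy setting the estimated rate (and hence the log-rate intercept) is much lower when the artificial first gap is included, which is the kind of bias discussed above.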


The proposed end.time would be a new argument in remify::reh() that selects rows of the input edgelist before processing the resulting sub-sequence of events.

This can already be done with remify by supplying the sub-sequence of events as the input edgelist. I think that it is better not to make remify load large data into memory. The most efficient way is to have the user load the sub-sequence of interest into the environment and let remify::reh() process that subset of the data.
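A minimal sketch of that workflow in base R, assuming an edgelist with columns `time`, `actor1` and `actor2` (the data, the column names and the window bounds are made up for illustration). The call to remify::reh() is left commented out, and its argument names are an assumption based on the `origin` argument and input edgelist mentioned above.

```r
# Hypothetical edgelist; in practice only the rows of interest would be loaded
edgelist <- data.frame(
  time   = c(1, 4, 9, 15, 22, 30),
  actor1 = c("a", "b", "a", "c", "b", "c"),
  actor2 = c("b", "c", "c", "a", "a", "b"),
  stringsAsFactors = FALSE
)

# Hypothetical observation window of interest
start_time <- 4
end_time   <- 22

# Keep events strictly after start_time and up to end_time,
# so that start_time can serve as the origin of the sub-sequence
in_window    <- edgelist$time > start_time & edgelist$time <= end_time
sub_edgelist <- edgelist[in_window, ]
rownames(sub_edgelist) <- NULL

print(sub_edgelist)

# Assumed usage (argument names not verified here):
# reh_sub <- remify::reh(edgelist = sub_edgelist, origin = start_time)
```

Using the start of the window as the origin keeps the waiting time to the first retained event on the same scale as the later interevent times.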

jupepis commented 1 year ago

I think that it is better not to make remify load large data into memory. The most efficient way is to have the user load the sub-sequence of interest into the environment and let remify::reh() process that subset of the data.

Quoting the sentence above, I am closing this issue.