TilburgNetworkGroup / remify

Processing and Transforming Relational Event History Data
https://tilburgnetworkgroup.github.io/remify/

column names of input edgelist of reh() function #12

Closed jomulder closed 1 year ago

jomulder commented 1 year ago

Hi, in my opinion it would also be fine if the column names of the edgelist don't match the required names "time", "actor1", "actor2" when running the reh() function, as long as the order of the columns is correct. So if the column names are different, a warning could be given that the first column is assumed to be the time and the second and third columns are assumed to be the actors involved in the event. Best, Joris
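To make the proposal concrete, here is a minimal sketch of the suggested behavior, written in Python purely for illustration. It is not remify code: the function name `check_first_three` and the warning text are hypothetical, and only the three required names come from the comment above.

```python
import warnings

# Required names as quoted in the comment above.
REQUIRED = ("time", "actor1", "actor2")


def check_first_three(columns):
    """Map the first three columns to their roles by position.

    Emits a warning (instead of failing) when the names differ
    from the required ones. Hypothetical helper, not remify's API.
    """
    if len(columns) < 3:
        raise ValueError("the edgelist needs at least three columns")
    found = tuple(columns[:3])
    if found != REQUIRED:
        warnings.warn(
            f"columns {found} are assumed to be time, actor1, actor2 "
            "(matched by position, not by name)"
        )
    # Role -> actual column name in the input.
    return dict(zip(REQUIRED, found))
```

Calling `check_first_three(["t", "from", "to"])` would warn once and map "t" to the time role, "from" to actor1 and "to" to actor2.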

jupepis commented 1 year ago

In issue #3, I wrote the reason why I would not re-code remify to work on the order of the columns rather than on their names.

In general, I do not like the idea of the user not having an edgelist with named columns, because the names let her make sure that the data have the right shape before being processed. I think that making the user name the columns "actor1" (the sender column) and "actor2" (the receiver column) is less error-prone. In the context of directed networks, relying only on the order of the columns can produce errors that the function cannot detect (for instance, when the sender and receiver columns are supplied in swapped order).

I am not closing this issue at the moment. We should discuss more ideas during the next meeting.

jupepis commented 1 year ago

In remify 3.1.0, the user can input the edgelist without using specific names for the first three columns. However, the user is asked to provide those columns in a specific order. In directed networks, the second and third columns of the edgelist must be in the correct order: the second column is the sender and the third column is the receiver. In undirected networks, it is important that the pair of actors occupies the second and third columns. In both directed and undirected networks, the "time" vector must be the first column of the edgelist.

The event type and weight columns, instead, must be named "type" and "weight" respectively, because they are not compulsory fields (either one of them, both, or neither can be present in the input edgelist). This makes column order unreliable for these fields, so remify::remify() needs the column names to detect whether a type column or a weight column is present.
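The rule described in the two paragraphs above (first three columns by position, optional columns by name) can be sketched as follows. This is a Python illustration under stated assumptions, not the remify implementation: `resolve_columns` is a hypothetical helper, and the role labels simply reuse the names mentioned in this thread.

```python
# Optional fields are only recognized by these exact names.
OPTIONAL = ("type", "weight")


def resolve_columns(columns):
    """Return a role -> column-name mapping for an edgelist header.

    Positions 1-3 are read as time, sender (actor1), receiver (actor2)
    regardless of their names; "type" and "weight" are picked up by name
    from the remaining columns. Hypothetical sketch, not remify's code.
    """
    if len(columns) < 3:
        raise ValueError("the edgelist needs at least three columns")
    roles = {
        "time": columns[0],
        "actor1": columns[1],
        "actor2": columns[2],
    }
    # Optional columns may be absent or appear in any order,
    # so position cannot identify them; only the name can.
    for name in OPTIONAL:
        if name in columns[3:]:
            roles[name] = name
    return roles
```

With a header such as `["t", "from", "to", "weight"]`, the sketch assigns the first three columns by position and detects the weight column by name; a fourth column called `"w"` would be ignored, which is why the names of the optional fields matter.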