INK-USC / RE-Net

Recurrent Event Network: Autoregressive Structure Inference over Temporal Knowledge Graphs (EMNLP 2020)
http://inklab.usc.edu/renet/

Improves time to subset train data #51

Closed davidshumway closed 2 years ago

davidshumway commented 3 years ago

Using np.where significantly reduces the time needed to subset the training data. It replaces the inline for-loops that currently perform the subset operation.
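For illustration, here is a minimal sketch of the two approaches, not the exact code in this PR. It assumes the training data is a NumPy array of (subject, relation, object, time) rows with the time step in column 3; the function names `subset_with_loop` and `subset_with_where` are hypothetical.

```python
import numpy as np

def subset_with_loop(quads, t):
    """Baseline: scan every row in a Python-level loop and keep those at time t."""
    rows = [row for row in quads if row[3] == t]
    # reshape keeps a consistent (n, 4) shape even when no rows match
    return np.array(rows, dtype=quads.dtype).reshape(-1, quads.shape[1])

def subset_with_where(quads, t):
    """Vectorised: compare the whole time column at once, then index the matches."""
    idx = np.where(quads[:, 3] == t)[0]
    return quads[idx]

# Toy data; the column layout (subject, relation, object, time) is an assumption.
quads = np.array([
    [0, 1, 2, 0],
    [3, 0, 1, 0],
    [2, 1, 0, 1],
    [4, 2, 3, 2],
    [1, 0, 4, 2],
])

for t in np.unique(quads[:, 3]):
    # Both approaches select the same rows in the same order.
    assert np.array_equal(subset_with_loop(quads, t), subset_with_where(quads, t))
```

Both functions return the same rows. The difference is that the comparison in `subset_with_where` runs inside NumPy rather than in the Python interpreter, which matters when the operation is repeated once per time step.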

davidshumway commented 3 years ago

On datasets with only hundreds of time steps, subsetting with for-loops is not a problem (it takes less than one minute). On datasets with hundreds of thousands of time steps, however, np.where is a significant improvement.

With the for-loop implementation, one subset operation takes over 0.1 seconds on a dataset with 100,000 time steps, so subsetting every time step takes more than 10,000 seconds (roughly 2.8 hours). With np.where, all 100,000 subset operations finish in a few seconds. This was measured on a dataset containing roughly 20 relationships per time step.