davidshumway closed this 2 years ago
On datasets with only hundreds of time steps, subsetting with for-loops is not a problem (it takes less than one minute).
However, on datasets with hundreds of thousands of time steps, using `np.where` is a significant improvement.
The for-loop implementation takes over 0.1 seconds to perform one subset operation on a dataset with 100,000 time steps. Running all 100,000 subset operations therefore takes at least 10,000 seconds (nearly 3 hours), while `np.where` can run all 100,000 subset operations in a few seconds. This was tested on a dataset containing roughly 20 relationships per time step.
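The comment does not include the code itself, so here is a minimal sketch of the two approaches under an assumed data layout. It assumes each row of a NumPy array is one relationship `(head, relation, tail, time_step)` with about 20 rows per time step. The functions `make_dataset`, `subset_loop`, and `subset_where` are hypothetical names for illustration, not the project's actual API. Both subset functions return the same rows. The loop version checks every row in the Python interpreter, while the `np.where` version builds a boolean mask and looks up indices inside NumPy's compiled code.

```python
import time

import numpy as np


def make_dataset(num_steps, rels_per_step=20, seed=0):
    # Hypothetical layout (assumption): columns are head, relation, tail, time_step.
    rng = np.random.default_rng(seed)
    timesteps = np.repeat(np.arange(num_steps), rels_per_step)
    heads = rng.integers(0, 1000, size=timesteps.size)
    rels = rng.integers(0, 50, size=timesteps.size)
    tails = rng.integers(0, 1000, size=timesteps.size)
    return np.stack([heads, rels, tails, timesteps], axis=1)


def subset_loop(data, t):
    # Inline for-loop: every row is visited in Python, one interpreter step per row.
    rows = [row for row in data if row[3] == t]
    # reshape keeps a (0, n_cols) shape when no rows match.
    return np.array(rows, dtype=data.dtype).reshape(-1, data.shape[1])


def subset_where(data, t):
    # np.where on a boolean mask: comparison and index lookup run in compiled code.
    idx = np.where(data[:, 3] == t)[0]
    return data[idx]


if __name__ == "__main__":
    data = make_dataset(1_000)

    start = time.perf_counter()
    a = subset_loop(data, 500)
    mid = time.perf_counter()
    b = subset_where(data, 500)
    end = time.perf_counter()

    # Both approaches must select exactly the same rows.
    assert np.array_equal(a, b)
    print(f"for-loop: {mid - start:.6f}s, np.where: {end - mid:.6f}s")
```

Timings depend on hardware and dataset size, so the script prints them rather than asserting a particular speedup. The gap widens as the number of rows grows, because the loop's per-row Python overhead scales with the full dataset on every subset call.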
Using `np.where` significantly reduces the time needed to subset the training data compared with using inline for-loops to perform the subset operation.