dllllb / pytorch-lifestream

A library built upon PyTorch for building embeddings on discrete event sequences using self-supervision
Apache License 2.0

Speedup collect_lists and add test #141

ivan-chai closed 9 months ago

ivan-chai commented 9 months ago

Speed up Spark data preparation:

  1. Reduce the number of sorts performed while collecting lists.
  2. Don't attach the collected lists to each row.

Speedup on bowl2019: 5:13 -> 2:14

Speedup on AlphaBattle: 37:07 -> 23:45
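To illustrate the idea behind the two changes, here is a minimal plain-Python sketch. It is not the Spark code from this PR. The function `collect_lists` below and the column names (`client_id`, `event_time`, `amount`, `mcc`) are made up for the example. Each group is sorted once and that single ordering is reused for every collected column, and the output has one record per group instead of the lists being copied onto every input row.

```python
from collections import defaultdict
from operator import itemgetter


def collect_lists(rows, key, time_col, value_cols):
    """Group rows by `key`; return one record per group with time-ordered lists.

    Hypothetical helper for illustration only, not the library's API.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    result = []
    for group_key, group_rows in groups.items():
        # One sort per group, shared by all collected columns
        # (instead of sorting separately for each column).
        ordered = sorted(group_rows, key=itemgetter(time_col))
        record = {key: group_key, time_col: [r[time_col] for r in ordered]}
        for col in value_cols:
            record[col] = [r[col] for r in ordered]
        # One output record per group: the lists are not attached to each row.
        result.append(record)
    return result


# Made-up events for two clients, deliberately out of time order.
events = [
    {"client_id": 1, "event_time": 3, "amount": 30.0, "mcc": "c"},
    {"client_id": 2, "event_time": 1, "amount": 5.0, "mcc": "x"},
    {"client_id": 1, "event_time": 1, "amount": 10.0, "mcc": "a"},
    {"client_id": 1, "event_time": 2, "amount": 20.0, "mcc": "b"},
]
sequences = collect_lists(events, "client_id", "event_time", ["amount", "mcc"])
```

In Spark, a comparable effect is commonly obtained by collecting a single array of structs per group and sorting that array once, but whether this PR does exactly that is an assumption.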