dllllb / pytorch-lifestream

A library built upon PyTorch for building embeddings on discrete event sequences using self-supervision
Apache License 2.0

Force TimeZone during UNIX time conversion #148

Open ivan-chai opened 7 months ago

ivan-chai commented 7 months ago

The UNIX time conversion depends on the system time zone. For example, PySpark's F.unix_timestamp can produce different results for users in different time zones. Results can be made reproducible by calling spark.conf.set("spark.sql.session.timeZone", "UTC") at the beginning of the data processing script. The same behavior might occur in the Pandas data preprocessing, but I didn't check it.
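To illustrate the effect outside of Spark, here is a minimal sketch using only the Python standard library. It is not code from pytorch-lifestream; the helper names to_unix_local and to_unix_utc and the timestamp format are assumptions made for the example. The first helper interprets a naive timestamp string in the machine's local time zone, so its result varies between machines; the second pins the interpretation to UTC.

```python
from datetime import datetime, timezone

# Assumed input format for this illustration only.
FMT = "%Y-%m-%d %H:%M:%S"


def to_unix_local(ts: str) -> int:
    """Interpret `ts` in the machine's local time zone (not reproducible)."""
    # A naive datetime is treated as local time by .timestamp().
    return int(datetime.strptime(ts, FMT).timestamp())


def to_unix_utc(ts: str) -> int:
    """Interpret `ts` as UTC regardless of the machine's settings."""
    # Attaching tzinfo explicitly removes the dependency on the system zone.
    parsed = datetime.strptime(ts, FMT).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


if __name__ == "__main__":
    sample = "2020-01-01 00:00:00"
    # The two values differ by the local UTC offset unless the machine runs in UTC.
    print("local zone:", to_unix_local(sample))
    print("forced UTC:", to_unix_utc(sample))
```

Forcing the zone explicitly plays the same role here as setting spark.sql.session.timeZone does in the Spark script: the conversion no longer depends on where the code runs.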

ivan-chai commented 7 months ago

It is easy to patch the current code, but it is hard (if at all possible) to maintain backward compatibility.