dllllb / pytorch-lifestream

A library built upon PyTorch for building embeddings on discrete event sequences using self-supervision
Apache License 2.0

Support user profile features #117

Open Ayatafoy opened 1 year ago

Ayatafoy commented 1 year ago

Does the library support using user profile features in addition to sequences of transactions? For example, for each user I have hundreds of transactions and hundreds of profile features that describe the user. I want to create a foundation model that produces an embedding for each user based on both their transactions and their profile features. My other use case is supervised classification of users into n classes based on transaction history and profile features. I'm looking for a native approach to using transaction features and profile features in one model. Thanks!

ivkireev86 commented 1 year ago

The default pipelines don't support user profile features. There are some workarounds:

  1. Use the features separately. Train a CoLES encoder in unsupervised mode and transform transaction sequences into user embeddings. Then feed the concatenation of user embeddings and user profile features to a downstream model. All features are frozen, so the downstream model can be of any type (NN or boosting).
  2. An end-to-end NN pipeline is possible: an encoder for transactions, an encoder for the user profile, concatenation, and a classification head. It would be better to pretrain both encoders in supervised or unsupervised mode, then fine-tune them. PTLS can easily be adapted for this mode: standard DataPreprocessing, join the user profile, save a parquet dataset; standard dataset and dataloaders with collate_feature_dict; a customized LightningModule composed from ptls models.
  3. Pretrain a joint encoder for transactions and the user profile. The data pipeline is the same as in item 2. There are several options for the pretraining task:
    • Plain CoLES pretraining. Not the best option: the encoder can exploit user profile features to distinguish users, which makes the CoLES task trivial, so the encoder learns nothing from transactions.
    • CoLES pretraining with hard batches. Use user profile features to compose batches from users with similar profiles, so CoLES has to learn to distinguish them based on transactions.
    • Train the transaction encoder to complement the user features, i.e. to learn information that is absent from them. Mutual-information-based losses can help here.
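Workaround 1 boils down to a join on user id once the CoLES encoder has produced embeddings. A minimal sketch, assuming the embeddings and profile features are already available as plain per-user mappings; `build_downstream_features` is a hypothetical helper, not part of ptls:

```python
from typing import Dict, List, Sequence, Tuple


def build_downstream_features(
    user_embeddings: Dict[str, Sequence[float]],
    user_profiles: Dict[str, Sequence[float]],
) -> Tuple[List[str], List[List[float]]]:
    """Hypothetical helper: join frozen CoLES embeddings with profile features.

    Only users present in both mappings are kept. Each row is the user's
    embedding values followed by their profile values.
    """
    user_ids = sorted(user_embeddings.keys() & user_profiles.keys())
    rows = [list(user_embeddings[uid]) + list(user_profiles[uid]) for uid in user_ids]
    return user_ids, rows


# Usage: u2 has no profile and u3 has no embedding, so only u1 survives the join.
ids, X = build_downstream_features(
    {"u1": [0.1, 0.2], "u2": [0.3, 0.4]},
    {"u1": [35.0, 1.0], "u3": [40.0, 0.0]},
)
```

Because nothing upstream is trained further, the rows in `X` can go to any downstream model, such as gradient boosting.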
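The end-to-end architecture in item 2 (transaction encoder, profile encoder, concatenation, classification head) can be sketched in plain PyTorch. This is an illustration under assumptions, not the ptls implementation: a GRU over per-transaction feature vectors stands in for a ptls sequence encoder, and the class name `TrxProfileClassifier` is hypothetical.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence


class TrxProfileClassifier(nn.Module):
    """Hypothetical end-to-end model: transaction encoder + profile encoder + head."""

    def __init__(self, trx_dim: int, profile_dim: int, hidden: int, n_classes: int):
        super().__init__()
        # Transaction encoder: a GRU stand-in for a ptls sequence encoder.
        self.trx_encoder = nn.GRU(trx_dim, hidden, batch_first=True)
        # Profile encoder: small MLP over static per-user features.
        self.profile_encoder = nn.Sequential(nn.Linear(profile_dim, hidden), nn.ReLU())
        # Classification head over the concatenated user embeddings.
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, trx: torch.Tensor, lengths: torch.Tensor, profile: torch.Tensor) -> torch.Tensor:
        # trx: (batch, max_len, trx_dim), padded; lengths: (batch,) true lengths.
        # Packing makes the GRU skip padded positions.
        packed = pack_padded_sequence(trx, lengths.cpu(), batch_first=True, enforce_sorted=False)
        _, h_n = self.trx_encoder(packed)  # h_n: (1, batch, hidden), original order
        trx_emb = h_n[-1]                  # last hidden state per user
        profile_emb = self.profile_encoder(profile)
        return self.head(torch.cat([trx_emb, profile_emb], dim=1))  # logits


# Usage: 2 users, up to 5 transactions with 4 features each, 10 profile features.
model = TrxProfileClassifier(trx_dim=4, profile_dim=10, hidden=16, n_classes=3)
logits = model(torch.randn(2, 5, 4), torch.tensor([5, 3]), torch.randn(2, 10))
```

Pretraining `trx_encoder` (e.g. with CoLES) and `profile_encoder` first, then fine-tuning the whole module, follows the order suggested in item 2. In ptls the same structure would sit inside a custom LightningModule.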
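The hard-batch option in item 3 changes only how users are grouped into batches. A minimal sketch, assuming each profile can be reduced to a discrete similarity key (here a hypothetical `(gender, age band)` tuple; clustering profile vectors would be the continuous analogue). `hard_batches` is a hypothetical helper, not a ptls API. Its output could drive a custom batch sampler, for example through the `batch_sampler` argument of `torch.utils.data.DataLoader`.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence


def hard_batches(
    user_ids: Sequence[str],
    profile_key: Callable[[str], Hashable],
    batch_size: int,
    seed: int = 0,
) -> List[List[str]]:
    """Hypothetical helper: compose CoLES batches from users with similar profiles.

    Users are bucketed by `profile_key` and every batch is cut from a single
    bucket, so profile features cannot tell the users in a batch apart.
    Chunks with fewer than 2 users are dropped, since a contrastive batch
    needs at least two users.
    """
    rng = random.Random(seed)
    buckets: Dict[Hashable, List[str]] = defaultdict(list)
    for uid in user_ids:
        buckets[profile_key(uid)].append(uid)

    batches: List[List[str]] = []
    for members in buckets.values():
        rng.shuffle(members)
        for start in range(0, len(members), batch_size):
            chunk = members[start:start + batch_size]
            if len(chunk) >= 2:
                batches.append(chunk)
    rng.shuffle(batches)  # mix buckets across the epoch
    return batches


# Usage: each batch holds users sharing one (gender, age band) key.
profiles = {
    "u1": ("F", "30-40"), "u2": ("F", "30-40"),
    "u3": ("M", "20-30"), "u4": ("M", "20-30"), "u5": ("M", "20-30"),
}
batches = hard_batches(list(profiles), profiles.get, batch_size=2)
```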

We are keeping the problem of joint embeddings for user profiles and transactions in mind. Some experiments were made, but without clear results.