LukasZahradnik / deep-db-learning

A modular message-passing scheme reflecting the relational model for end-to-end deep learning from databases
https://lukaszahradnik.github.io/deep-db-learning/

Categorical convertor cannot preserve value-to-index mapping across runs #16

Open neumannjan opened 1 year ago

neumannjan commented 1 year ago

https://github.com/LukasZahradnik/deep-db-learning/blob/43e21b53b334c7fc3b4e0e699192d0a3ed3affac/db_transformer/ndata/convertor/cat_convertor.py#L24-L26

The above cannot survive across executions, e.g. from training to evaluation. The embedding vectors get assigned to the actual values in an arbitrary order, determined by the order in which the values are requested. However, for a given model this order needs to be guaranteed to be exactly the same every time, which this code does not ensure.
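
For illustration, a minimal sketch of the lazy value-to-index pattern described above (the names here are hypothetical, not the actual convertor API), showing why the mapping is order-dependent:

```python
# Hypothetical sketch of lazy value-to-index assignment; not the actual cat_convertor.py code.
value_to_index: dict = {}

def to_index(value):
    # Assigns the next free index the first time a value is seen,
    # so the resulting mapping depends entirely on the order of requests.
    if value not in value_to_index:
        value_to_index[value] = len(value_to_index)
    return value_to_index[value]

# Run 1 (training) might see the values in this order:
print([to_index(v) for v in ["red", "green", "blue"]])  # [0, 1, 2]

# Run 2 (evaluation) rebuilds the dict and may see a different order:
value_to_index.clear()
print([to_index(v) for v in ["blue", "red", "green"]])  # [0, 1, 2] -> "blue" now maps to 0
```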

neumannjan commented 1 year ago

Or does torch.nn.Module preserve this automatically? If so, then this mapping definitely has to be part of the Module itself, not the dataset.
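
For context: torch.nn.Module only serializes registered parameters, buffers, and extra state into state_dict(); a plain Python dict attribute is not saved automatically. A minimal sketch of one way the mapping could live inside the module via get_extra_state/set_extra_state (the CatEmbedder name and its interface are hypothetical, not part of this repo):

```python
import torch
from torch import nn


class CatEmbedder(nn.Module):
    """Hypothetical categorical embedder that keeps the value-to-index
    mapping as part of the module, so it round-trips through state_dict()."""

    def __init__(self, num_embeddings: int, dim: int):
        super().__init__()
        self.embedding = nn.Embedding(num_embeddings, dim)
        self.value_to_index: dict = {}

    def get_extra_state(self):
        # Included in state_dict() under the "_extra_state" key.
        return {"value_to_index": self.value_to_index}

    def set_extra_state(self, state):
        # Restored by load_state_dict(), so the mapping survives across runs.
        self.value_to_index = state["value_to_index"]

    def forward(self, values: list) -> torch.Tensor:
        indices = []
        for v in values:
            if v not in self.value_to_index:
                self.value_to_index[v] = len(self.value_to_index)
            indices.append(self.value_to_index[v])
        return self.embedding(torch.tensor(indices))
```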

neumannjan commented 1 year ago

Here is a reference to the same code, in the latest commit as of writing:

https://github.com/LukasZahradnik/deep-db-learning/blob/75090ca2ac9a262d4362ebdf8ee0fd7b875f2896/db_transformer/ndata/convertor/columns/cat_convertor.py#L25-L27

LukasZahradnik commented 1 year ago

I'm not sure how to move this into the torch.nn.Module, but it should be serialized there somehow along with the rest of the model.

If we do an evaluation right after training (in the same run), then we are fine for now.
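
Continuing the hypothetical CatEmbedder sketch above, the mapping would then survive a separate evaluation run simply by saving and loading the state dict (file name chosen for illustration):

```python
# Training run: build the mapping, then persist it together with the weights.
model = CatEmbedder(num_embeddings=100, dim=8)
model(["red", "green", "blue"])
torch.save(model.state_dict(), "model.pt")

# Later evaluation run: load_state_dict() restores the same value-to-index mapping.
restored = CatEmbedder(num_embeddings=100, dim=8)
restored.load_state_dict(torch.load("model.pt"))
assert restored.value_to_index == model.value_to_index
```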