This is an implementation of the FAT-DeepFFM (Field Attentive Deep Field-aware Factorization Machine) model.
Changes:
FAT-DeepFFM model is added together with a new file models/ranking/deepffm.py. The implementation follows the arXiv paper (https://arxiv.org/abs/1905.06336). The DeepFFM model is also implemented in deepffm.py, as FAT-DeepFFM extends DeepFFM (also known as NeuralFFM, https://cs.nju.edu.cn/31/60/c1654a209248/page.htm).
FFM module is added to the basic/layers.py file. The Field-aware Factorization Machine module, introduced in the FFM paper (https://dl.acm.org/doi/abs/10.1145/2959100.2959134), explicitly models multi-channel second-order feature interactions, with each feature field corresponding to one channel. The FAT-DeepFFM and DeepFFM models are implemented on top of this module.
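To illustrate the field-aware interaction the module computes, here is a minimal PyTorch sketch. The class name, constructor arguments, and tensor layout are illustrative assumptions, not the repo's actual API: each field keeps one embedding table per other field, and the pair (i, j) interacts through the two field-specific vectors.

```python
import torch
import torch.nn as nn


class FFM(nn.Module):
    """Sketch of a Field-aware FM layer (illustrative, not the repo's API).

    Each categorical field i holds num_fields embedding tables; the
    cross of fields (i, j) uses field i's embedding "toward" j and
    field j's embedding "toward" i, giving one channel per field pair.
    """

    def __init__(self, num_fields, feature_dims, embed_dim):
        super().__init__()
        self.num_fields = num_fields
        # embeddings[i][j]: embedding of field i used when crossing with field j
        self.embeddings = nn.ModuleList([
            nn.ModuleList([nn.Embedding(dim, embed_dim) for _ in range(num_fields)])
            for dim in feature_dims
        ])

    def forward(self, x):
        # x: LongTensor of shape (batch, num_fields) holding category indices
        interactions = []
        for i in range(self.num_fields - 1):
            for j in range(i + 1, self.num_fields):
                v_ij = self.embeddings[i][j](x[:, i])  # field i toward field j
                v_ji = self.embeddings[j][i](x[:, j])  # field j toward field i
                interactions.append(v_ij * v_ji)       # element-wise cross
        # (batch, num_pairs, embed_dim): one channel per second-order cross
        return torch.stack(interactions, dim=1)
```

With 3 fields there are 3 field pairs, so the output has shape (batch, 3, embed_dim); these per-pair channels are exactly what the CEN attention below re-weights.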
CEN module is added to the basic/layers.py file. This is the Compose-Excitation Network module, proposed in the FAT-DeepFFM paper as a modified version of the Squeeze-and-Excitation Network (SENet) (Hu et al., 2017). It acts as an attention mechanism that highlights the importance of the second-order feature crosses. The FAT-DeepFFM model is implemented on top of this module.
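A minimal sketch of the CEN idea follows, under these assumptions: the "compose" (squeeze) step is a 1x1 convolution over each cross's embedding as described in the paper, and the excitation is a bottleneck MLP producing one weight per cross (the paper uses ReLU rather than SENet's sigmoid at the output; names and the reduction ratio here are illustrative).

```python
import torch
import torch.nn as nn


class CEN(nn.Module):
    """Sketch of a Compose-Excitation Network (illustrative, not the repo's API).

    Input: second-order crosses of shape (batch, num_crosses, embed_dim).
    Compose: a 1x1 convolution maps each embed_dim vector to one scalar
    descriptor. Excitation: a bottleneck MLP turns the descriptors into
    per-cross attention weights used to rescale the crosses.
    """

    def __init__(self, num_crosses, embed_dim, reduction=2):
        super().__init__()
        # 1x1 conv over the embedding axis: embed_dim channels -> 1 descriptor
        self.compose = nn.Conv1d(embed_dim, 1, kernel_size=1)
        self.excite = nn.Sequential(
            nn.Linear(num_crosses, num_crosses // reduction),
            nn.ReLU(),
            nn.Linear(num_crosses // reduction, num_crosses),
            nn.ReLU(),  # ReLU output per the paper, in place of SENet's sigmoid
        )

    def forward(self, crosses):
        # crosses: (batch, num_crosses, embed_dim)
        d = self.compose(crosses.transpose(1, 2)).squeeze(1)  # (batch, num_crosses)
        a = self.excite(d)                                    # attention weights
        return crosses * a.unsqueeze(-1)                      # re-weight each cross
```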
Examples are added to examples/ranking/run_avazu.py and run_criteo.py. The hyperparameters (where they do not conflict with existing ones) are set according to Section 4.1 of the FAT-DeepFFM paper. Please refer to the paper for the complete setting.
Preliminary metrics
Setting: test AUC with a (0.7, 0.1, 0.2) train/validation/test split, capped at two epochs.
Avazu (full dataset)
DeepFFM: 0.7449
FAT-DeepFFM: 0.7152
Full experiments are still running for Avazu and Criteo.
Future work
Investigate different implementations of the squeeze operation in the CEN module of the FAT-DeepFFM model. The current implementation follows the description in the original paper, namely a 1x1 convolution; alternatives such as average pooling and max pooling are also worth exploring.
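For concreteness, the pooling alternatives mentioned above are one-liners over the same cross tensor; these are hypothetical drop-in replacements for the 1x1-convolution squeeze, not code from this PR.

```python
import torch

# crosses: (batch, num_crosses, embed_dim), as produced by the FFM layer
crosses = torch.randn(8, 6, 4)

# Candidate squeeze operations: collapse each cross's embedding to one
# descriptor per cross, shape (batch, num_crosses), without learned weights.
avg_descriptor = crosses.mean(dim=-1)  # average pooling over the embedding axis
max_descriptor = crosses.amax(dim=-1)  # max pooling over the embedding axis
```

Either descriptor could feed the same excitation MLP, trading the learned 1x1-convolution compose step for a parameter-free one.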