wonsutak-ysz opened 2 weeks ago
In recent years, Graph Neural Networks (GNNs) have achieved strong results in graph-based recommendation, because they excel at aggregating information from neighboring nodes for collaborative filtering. GNNs propagate information through multiple layers over the user-item interaction graph to discover higher-order connectivity patterns. Most GNN-based collaborative filtering models rely on supervised learning, which requires large amounts of high-quality labeled data, yet in most recommendation scenarios labeled interactions are scarce. This is why contrastive learning was introduced into recommendation. While contrastive learning has improved graph recommenders, its effectiveness depends heavily on the data augmentation method. Current augmentation methods for contrastive learning either inject random noise or rely on heuristics, and they have several problems:
1. Random graph data augmentation may discard important structural information and mislead the model.
2. Heuristic-based contrastive learning methods rely too heavily on assumptions about the data distribution, which hurts model performance and makes the model sensitive to noise.
3. Most graph contrastive learning recommenders suffer from over-smoothing, making it hard to distinguish positive from negative examples.
To solve these problems, this paper explores a new approach to graph contrastive learning and presents a simple but effective method called LightGCL. It uses singular value decomposition (SVD) reconstruction for data augmentation, which not only extracts useful information from the user-item interaction graph but also preserves its global structure.
As in traditional collaborative filtering, each user and item is assigned a learnable feature vector representing the node's characteristics. LightGCL uses a standard two-layer GCN (Graph Convolutional Network) to capture local neighborhood relationships in the graph.
LightGCL applies dropout to prevent overfitting, and adds residual connections so that each layer retains its own information during propagation.
The final node feature is the sum of the features from each layer, and the prediction for a user-item pair $(u_i, v_j)$ is the dot product of their final feature vectors.
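The local propagation pipeline above can be sketched in NumPy as follows. This is a minimal illustration, not the paper's implementation: the sizes, the random initialization, and the symmetric degree normalization are assumptions for the sake of a runnable toy example (dropout is omitted).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not from the paper): 4 users, 5 items, feature dim 8.
n_users, n_items, d = 4, 5, 8
A = (rng.random((n_users, n_items)) > 0.5).astype(float)  # user-item interactions

# Symmetric degree normalization of the bipartite adjacency, as is standard in GCNs.
deg_u = np.maximum(A.sum(axis=1, keepdims=True), 1.0)
deg_v = np.maximum(A.sum(axis=0, keepdims=True), 1.0)
A_norm = A / np.sqrt(deg_u) / np.sqrt(deg_v)

# Layer-0 feature vectors (learnable in the real model; random here).
E_u = [rng.standard_normal((n_users, d)) * 0.1]
E_v = [rng.standard_normal((n_items, d)) * 0.1]

# Two propagation layers; the residual term keeps each layer's own information.
for _ in range(2):
    zu = A_norm @ E_v[-1] + E_u[-1]      # aggregate item neighbors + residual
    zv = A_norm.T @ E_u[-1] + E_v[-1]    # aggregate user neighbors + residual
    E_u.append(zu)
    E_v.append(zv)

# Final features sum over all layers; the prediction is a dot product.
final_u = sum(E_u)
final_v = sum(E_v)
scores = final_u @ final_v.T   # scores[i, j]: predicted preference of user i for item j
```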
LightGCL efficiently extracts important global information via singular value decomposition (SVD) reconstruction. Specifically, it decomposes the adjacency matrix $\mathcal{A}$ as $\mathcal{A} = USV^\top$, where $U$ and $V$ are $I \times I$ and $J \times J$ orthogonal matrices, and $S$ is a rectangular diagonal matrix of the same shape as $\mathcal{A}$, called the singular value matrix. LightGCL keeps only the largest $q$ singular values, i.e., it applies truncated SVD to the adjacency matrix: $\mathcal{A} \approx U_qS_qV_q^\top$. The reconstructed adjacency matrix is essentially a low-rank approximation of the original. This SVD-based graph structure learning offers two main advantages:
First, it identifies and strengthens the principal components that matter most for user preferences. Second, the new graph structure preserves global collaborative signals by considering every user-item pair.
However, it's worth noting that computing complete singular values is computationally expensive for large-scale graphs, especially when applied to recommendation systems with massive amounts of data. Therefore, this paper uses an approximate singular value algorithm proposed by Halko et al. in 2011. The main idea is to first approximate the range of the input matrix with a low-rank orthogonal matrix, then perform SVD on this smaller matrix.
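The randomized scheme of Halko et al. can be sketched in a few lines of NumPy. The function name, oversampling amount, and number of power iterations below are illustrative choices, not the paper's; the structure (random range sketch, QR, exact SVD of the small projection) follows the algorithm described above.

```python
import numpy as np

def randomized_svd(A, q, n_oversample=5, n_iter=4, seed=0):
    """Approximate rank-q SVD following Halko et al. (2011): sketch the range
    of A with a random Gaussian matrix, then run exact SVD on the small projection."""
    rng = np.random.default_rng(seed)
    k = q + n_oversample
    Y = A @ rng.standard_normal((A.shape[1], k))   # sample the range of A
    for _ in range(n_iter):                        # power iterations sharpen the basis
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)                         # orthonormal basis for the range
    B = Q.T @ A                                    # small (k x n) matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :q], s[:q], Vt[:q]

# Rank-q reconstruction of a toy binary adjacency matrix.
rng = np.random.default_rng(1)
A = (rng.random((50, 30)) > 0.7).astype(float)
U_q, S_q, Vt_q = randomized_svd(A, q=5)
A_hat = U_q @ np.diag(S_q) @ Vt_q                  # low-rank approximation of A
```

In practice one would use a library routine such as `sklearn.utils.extmath.randomized_svd` or `torch.svd_lowrank`, which implement the same idea.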
Here $q$ is the rank of the truncated decomposition, a manually chosen hyperparameter. The feature propagation on the reconstructed graph can then be written out in the same form as the GCN layers above, with $U_qS_qV_q^\top$ in place of the original adjacency matrix.
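One practical consequence of the low-rank form is that propagation never needs to materialize the dense reconstructed adjacency matrix. A hedged NumPy illustration (all sizes are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, d, q = 100, 80, 16, 5

# Hypothetical low-rank factors from the truncated SVD of the adjacency matrix.
Uq = rng.standard_normal((n_users, q))
Sq = rng.random(q)
Vq = rng.standard_normal((n_items, q))
E_v = rng.standard_normal((n_items, d))   # item features from the previous layer

# Naive: materialize the dense (n_users x n_items) reconstruction first.
G_naive = ((Uq * Sq) @ Vq.T) @ E_v

# Efficient: multiply right-to-left, never forming the dense reconstruction,
# costing O((n_users + n_items) * q * d) instead of O(n_users * n_items * d).
G_fast = (Uq * Sq) @ (Vq.T @ E_v)
```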
Traditional graph contrastive learning methods (SGL, SimGCL, etc.) construct two augmented graphs and contrast them with each other, without using the feature vectors generated from the original graph. In LightGCL, however, the augmented graph is built from global collaborative relationships, which already enhances the key features. LightGCL therefore directly contrasts the feature vectors $g_{i,l}^{(u)}$ from the augmented graph with the feature vectors $z_{i,l}^{(u)}$ from the original graph when computing the loss, simplifying the CL framework.
The loss function for items $\mathcal{L}_s^{(v)}$ is calculated in the same way as shown above. To prevent overfitting, LightGCL randomly performs dropout in each batch. In the figure below, $y_{i,p_s}$ and $y_{i,n_s}$ represent the predicted scores for positive and negative samples respectively.
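The two training signals described above can be sketched as follows: an InfoNCE-style contrastive loss between the original-graph and augmented-graph feature vectors, and a BPR loss over positive/negative score pairs. The function names and the temperature value are my own illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def infonce_loss(z, g, tau=0.2):
    """InfoNCE between two views of the same nodes: z from the original graph,
    g from the SVD-augmented graph. The same node's two views form the positive
    pair; all other nodes in the batch act as negatives."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    g = g / np.linalg.norm(g, axis=1, keepdims=True)
    logits = (z @ g.T) / tau                         # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))               # positive pairs on the diagonal

def bpr_loss(y_pos, y_neg):
    """BPR: push each positive item's predicted score above its paired negative's."""
    return -np.mean(np.log(1.0 / (1.0 + np.exp(-(y_pos - y_neg)))))
```

In training, the contrastive loss is combined with the BPR objective (plus regularization) via a weighting hyperparameter.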
Comparisons with baseline methods across different datasets are shown below, where R and N represent the Recall and NDCG metrics used to measure the results. As shown in the table, contrastive learning methods (such as SGL, HCCF, SimGCL) show significant advantages compared to non-contrastive learning approaches. The LightGCL method proposed in this paper substantially outperforms existing contrastive learning methods across all datasets, validating the effectiveness of the framework constructed in this paper.
The figure below shows the time complexity of LightGCL and three baseline models during preprocessing and each training iteration. LightGCL requires an SVD computation during preprocessing with complexity O(qE), which is almost negligible. Moreover, LightGCL only needs to construct one augmented graph, greatly simplifying the contrastive learning paradigm. Additionally, the SVD-reconstructed graph can be represented by its low-rank factors, which significantly speeds up the matrix multiplications. By comparison, LightGCL's per-iteration cost is less than half that of SimGCL, the best existing method.
Data sparsity and popularity bias are two common problems in recommender systems. LightGCL's resistance to sparsity is validated experimentally. As shown in the figure below, experiments on the Yelp and Gowalla datasets group users along the x-axis by their number of interactions, with the y-axis showing each group's contribution to the overall Recall. LightGCL remains strong even for sparse user groups, demonstrating its ability to combat data sparsity, although in the extremely sparse group (<15 interactions) on Gowalla it does not perform as well as SimGCL; the authors attribute this to the different group frequency distributions of the Yelp and Gowalla datasets. Regarding resistance to popularity bias, as shown in the next figure, LightGCL likewise performs relatively well.
Graph contrastive learning aims to push learned feature vectors apart, but this can destroy cluster structure in the feature space. This paper examines the feature distributions of LightGCL and the baselines by measuring the mean distance between feature vectors and by visualizing the features. LightGCL's Mean Average Distance (MAD) falls between those of contrastive and non-contrastive methods, indicating that LightGCL has an advantage in combating both over-smoothing and over-uniformity. In the feature distribution plots, non-contrastive methods (MHCN, LightGCN) show indistinguishable clusters, indicating over-smoothing, while existing contrastive methods (SGL and SimGCL) face two problems:
1. Over-uniformity: for example, SGL on the Yelp dataset produces many uniformly scattered feature vectors with no clear global structure to capture collaborative relationships between users.
2. Highly dispersed small clusters: for instance, SimGCL's feature vectors on the Gowalla dataset form many very small clusters, which show signs of overfitting.
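The mean-distance measurement mentioned above can be computed as the mean pairwise cosine distance between feature vectors; a small hedged sketch (the paper's exact averaging may differ):

```python
import numpy as np

def mean_average_distance(X):
    """Mean pairwise cosine distance between node feature vectors. Values near 0
    indicate over-smoothing (all features collapse together); values near 1
    indicate highly dispersed, near-orthogonal features."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    cos = Xn @ Xn.T
    n = X.shape[0]
    off_diag = ~np.eye(n, dtype=bool)      # exclude each node's distance to itself
    return (1.0 - cos)[off_diag].mean()
```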
This paper presents a simple and efficient paradigm for graph contrastive learning in recommendation, exploring singular value decomposition reconstruction to augment the user-item interaction graph structure. The experiments show that the proposed graph augmentation scheme is strongly resistant to data sparsity and popularity bias, and that the model achieves state-of-the-art results on multiple public evaluation datasets. Future work plans to incorporate causal analysis into the lightweight graph contrastive learning model to reduce confounding effects in data augmentation and further improve recommendation performance.
LightGCL: Simple Yet Effective Graph Contrastive Learning for Recommendation https://arxiv.org/abs/2302.08191