gao-lab / GLUE

Graph-linked unified embedding for single-cell multi-omics data integration
MIT License

Some questions on mini-batching #79

Closed: Chengwei94 closed this issue 1 year ago

Chengwei94 commented 1 year ago

How is mini-batching handled in GLUE? I saw from the code that there are two types of mini-batching (one for the graph and one for the modalities). I tried to follow the code but couldn't work out the full logic.

  1. Won't mini-batching the graph bias the output?
  2. How does mini-batching work for each modality, given that the modalities have different sizes? Are the batch sizes of the two modalities the same? If so, how is it handled when the modality sizes differ greatly (1k cells vs 10k cells)?
Jeff1995 commented 1 year ago

Thanks for your interest in GLUE!

  1. Graph mini-batching is conducted by randomly selecting a batch of "positive" edges with probability proportional to edge weight, which is then combined with a random set of "negative" edges for contrastive learning (for each source node in the "positive" edges, we sample a number of "negative" target nodes with probability proportional to node degree). This sampling procedure is designed to favor (1) edges with higher credibility, and (2) the global graph structure as manifested in high-degree nodes; see the first sketch after this list.
  2. The batch size of each modality in a mini-batch is the same, so all modalities receive equal attention. If it's 1k vs 10k cells, an epoch is one full pass over the 10k dataset, during which the 1k dataset is cycled 10 times; see the second sketch after this list.
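
To make the sampling scheme in (1) concrete, here is a minimal NumPy sketch of weighted positive-edge sampling combined with degree-proportional negative sampling. The function name, arguments, and edge-list representation are hypothetical illustrations of the idea described above, not the actual scglue implementation:

```python
# Hypothetical sketch (not the actual scglue code) of contrastive
# graph mini-batching as described above.
import numpy as np

def sample_graph_minibatch(src, dst, weight, batch_size, neg_per_pos, rng):
    """Sample one graph mini-batch from a weighted edge list.

    src, dst : int arrays of edge endpoints
    weight   : non-negative edge weights (credibility)
    """
    # Positive edges: probability proportional to edge weight,
    # so higher-credibility edges are sampled more often.
    pos_idx = rng.choice(len(src), size=batch_size, p=weight / weight.sum())
    pos_src, pos_dst = src[pos_idx], dst[pos_idx]

    # Negative targets: probability proportional to node degree,
    # reflecting the global graph structure via high-degree nodes.
    n_nodes = max(src.max(), dst.max()) + 1
    degree = np.bincount(np.concatenate([src, dst]), minlength=n_nodes)
    neg_src = np.repeat(pos_src, neg_per_pos)
    neg_dst = rng.choice(n_nodes, size=len(neg_src), p=degree / degree.sum())
    return (pos_src, pos_dst), (neg_src, neg_dst)
```

The positive pairs pull connected embeddings together while the degree-weighted negative pairs push unrelated ones apart, which is the contrastive objective described above.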
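
And a minimal sketch of the size-balanced modality mini-batching in (2): one epoch walks once over the larger dataset while indices of the smaller dataset are reshuffled and cycled as needed. Again, the function and its signature are hypothetical, not the actual scglue data loader:

```python
# Hypothetical sketch (not the actual scglue code) of size-balanced
# mini-batching across two modalities of different sizes.
import numpy as np

def paired_minibatches(n_small, n_large, batch_size, rng):
    """Yield equally sized index batches for both modalities for one epoch.

    An epoch is one full pass over the larger dataset; the smaller
    dataset is reshuffled and cycled as many times as needed.
    """
    large_order = rng.permutation(n_large)
    small_pool, pos = rng.permutation(n_small), 0
    for start in range(0, n_large, batch_size):
        large_batch = large_order[start:start + batch_size]
        small_batch = []
        while len(small_batch) < len(large_batch):
            take = min(len(large_batch) - len(small_batch), n_small - pos)
            small_batch.extend(small_pool[pos:pos + take])
            pos += take
            if pos == n_small:  # smaller dataset exhausted: reshuffle, cycle
                small_pool, pos = rng.permutation(n_small), 0
        yield np.asarray(small_batch), large_batch
```

With `n_small=1000`, `n_large=10000`, and `batch_size=128`, one epoch yields 79 batches and cycles through the 1k indices exactly 10 times.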

Hope that clarifies the issue.