How is mini-batching handled in GLUE? I saw from the code that there are two types of mini-batching (one for the graph and one for each modality). I tried to follow the code but couldn't get the full logic.
Won't mini-batching the graph cause bias in the output?
How does mini-batching work for each modality, given that the modalities are of different sizes? Are the batch sizes of the two modalities the same, and if so, how is it handled when the modality sizes are very different (e.g. 1k cells vs 10k cells)?
Graph mini-batching is conducted by randomly selecting a batch of "positive" edges with probability proportional to edge weight, which is then combined with a random set of "negative" edges for contrastive learning (for each source node in the "positive" edges, we sample a number of "negative" target nodes with probability proportional to node degree). This sampling procedure is designed to favor (1) edges with higher credibility, and (2) the global graph structure as manifested in high-degree nodes.
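To make the sampling scheme concrete, here is a minimal NumPy sketch of the idea described above. It is not the actual scglue implementation; the function name `sample_graph_minibatch` and its arguments are hypothetical, and the real code works on the guidance graph's internal representation rather than plain arrays.

```python
import numpy as np

def sample_graph_minibatch(edges, edge_weights, node_degrees,
                           batch_size, neg_per_pos, rng=None):
    """Sketch: weighted positive-edge sampling plus degree-proportional
    negative sampling for a contrastive graph mini-batch.

    edges: (E, 2) int array of (source, target) node indices
    edge_weights: (E,) positive edge weights
    node_degrees: (N,) node degrees over the full graph
    """
    rng = np.random.default_rng() if rng is None else rng

    # Positive edges: sampled with probability proportional to edge weight,
    # so higher-credibility edges are seen more often.
    p_edge = edge_weights / edge_weights.sum()
    pos_idx = rng.choice(len(edges), size=batch_size, replace=True, p=p_edge)
    pos_edges = edges[pos_idx]

    # Negative targets: for each positive source node, sample target nodes
    # with probability proportional to node degree, reflecting the global
    # graph structure carried by high-degree nodes.
    p_node = node_degrees / node_degrees.sum()
    neg_targets = rng.choice(len(node_degrees),
                             size=(batch_size, neg_per_pos),
                             replace=True, p=p_node)
    neg_sources = np.repeat(pos_edges[:, :1], neg_per_pos, axis=1)
    neg_edges = np.stack([neg_sources, neg_targets], axis=-1).reshape(-1, 2)

    return pos_edges, neg_edges
```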
The batch size is the same for every modality in each mini-batch, so the modalities receive equal attention. With 1k vs 10k cells, an epoch is one full pass over the 10k dataset, during which the 1k dataset is cycled roughly 10 times.
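Below is a minimal PyTorch sketch of this pairing scheme, assuming toy random tensors in place of real RNA/ATAC data; it is not the actual scglue data-loading code. The larger modality defines the epoch, and the smaller one is cycled so each step sees the same number of cells from both.

```python
import itertools
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data standing in for two modalities of different size
rna = TensorDataset(torch.randn(10_000, 50))   # larger modality (10k cells)
atac = TensorDataset(torch.randn(1_000, 50))   # smaller modality (1k cells)

batch_size = 128
rna_loader = DataLoader(rna, batch_size=batch_size, shuffle=True, drop_last=True)
atac_loader = DataLoader(atac, batch_size=batch_size, shuffle=True, drop_last=True)

# One epoch = one full pass over the larger modality; the smaller one is
# cycled so each mini-batch holds the same number of cells per modality.
# (A real implementation would reshuffle the smaller loader on wrap-around;
# itertools.cycle simply repeats the first pass.)
for (x_rna,), (x_atac,) in zip(rna_loader, itertools.cycle(atac_loader)):
    assert x_rna.shape[0] == x_atac.shape[0] == batch_size
    # ... forward pass and loss computation would go here ...
```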