How is mini-batching handled in GLUE? I saw from the code that there are two types of mini-batching (one for the graph and one for each modality). I tried to follow the code but couldn't get the full logic.
Won't mini-batching the graph cause bias in the output?
How does mini-batching work for each modality, given that the modalities are of different sizes? Are the batch sizes of the two modalities the same, and if so, how is it handled when the modality sizes are very different (e.g. 1k cells vs 10k cells)?
Graph mini-batching is conducted by randomly selecting a batch of "positive" edges with probability proportional to edge weight, which is then combined with a random set of "negative" edges for contrastive learning (for each source node in the "positive" edges, we sample a number of "negative" target nodes with probability proportional to node degree). This sampling procedure is designed to favor (1) edges with higher credibility, and (2) the global graph structure as manifested in high-degree nodes.
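To make the sampling scheme concrete, here is a minimal NumPy sketch of the idea described above. It is not the actual scglue implementation; the function name `sample_graph_minibatch` and its arguments are hypothetical, and the real code works on the guidance graph's internal representation rather than plain arrays.

```python
import numpy as np

def sample_graph_minibatch(edges, edge_weights, node_degrees,
                           batch_size, neg_per_pos, rng=None):
    """Sketch: weighted positive-edge sampling plus degree-proportional
    negative sampling for a contrastive graph mini-batch.

    edges: (E, 2) int array of (source, target) node indices
    edge_weights: (E,) positive edge weights
    node_degrees: (N,) node degrees over the full graph
    """
    rng = np.random.default_rng() if rng is None else rng

    # Positive edges: sampled with probability proportional to edge weight,
    # so higher-credibility edges are seen more often.
    p_edge = edge_weights / edge_weights.sum()
    pos_idx = rng.choice(len(edges), size=batch_size, replace=True, p=p_edge)
    pos_edges = edges[pos_idx]

    # Negative targets: for each positive source node, sample target nodes
    # with probability proportional to node degree, reflecting the global
    # graph structure carried by high-degree nodes.
    p_node = node_degrees / node_degrees.sum()
    neg_targets = rng.choice(len(node_degrees),
                             size=(batch_size, neg_per_pos),
                             replace=True, p=p_node)
    neg_sources = np.repeat(pos_edges[:, :1], neg_per_pos, axis=1)
    neg_edges = np.stack([neg_sources, neg_targets], axis=-1).reshape(-1, 2)

    return pos_edges, neg_edges
```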
The batch size is the same for every modality in each mini-batch, so the modalities receive equal attention. With 1k vs 10k cells, an epoch is one full pass over the 10k dataset, during which the 1k dataset is cycled roughly 10 times.
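Below is a minimal PyTorch sketch of this pairing scheme, assuming toy random tensors in place of real RNA/ATAC data; it is not the actual scglue data-loading code. The larger modality defines the epoch, and the smaller one is cycled so each step sees the same number of cells from both.

```python
import itertools
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data standing in for two modalities of different size
rna = TensorDataset(torch.randn(10_000, 50))   # larger modality (10k cells)
atac = TensorDataset(torch.randn(1_000, 50))   # smaller modality (1k cells)

batch_size = 128
rna_loader = DataLoader(rna, batch_size=batch_size, shuffle=True, drop_last=True)
atac_loader = DataLoader(atac, batch_size=batch_size, shuffle=True, drop_last=True)

# One epoch = one full pass over the larger modality; the smaller one is
# cycled so each mini-batch holds the same number of cells per modality.
# (A real implementation would reshuffle the smaller loader on wrap-around;
# itertools.cycle simply repeats the first pass.)
for (x_rna,), (x_atac,) in zip(rna_loader, itertools.cycle(atac_loader)):
    assert x_rna.shape[0] == x_atac.shape[0] == batch_size
    # ... forward pass and loss computation would go here ...
```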