caokai1073 / uniPort

a unified single-cell data integration framework by optimal transport
MIT License

weird trend for running time when increasing cell number #8

Open AprilYuge opened 1 year ago

AprilYuge commented 1 year ago

Hi Kai,

I observed a weird trend in running time when I applied uniPort to datasets with 1k, 3k, 5k, 10k, 15k, 20k and 50k cells (both RNA and ATAC). The running time decreased until the sample size reached 15k and then increased, and the longest time was observed with only 1k cells. Do you have an explanation for this observation?


Best,

Yuge

caokai1073 commented 1 year ago

Hi, thank you for pointing this out. This is because the number of epochs in our training process is determined by the batch size and the number of cells.


If you want to train for more epochs, please increase the `iteration` parameter (default 30000), which sets the total number of mini-batches. Let me know if you have other concerns. Thanks!
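The relationship described above can be sketched as a short calculation. This is a minimal sketch, not uniPort's actual code. It assumes a batch size of 256 (not stated in this thread), that each epoch passes over every cell once, and that the epoch count is rounded up so that at least `iteration` mini-batches are run. `epochs_for` is a hypothetical helper.

```python
import math


def epochs_for(n_cells: int, batch_size: int = 256, iteration: int = 30000) -> int:
    """Hypothetical helper: epochs needed to reach `iteration` mini-batches.

    Assumes each epoch covers every cell once in ceil(n_cells / batch_size)
    mini-batches, and that the epoch count is rounded up.
    """
    batches_per_epoch = math.ceil(n_cells / batch_size)
    return math.ceil(iteration / batches_per_epoch)


# Dataset sizes taken from the question above.
for n in [1_000, 3_000, 5_000, 10_000, 15_000, 20_000, 50_000]:
    print(f"{n:>6} cells -> {epochs_for(n):>5} epochs")
```

Under these assumptions the 1k-cell dataset is trained for 7500 epochs and the 50k-cell dataset for only 154, so with a fixed `iteration` the smaller dataset is revisited far more often.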

AprilYuge commented 1 year ago

Hi, thanks for your quick reply! I just want to confirm that I understand this correctly: if the batch size and the number of iterations are fixed, the running time should not vary much, because the total number of cells used for training (iterations × batch size) is constant?
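The reasoning in this question can be checked with a small calculation. This is a sketch under assumptions, not uniPort's implementation: it assumes a batch size of 256, that every epoch visits each cell once, and that the epoch count is `ceil(iteration / batches_per_epoch)`. `training_work` is a hypothetical helper.

```python
import math


def training_work(n_cells: int, batch_size: int = 256, iteration: int = 30000):
    """Hypothetical helper returning (epochs, mini-batches run, cells visited).

    Assumes every epoch visits each cell once in ceil(n_cells / batch_size)
    mini-batches and that the epoch count is ceil(iteration / batches_per_epoch).
    """
    batches_per_epoch = math.ceil(n_cells / batch_size)
    epochs = math.ceil(iteration / batches_per_epoch)
    return epochs, epochs * batches_per_epoch, epochs * n_cells


for n in [1_000, 3_000, 5_000, 10_000, 15_000, 20_000, 50_000]:
    epochs, batches, cells = training_work(n)
    print(f"{n:>6} cells: {epochs:>5} epochs, {batches} mini-batches, {cells:,} cells visited")
```

Under these assumptions the number of mini-batches stays between 30000 and 30184 for every dataset, which supports the point that the mini-batch work is roughly constant. The epoch count, by contrast, varies about 50-fold (7500 vs 154), so any fixed cost paid once per epoch would scale very differently across the datasets; whether uniPort has such a cost is not established in this thread.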

caokai1073 commented 1 year ago

I'm not entirely sure what happened here, but it might be related to the datasets.