When many datasets are used for simulation (e.g. more than 80), Scaden uses a lot of memory because every dataset is held in memory at the same time.
This could be improved by first iterating through the datasets to quickly collect the genes common to all of them, and then subsampling each dataset separately, so that only one dataset needs to be in memory at any time.
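
A rough sketch of the two-pass idea, to make the proposal concrete. This is not Scaden's actual code: the function names (`read_gene_names`, `common_genes`, `subsample_dataset`, `subsample_all`) are hypothetical, and the file layout is an assumption (tab-separated text, one cell per row, one gene per column, cell ID in the first column). It uses only the standard library to stay self-contained.

```python
import csv
import random
from pathlib import Path


def read_gene_names(path):
    """Read only the header row, so no expression values are loaded."""
    with open(path, newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t"))
    return header[1:]  # assumption: the first column holds the cell ID


def common_genes(paths):
    """Pass 1: intersect the gene names of all datasets."""
    shared = None
    for path in paths:
        genes = set(read_gene_names(path))
        shared = genes if shared is None else shared & genes
    return sorted(shared or [])


def subsample_dataset(path, genes, n_cells, rng):
    """Load a single dataset, keep the shared genes, and draw n_cells rows."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = [[float(row[gene]) for gene in genes] for row in reader]
    # Sample without replacement; small datasets return all their cells.
    return rng.sample(rows, min(n_cells, len(rows)))


def subsample_all(paths, n_cells, seed=0):
    """Pass 2: yield (file name, genes, sampled rows) one dataset at a time."""
    rng = random.Random(seed)
    genes = common_genes(paths)
    for path in paths:
        # Only the current dataset's full matrix is in memory here;
        # it can be freed as soon as the sample has been yielded.
        yield Path(path).name, genes, subsample_dataset(path, genes, n_cells, rng)
```

Because `subsample_all` is a generator, peak memory should scale with the largest single dataset plus whatever samples the caller keeps, rather than with the sum of all datasets. The first pass only touches one header line per file, so it should stay cheap even with many inputs.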