KevinMenden / scaden

Deep Learning based cell composition analysis with Scaden.

https://scaden.readthedocs.io

MIT License

Memory footprint too big for simulation of large datasets #68

Status: Closed (KevinMenden closed this issue 3 years ago)

KevinMenden commented 3 years ago

When many datasets are used for simulation (e.g. more than 80), Scaden's memory footprint becomes very large, because every dataset is loaded and held in memory at the same time.

This can be improved by first iterating through the datasets to quickly determine the genes they have in common, and then subsampling each dataset separately, so that only one dataset has to be in memory at a time.
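
A minimal sketch of that two-pass idea, not Scaden's actual implementation. It assumes each dataset is a tab-separated cells x genes count table whose first column is the cell index. The function names (`read_gene_names`, `common_genes`, `simulate_dataset`, `simulate_all`) are hypothetical, and the subsampling is simplified to summing randomly drawn cells, ignoring cell type fractions.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def read_gene_names(path):
    # Read only the header line. Assumed layout: tab-separated table,
    # first column is the cell index, remaining columns are genes.
    with open(path) as fh:
        return fh.readline().rstrip("\n").split("\t")[1:]


def common_genes(paths):
    # Pass 1: intersect gene names without loading any count matrix.
    shared = None
    for path in paths:
        names = set(read_gene_names(path))
        shared = names if shared is None else shared & names
    return sorted(shared) if shared else []


def simulate_dataset(path, genes, n_samples, cells_per_sample, rng):
    # Pass 2, per dataset: load one table, keep the shared genes, and
    # sum randomly drawn cells (with replacement) into pseudo-bulk samples.
    values = pd.read_csv(path, sep="\t", index_col=0)[genes].to_numpy()
    samples = [
        values[rng.integers(0, len(values), size=cells_per_sample)].sum(axis=0)
        for _ in range(n_samples)
    ]
    return pd.DataFrame(samples, columns=genes)


def simulate_all(paths, n_samples, cells_per_sample, seed=0):
    rng = np.random.default_rng(seed)
    genes = common_genes(paths)
    parts = []
    for path in paths:
        # Only one full count matrix is loaded at a time; the simulated
        # samples collected in `parts` are much smaller.
        parts.append(
            simulate_dataset(path, genes, n_samples, cells_per_sample, rng)
        )
    return pd.concat(parts, ignore_index=True)


# Toy usage with two tiny, made-up datasets written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    toy = {
        "a": pd.DataFrame({"G1": [1, 2], "G2": [3, 4], "G3": [5, 6]}),
        "b": pd.DataFrame({"G2": [1, 1], "G3": [2, 2], "G4": [3, 3]}),
    }
    paths = []
    for name, df in toy.items():
        path = os.path.join(tmp, f"{name}_counts.txt")
        df.to_csv(path, sep="\t")
        paths.append(path)

    shared = common_genes(paths)
    bulk = simulate_all(paths, n_samples=3, cells_per_sample=4)
```

With this structure, peak memory is driven by the largest single dataset plus the accumulated simulated samples, rather than by the sum of all datasets.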