cistrome / MIRA

Python package for analysis of multiomic single cell RNA-seq and ATAC-seq.

speed #27

Open bjstewart1 opened 1 year ago

bjstewart1 commented 1 year ago

I'm trying to train topic models on gene expression and ATAC data. Even with a GPU, I'm finding this very slow, particularly for the ATAC data. For this I've slimmed the data down to ~10K peaks from 50K cells, but ideally I would like to use closer to 100K peaks. The tutorial suggests caching the data to disk, but model.write_ondisk_dataset(train, dirname = './....') is taking several hours, and model.get_learning_rate_bounds is taking ~5 hours. That's before we even get to .fit().

The output of import torch; torch.cuda.is_available() is True.
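
For reference, a slightly fuller check than is_available() could look like the sketch below. It uses only standard PyTorch calls and the helper name describe_torch_device is made up for illustration. It only shows that PyTorch can place a tensor on the GPU; it says nothing about whether MIRA itself runs the model or the data loading there.

```python
import importlib.util


def describe_torch_device():
    """Return a small dict describing what PyTorch can see on this machine."""
    info = {
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
        "device_name": None,
        "tensor_device": None,
    }
    if not info["torch_installed"]:
        return info

    import torch

    info["cuda_available"] = bool(torch.cuda.is_available())
    if info["cuda_available"]:
        # Name of the first visible CUDA device.
        info["device_name"] = torch.cuda.get_device_name(0)
        # Allocate a tiny tensor on the GPU to confirm placement actually works.
        x = torch.ones(8, device="cuda")
        info["tensor_device"] = str(x.device)
    return info


if __name__ == "__main__":
    print(describe_torch_device())
```

If cuda_available is True and tensor_device reports a CUDA device, the GPU itself is usable from PyTorch, so the slowness is more likely in how the data is prepared or fed to the model.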

Is this expected behaviour? Do you have suggestions for speedup?