federicomarini / quantiseqr

https://federicomarini.github.io/quantiseqr/
GNU General Public License v3.0

Reduce runtimes #24

Open komalsrathi opened 3 months ago

komalsrathi commented 3 months ago

Hi @federicomarini, is it possible to filter the input matrix and run it on only a subset of genes? Our starting matrix has 60325 genes; after removing genes with zero expression across all samples in the dataframe, 53614 genes remain. The function takes a long time to run because we have multiple matrices (split by histology type), with sample sizes ranging from 1 to ~650 samples. Any suggestions would be much appreciated. Thank you!

federicomarini commented 3 months ago

Hi there @komalsrathi, I think removing genes that are not expressed at all is not detrimental to any downstream step. Whether this reasoning can be extended to "ok, let's be a little more aggressive in filtering" is a question where @FFinotello can probably give better guidance.
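For reference, the zero-expression filter discussed here could look roughly like the following. This is a minimal base R sketch on a made-up toy matrix, assuming genes are in rows and samples in columns; the object names `expr`, `keep`, and `expr_filtered` are illustrative and do not come from quantiseqr.

```r
# Toy expression matrix (made-up values): genes in rows, samples in columns.
expr <- matrix(
  c(5, 0, 2,
    0, 0, 0,
    1, 3, 0,
    0, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

# Keep only genes with non-zero expression in at least one sample.
keep <- rowSums(expr != 0) > 0

# drop = FALSE keeps the result a matrix even when there is a single sample.
expr_filtered <- expr[keep, , drop = FALSE]

dim(expr_filtered)
```

The `drop = FALSE` argument matters for the single-sample matrices mentioned above, since base R would otherwise collapse a one-column result into a plain vector.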

Otherwise, I think you can safely parallelize the whole thing, at least across the individual matrices, provided the machine you are using has enough RAM.

Francesca, are there any tricks one can use in these situations? Federico
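As an illustration of parallelizing across the individual matrices, here is a minimal sketch using `mclapply` from the base `parallel` package. The function `deconvolve_one` is a hypothetical placeholder for whatever per-matrix call is actually used (it only computes column sums here so the example runs on its own), and the list `expr_by_histology` holds made-up toy data.

```r
library(parallel)

# Hypothetical stand-in for the real per-matrix analysis step.
# It just returns per-sample column sums so the sketch is runnable.
deconvolve_one <- function(expr) {
  colSums(expr)
}

# Toy list of matrices, one per histology type (made-up values).
expr_by_histology <- list(
  histology_a = matrix(1:6, nrow = 3,
                       dimnames = list(paste0("gene", 1:3), c("s1", "s2"))),
  histology_b = matrix(1:3, nrow = 3,
                       dimnames = list(paste0("gene", 1:3), "s3"))
)

# mclapply relies on forking, which is unavailable on Windows,
# so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# One task per matrix; the result is a list with the same names as the input.
results <- mclapply(expr_by_histology, deconvolve_one, mc.cores = n_cores)

names(results)
```

Several matrices are processed at the same time in this setup, so the number of cores should be chosen with the RAM caveat above in mind.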

komalsrathi commented 3 months ago

> Otherwise, I think you can safely parallelize the whole thing, at least across the individual matrices, provided the machine you are using has enough RAM.

Thank you, I will try this. Is this ticket related: https://github.com/federicomarini/quantiseqr/issues/11, wherein you could limit the analysis to fewer genes? Either way, if you would like to close this ticket, please go ahead.

federicomarini commented 3 months ago

I'd keep it open until we have a definitive answer, but from my understanding it makes full sense to exclude genes with very low expression levels.