Hey @erzakiev, I added this as a parameter in the development branch. It can be set from the command line in the prepare step:
cnmf prepare --output-dir example_PBMC/cNMF --name pbmc_cNMF -c example_PBMC/counts.h5ad -k 5 6 7 8 9 10 --n-iter 20 --total-workers 1 --seed 14 --numgenes 2000 --beta-loss frobenius --max-nmf-iter 1000
or from the Python environment:
cnmf_obj.prepare(counts_fn=countfn, components=np.arange(5,11), n_iter=20, seed=14,
num_highvar_genes=2000, max_NMF_iter=1000)
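For completeness, here is a minimal sketch of the surrounding Python setup, mirroring the command-line example above (the paths and run name are just the example values, and max_NMF_iter assumes the development-branch version mentioned above):

import numpy as np
from cnmf import cNMF

# Same output directory, run name, and counts file as in the CLI example above
countfn = 'example_PBMC/counts.h5ad'
cnmf_obj = cNMF(output_dir='example_PBMC/cNMF', name='pbmc_cNMF')

# prepare() normalizes the counts, selects highly variable genes, and lays out
# the per-replicate factorization tasks for each K in `components`
cnmf_obj.prepare(counts_fn=countfn, components=np.arange(5, 11), n_iter=20, seed=14,
                 num_highvar_genes=2000, max_NMF_iter=1000)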
This will hopefully get pushed to the master branch and PyPI in the next week.
However, I would warn that if it isn't converging in 1000 iterations, usually something bad is happening (maybe K is way too high or the data is normalized strangely).
Awesome, thanks for the info and for the added feature!!
Dylan, I wonder if this is related, but I noticed that when the algorithm approaches the last, say, 10% of allocated tasks, each task takes much longer to finish than the first several hundred tasks at the start of the factorization. Is this by design? Do the latest tasks handle decomposition of the nastiest parts of the matrix or something?
# the first several hundred tasks are always quick
[Worker 3]. Starting task 699.
[Worker 4]. Starting task 628.
[Worker 10]. Starting task 694.
[Worker 5]. Starting task 677.
[Worker 9]. Starting task 717.
[Worker 7]. Starting task 655.
[Worker 6]. Starting task 750.
[Worker 1]. Starting task 733.
[Worker 0]. Starting task 816.
...
# at the very end, tasks are much more sluggish
Yes, I think that is because the later tasks usually correspond to larger values of K, which take longer...
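As a rough illustration of why the later task indices tend to be the larger values of K (this enumeration is an assumption about how the replicates are laid out, not the package's exact internals):

import itertools

# Hypothetical task list: one task per (K, replicate) pair, enumerated K-major
components = [5, 6, 7, 8, 9, 10]
n_iter = 20
tasks = list(itertools.product(components, range(n_iter)))

print(tasks[:3])   # [(5, 0), (5, 1), (5, 2)] -- the earliest tasks use the smallest K
print(tasks[-3:])  # [(10, 17), (10, 18), (10, 19)] -- the last tasks all use the largest K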
I am going to add a feature to resubmit just the jobs that failed. Hopefully that will help you get these last iterations finished.
Overall, a strategy I'm finding useful is to do K selection with a lower number of iterations (e.g. 10) and then, once you've picked K, to run a larger number of iterations for just the selected value of K.
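In the Python API, that two-pass approach might look roughly like this (the K values, iteration counts, and density threshold are illustrative, and in practice you would typically give the second pass its own run name or output directory):

# Pass 1: coarse K selection with few iterations per K
cnmf_obj.prepare(counts_fn=countfn, components=np.arange(5, 11), n_iter=10, seed=14,
                 num_highvar_genes=2000)
cnmf_obj.factorize(worker_i=0, total_workers=1)
cnmf_obj.combine()
cnmf_obj.k_selection_plot()  # inspect the stability/error trade-off and pick K

# Pass 2: many more iterations for just the chosen K (say K = 8)
cnmf_obj.prepare(counts_fn=countfn, components=[8], n_iter=100, seed=14,
                 num_highvar_genes=2000)
cnmf_obj.factorize(worker_i=0, total_workers=1)
cnmf_obj.combine()
cnmf_obj.consensus(k=8, density_threshold=0.01)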
I hope this helps!
Hello Dylan, I was wondering how you would adjust the convergence limit for the underlying NMF implementation from the sklearn package, so that its convergence warning goes away. Is it possible to do this from the command line, or can it only be done when using Python interactively?
The cnmf -h output lists lots of options, but none of them seem to be related.
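For context on where that warning comes from: it is sklearn's ConvergenceWarning, raised when the NMF solver hits its iteration cap (max_iter) before reaching its tolerance. A standalone sklearn sketch of the same behaviour, with made-up data, would be:

import numpy as np
from sklearn.decomposition import NMF

X = np.abs(np.random.rand(100, 50))  # toy non-negative matrix

# With a very small max_iter the solver stops early and emits a ConvergenceWarning;
# raising max_iter (presumably what the new max_NMF_iter option controls inside cNMF) silences it
model = NMF(n_components=7, beta_loss='frobenius', max_iter=5, init='random', random_state=0)
usage = model.fit_transform(X)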