Open DiegoSafian opened 6 months ago
Hi, this is actually the topic of our recent preprint https://t.co/OexYxSnc3D
The code we use for doing this is here:
https://github.com/immunogenomics/starCAT
The step that packages cNMF output for starCAT is still a bit of a work in progress, but it is implemented as the build_reference() function on the development branch. You can optionally have it run automatically in the consensus step by setting build_ref=True.
Let me know if this makes sense or if you have questions!
Hi, thanks for your response. I am actually running cnmf from the command line (installed with pip install cnmf), but I cannot find a way to enable build_ref=True. Do I need to work in a Python environment to do it?
This is how I run it:
conda activate cnmf
echo "### Step 1: prepare"
cnmf prepare --output-dir ./data --name 15_26_cNMF_5000 -c data_matrix.txt -k 15 16 17 18 19 20 20 22 24 26 --n-iter 250 --seed 14 --numgenes 5000 --total-workers 10
echo "### Step 2: factorize"
cnmf factorize --output-dir ./data --name 15_26_cNMF_5000 --worker-index 0
echo "### Step 3: combine"
cnmf combine --output-dir ./data --name 15_26_cNMF_5000
echo "### Step 4: plot"
cnmf k_selection_plot --output-dir ./data --name 15_26_cNMF_5000
echo "### Step 5: consensus"
cnmf consensus --output-dir ./data --name 15_26_cNMF_5000 --components 17 --local-density-threshold 0.025 --show-clustering
cnmf consensus --output-dir ./data --name 15_26_cNMF_5000 --components 18 --local-density-threshold 0.025 --show-clustering
cnmf consensus --output-dir ./data --name 15_26_cNMF_5000 --components 19 --local-density-threshold 0.025 --show-clustering
cnmf consensus --output-dir ./data --name 15_26_cNMF_5000 --components 20 --local-density-threshold 0.025 --show-clustering
cnmf consensus --output-dir ./data --name 15_26_cNMF_5000 --components 22 --local-density-threshold 0.025 --show-clustering
cnmf consensus --output-dir ./data --name 15_26_cNMF_5000 --components 24 --local-density-threshold 0.025 --show-clustering
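The six consensus calls above differ only in the value of --components, so they could be collapsed into a loop. The following is a minimal sketch, assuming the same flags as in the commands above; run_consensus_for_ks is a hypothetical helper name for illustration, not part of cNMF.

```shell
# Hypothetical helper (not part of cNMF): run the consensus step once per K.
# Arguments: output directory, run name, local density threshold, then the K values.
# The body runs in a subshell so its variables do not leak into the caller.
run_consensus_for_ks() (
  out_dir=$1
  name=$2
  threshold=$3
  shift 3
  for k in "$@"; do
    # Same flags as the individual commands above; stop at the first failure.
    cnmf consensus --output-dir "$out_dir" --name "$name" \
      --components "$k" --local-density-threshold "$threshold" --show-clustering || exit 1
  done
)
```

With this defined, the block above would become a single call: `run_consensus_for_ks ./data 15_26_cNMF_5000 0.025 17 18 19 20 22 24`.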
Currently it is only on the development branch of the GitHub repository (hopefully it will be moved to the main branch in the next few weeks). You can install it with pip like so:
pip install git+https://github.com/dylkot/cNMF.git@development
If you don't mind, let me know how it goes since this is something we are actively working on supporting.
Hi again, I tried it and it works perfectly well and is extremely fast! The results are good; however, the usage % in dataset B decreased quite a bit. On the other hand, I am probably asking too much, because I am comparing single-nucleus data from two different species, which is more challenging due to differences in cell composition and gene expression capture. Still, it produces very coherent results, and I will definitely keep using it. I am attaching a figure so you can get an idea of the results: example.pdf
Many thanks!
Hi,
I wonder whether there is an appropriate way to estimate the usage of GEPs in another dataset, so that changes in usage can be compared across conditions. For example, I estimate GEP usage per cell class in dataset A and want to know the usage of these same GEPs in dataset B.
My best, Diego