dylkot / cNMF

Code and example data for running Consensus Non-negative Matrix Factorization on single-cell RNA-Seq data
MIT License
244 stars 57 forks source link

ValueError: shapes (7,78103) and (56262,999) not aligned: 78103 (dim 1) != 56262 (dim 0) #31

Open johnjeang opened 2 years ago

johnjeang commented 2 years ago

Hello,

I was following one of the tutorials for running cNMF on my own data set. My dataset has 78103 observations and on NAs or inf counts. To see if I could get the pipeline going I set the parameters to pretty small values to start.

I successfully managed to prepare the cnmf object with

cnmf_obj.prepare(counts_fn = 'Data/scvi_counts_adata.h5ad', components = np.arange(7,9), n_iter = 5, seed = 14, num_highvar_genes = 1000 and factorize with cnmf_obj.factorize(worker_i = 0, total_workers = 1)

However when running cnmf_obj.consensus(k=7, density_threshold=2.00, show_clustering=True, close_clustergram_fig=False)

I got the following error

ValueError                                Traceback (most recent call last)
/var/folders/06/kt3wlz1x285bzg98r_byq_480000gn/T/ipykernel_4464/2526977280.py in <module>
----> 1 cnmf_obj.consensus(k=selected_K, density_threshold=density_threshold, show_clustering=True, close_clustergram_fig=False)

~/opt/anaconda3/envs/cnmf_env/lib/python3.7/site-packages/cnmf/cnmf.py in consensus(self, k, density_threshold, local_neighborhood_size, show_clustering, skip_density_and_return_after_stats, close_clustergram_fig)
    705             norm_tpm = norm_tpm.astype(np.float64)
    706 
--> 707         usage_coef = fast_ols_all_cols(rf_usages.values, norm_tpm)
    708         usage_coef = pd.DataFrame(usage_coef, index=rf_usages.columns, columns=tpm.var.index)
    709 

~/opt/anaconda3/envs/cnmf_env/lib/python3.7/site-packages/cnmf/cnmf.py in fast_ols_all_cols(X, Y)
     51 def fast_ols_all_cols(X, Y):
     52     pinv = np.linalg.pinv(X)
---> 53     beta = np.dot(pinv, Y)
     54     return(beta)
     55 

<__array_function__ internals> in dot(*args, **kwargs)

ValueError: shapes (7,78103) and (56262,999) not aligned: 78103 (dim 1) != 56262 (dim 0)

I'm not sure where this 56262 number is coming from... Any ideas here? I tried to trace back to this norm_tpm object but I got lost and wanted to see if anyone had any thoughts here.

Thanks!

johnjeang commented 2 years ago

Ok, I did manage to find the error here. In the

compute_tpm() function some of the data was getting filtered out because the count was below 1 in the sc.pp.normalize_per_cell() function. This is normally good, but I wanted to work with some data processed from the SCVI batch correction with fractional counts that I think are relevant. It would be good if there was a feature in the consensus call that let me set the min count when the function is called.

I managed to fix this by going into the cnmf.py file and changing min_counts = 0 in the sc.pp.normalize_per_cell()function of the computer_tpm() function.

tianjiaozheng1997 commented 2 years ago

Hi, I have the same problem as you. My data is count data corrected with Soupx(used to correct for background in single-cell data), it contains decimals, so I run cnmf prepare --output-dir ./example_data --name example_cNMF -c ./example_data/counts_prefiltered.txt -k 5 6 7 8 9 10 11 12 13 --n-iter 100 --seed 14 --numgenes 2000 in this step1. It will report an error, so can you show which function you modified and help me modify it?

I got the following error:

/tools/miniconda/miniconda3/envs/cnmf_env/lib/python3.7/site-packages/cnmf/cnmf.py:306: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass AnnData(X, dtype=X.dtype, ...) to get the future behavour. var=pd.DataFrame(index=input_counts.columns)) /tools/miniconda/miniconda3/envs/cnmf_env/lib/python3.7/site-packages/scanpy/preprocessing/_simple.py:843: UserWarning: Revieved a view of an AnnData. Making a copy. view_to_actual(adata) Traceback (most recent call last): File "/tools/miniconda/miniconda3/envs/cnmf_env/bin/cnmf", line 8, in sys.exit(main()) File "/tools/miniconda/miniconda3/envs/cnmf_env/lib/python3.7/site-packages/cnmf/cnmf.py", line 917, in main num_highvar_genes=args.numgenes, genes_file=args.genes_file) File "/tools/miniconda/miniconda3/envs/cnmf_env/lib/python3.7/site-packages/cnmf/cnmf.py", line 353, in prepare high_variance_genes_filter=highvargenes) File "/tools/miniconda/miniconda3/envs/cnmf_env/lib/python3.7/site-packages/cnmf/cnmf.py", line 443, in get_norm_counts examples = norm_counts.obs.index[zerocells] File "/tools/miniconda/miniconda3/envs/cnmf_env/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4616, in getitem result = getitem(key) IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

@johnjeang

dylkot commented 2 years ago

Hi all, sorry for the bug.

@johnjeang - ahh that is very strange, I didn't realize scanpy.pp.normalize_per_cell had that behavior. I can consider changing the default to be min_counts=0. An alternative approach would be to pass your own normalized file via the tpm_fn argument to avoid that issue.

Out of curiosity, are you using SCVI as a batch correction tool prior to running cNMF and does SCVI not create negative values?

@tianjiaozheng1997 are you sure you're having the same result / does @johnjeang's change fix it?

tianjiaozheng1997 commented 2 years ago

Hi all, sorry for the bug.

@johnjeang - ahh that is very strange, I didn't realize scanpy.pp.normalize_per_cell had that behavior. I can consider changing the default to be min_counts=0. An alternative approach would be to pass your own normalized file via the tpm_fn argument to avoid that issue.

Out of curiosity, are you using SCVI as a batch correction tool prior to running cNMF and does SCVI not create negative values?

@tianjiaozheng1997 are you sure you're having the same result / does @johnjeang's change fix it?

No, it didn't work, my error result is still the same. I still can't find a solution to this error.

dylkot commented 2 years ago

@tianjiaozheng1997 - can you share the command that is causing the error? It looks like there is a formatting error with the input data such that after filtering to only high variance genes, there may be some cells with zero genes detected. Did you remove cells with very few counts?

tianjiaozheng1997 commented 2 years ago

@tianjiaozheng1997 - can you share the command that is causing the error? It looks like there is a formatting error with the input data such that after filtering to only high variance genes, there may be some cells with zero genes detected. Did you remove cells with very few counts?

hi,Thank you very much for your reply to my question.

  1. Below is the format of my input data, I did a step before input, i.e. remove those genes that are not expressed in all cells. `>dim(data) [1] 848 19968 #848 cells, 19968 genes

    head(data)[1:5,1:5] AL627309.3 TNFRSF18 AL645608.1 SAMD11 RER1 p11_AAACCTGGTGTGGCTC-1 0 0 0 0 0.000000 p11_AAACCTGTCGGTTCGG-1 0 0 0 0 7.854717 p11_AAACGGGAGTGGAGTC-1 0 0 0 0 4.750920 p11_AAACGGGGTGGCGAAT-1 0 0 0 0 1.613956 p11_AAAGATGCAAGTAATG-1 0 0 0 0 0.826196 `

  2. cnmf prepare --output-dir ~/test/NMF/robj/02.test_result --name p11_cNMF -c ~/test/NMF/robj/01.count_data/p11.test.count.txt -k 3 4 5 6 7 8 9 10 --n-iter 100 --seed 14 --numgenes 2000
  3. After running this step, I get this error ~/tools/miniconda/miniconda3/envs/cnmf_env/lib/python3.7/site-packages/cnmf/cnmf.py:306: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. PassAnnData(X, dtype=X.dtype, ...)to get the future behavour. var=pd.DataFrame(index=input_counts.columns)) ~/tools/miniconda/miniconda3/envs/cnmf_env/lib/python3.7/site-packages/scanpy/preprocessing/_simple.py:843: UserWarning: Revieved a view of an AnnData. Making a copy. view_to_actual(adata) Traceback (most recent call last): File "~tools/miniconda/miniconda3/envs/cnmf_env/bin/cnmf", line 8, in <module> sys.exit(main()) File "~/tools/miniconda/miniconda3/envs/cnmf_env/lib/python3.7/site-packages/cnmf/cnmf.py", line 917, in main num_highvar_genes=args.numgenes, genes_file=args.genes_file) File "~/tools/miniconda/miniconda3/envs/cnmf_env/lib/python3.7/site-packages/cnmf/cnmf.py", line 353, in prepare high_variance_genes_filter=highvargenes) File "~/tools/miniconda/miniconda3/envs/cnmf_env/lib/python3.7/site-packages/cnmf/cnmf.py", line 443, in get_norm_counts examples = norm_counts.obs.index[zerocells] File "~/tools/miniconda/miniconda3/envs/cnmf_env/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4616, in __getitem__ result = getitem(key) IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed I don't know how to solve this problem, so hope to get your help, thank you. @dylkot
dylkot commented 2 years ago

Hi @johnjeang - it seems you have some cells with zero counts of the highly variable genes. I believe that in the most current version of the code, the cell IDs for those cells should be printed rather than throwing an error. Could you try with the most recent version of cnmf from pip install cnmf and see if that is indeed the case? In any case you likely need to filter cells with too low total UMIs and re-run.

tianjiaozheng1997 commented 2 years ago

I updated to the latest version of cnmf, but it still doesn't work, I don't know if it's because my number of cells is too small.Only 108 cells and 18,653 genes. I tried reducing the number of hypervariable genes to 100, but it still didn't work.

Hi @johnjeang - it seems you have some cells with zero counts of the highly variable genes. I believe that in the most current version of the code, the cell IDs for those cells should be printed rather than throwing an error. Could you try with the most recent version of cnmf from pip install cnmf and see if that is indeed the case? In any case you likely need to filter cells with too low total UMIs and re-run.

dylkot commented 1 month ago

Hi, I am finally having some time to work on this again. If this problem is still active, please let me know. Thanks!