dylkot / cNMF

Code and example data for running Consensus Non-negative Matrix Factorization on single-cell RNA-Seq data
MIT License

bulk RNAseq #39

Closed meettel closed 2 years ago

meettel commented 2 years ago

Hi! I was wondering whether this nice tool could also be applied to bulk RNA-seq data, with the number of samples ranging from 100 to 1000. Thank you very much!

Sayyam-Shah commented 2 years ago

Hello @dylkot ,

I would like to know the answer to this as well. Every time I input bulk data, I get the error below. Could you please help me troubleshoot it?

/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/cnmf/cnmf.py:306: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass AnnData(X, dtype=X.dtype, ...) to get the future behavour.
  var=pd.DataFrame(index=input_counts.columns))
/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/cnmf/cnmf.py:331: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass AnnData(X, dtype=X.dtype, ...) to get the future behavour.
  var=pd.DataFrame(index=tpm.columns))
/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/scanpy/preprocessing/_simple.py:843: UserWarning: Received a view of an AnnData. Making a copy.
  view_to_actual(adata)
/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/scanpy/preprocessing/_simple.py:843: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass AnnData(X, dtype=X.dtype, ...) to get the future behavour.
  view_to_actual(adata)
Traceback (most recent call last):
  File "/cluster/home/t114108uhn/.local/bin/cnmf", line 8, in <module>
    sys.exit(main())
  File "/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/cnmf/cnmf.py", line 917, in main
    num_highvar_genes=args.numgenes, genes_file=args.genes_file)
  File "/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/cnmf/cnmf.py", line 353, in prepare
    high_variance_genes_filter=highvargenes)
  File "/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/cnmf/cnmf.py", line 443, in get_norm_counts
    examples = norm_counts.obs.index[zerocells]
  File "/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4616, in __getitem__
    result = getitem(key)
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
Traceback (most recent call last):
  File "/cluster/home/t114108uhn/.local/bin/cnmf", line 8, in <module>
    sys.exit(main())
  File "/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/cnmf/cnmf.py", line 920, in main
    cnmf_obj.factorize(worker_i=args.worker_index, total_workers=args.total_workers)
  File "/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/cnmf/cnmf.py", line 567, in factorize
    run_params = load_df_from_npz(self.paths['nmf_replicate_parameters'])
  File "/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/cnmf/cnmf.py", line 34, in load_df_from_npz
    with np.load(filename, allow_pickle=True) as f:
  File "/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/numpy/lib/npyio.py", line 417, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: './cnmfint1/batch4_cnmf/cnmf_tmp/batch4_cnmf.nmf_params.df.npz'
Traceback (most recent call last):
  File "/cluster/home/t114108uhn/.local/bin/cnmf", line 8, in <module>
    sys.exit(main())
  File "/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/cnmf/cnmf.py", line 920, in main
    cnmf_obj.factorize(worker_i=args.worker_index, total_workers=args.total_workers)
  File "/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/cnmf/cnmf.py", line 567, in factorize
    run_params = load_df_from_npz(self.paths['nmf_replicate_parameters'])
  File "/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/cnmf/cnmf.py", line 34, in load_df_from_npz
    with np.load(filename, allow_pickle=True) as f:
  File "/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/numpy/lib/npyio.py", line 417, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: './cnmfint1/batch4_cnmf/cnmf_tmp/batch4_cnmf.nmf_params.df.npz'
Traceback (most recent call last):
  File "/cluster/home/t114108uhn/.local/bin/cnmf", line 8, in <module>
    sys.exit(main())
  File "/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/cnmf/cnmf.py", line 920, in main
    cnmf_obj.factorize(worker_i=args.worker_index, total_workers=args.total_workers)
  File "/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/cnmf/cnmf.py", line 567, in factorize
    run_params = load_df_from_npz(self.paths['nmf_replicate_parameters'])
  File "/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/cnmf/cnmf.py", line 34, in load_df_from_npz
    with np.load(filename, allow_pickle=True) as f:
  File "/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/numpy/lib/npyio.py", line 417, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: './cnmfint1/batch4_cnmf/cnmf_tmp/batch4_cnmf.nmf_params.df.npz'
Traceback (most recent call last):
  File "/cluster/home/t114108uhn/.local/bin/cnmf", line 8, in <module>
    sys.exit(main())
  File "/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/cnmf/cnmf.py", line 920, in main
    cnmf_obj.factorize(worker_i=args.worker_index, total_workers=args.total_workers)
  File "/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/cnmf/cnmf.py", line 567, in factorize
    run_params = load_df_from_npz(self.paths['nmf_replicate_parameters'])
  File "/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/cnmf/cnmf.py", line 34, in load_df_from_npz
    with np.load(filename, allow_pickle=True) as f:
  File "/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/numpy/lib/npyio.py", line 417, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: './cnmfint1/batch4_cnmf/cnmf_tmp/batch4_cnmf.nmf_params.df.npz'
Traceback (most recent call last):
  File "/cluster/home/t114108uhn/.local/bin/cnmf", line 8, in <module>
    sys.exit(main())
  File "/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/cnmf/cnmf.py", line 920, in main
    cnmf_obj.factorize(worker_i=args.worker_index, total_workers=args.total_workers)
  File "/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/cnmf/cnmf.py", line 567, in factorize
    run_params = load_df_from_npz(self.paths['nmf_replicate_parameters'])
  File "/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/cnmf/cnmf.py", line 34, in load_df_from_npz
    with np.load(filename, allow_pickle=True) as f:
  File "/cluster/home/t114108uhn/.local/lib/python3.7/site-packages/numpy/lib/npyio.py", line 417, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: './cnmfint1/batch4_cnmf/cnmf_tmp/batch4_cnmf.nmf_params.df.npz'

My code is below.

cnmf prepare --output-dir ./cnmfint1 --name batch12_cnmf -c /cluster/projects/Sayyam/lab_sortedFractions_Batch1and2_FLBMmPB_49f90_counts1.csv --tpm /cluster/projects/Sayyam/lab_sortedFractions_Batch1and2_FLBMmPB_49f90_vst1.csv -k 5 6 7 8 9 10 11 12 13 14 15 16 17 --n-iter 100 --seed 14 --numgenes 3000

cnmf factorize --output-dir ./cnmfint1 --name batch12_cnmf --worker-index 0 --total-workers 10
cnmf factorize --output-dir ./cnmfint1 --name batch12_cnmf --worker-index 1 --total-workers 10
cnmf factorize --output-dir ./cnmfint1 --name batch12_cnmf --worker-index 2 --total-workers 10
cnmf factorize --output-dir ./cnmfint1 --name batch12_cnmf --worker-index 3 --total-workers 10
cnmf factorize --output-dir ./cnmfint1 --name batch12_cnmf --worker-index 4 --total-workers 10
cnmf factorize --output-dir ./cnmfint1 --name batch12_cnmf --worker-index 5 --total-workers 10
cnmf factorize --output-dir ./cnmfint1 --name batch12_cnmf --worker-index 6 --total-workers 10
cnmf factorize --output-dir ./cnmfint1 --name batch12_cnmf --worker-index 7 --total-workers 10
cnmf factorize --output-dir ./cnmfint1 --name batch12_cnmf --worker-index 8 --total-workers 10
cnmf factorize --output-dir ./cnmfint1 --name batch12_cnmf --worker-index 9 --total-workers 10

Best Regards, Sayyam

dylkot commented 2 years ago

Hi all,

Sorry for the slow response. In principle, I think it could work with bulk RNA-seq, analogous to previous NMF-based methods in cancer genomics (e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8135089/). For @Sayyam-Shah, it looks like some of your tumors might have 0 counts across the high-variance genes. Maybe input your own list of high-variance genes and make sure that every sample has a reasonable number of counts for them?
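For anyone wanting to try that check before rerunning `cnmf prepare`, here is a minimal sketch, not taken from the cNMF code base. It assumes a samples x genes count matrix and a hand-picked candidate gene list. The function name `check_gene_list`, the `min_total` threshold, and the toy data are all hypothetical and only there for illustration.

```python
import numpy as np

def check_gene_list(counts, gene_names, candidate_genes, min_total=1):
    """Flag samples with too few counts across a candidate gene list.

    counts          : 2-D array-like, samples x genes, raw counts
    gene_names      : column labels of `counts`
    candidate_genes : hand-picked high-variance genes to test
    min_total       : hypothetical threshold; choose one that suits your data
    """
    counts = np.asarray(counts)
    column_of = {gene: i for i, gene in enumerate(gene_names)}
    # Keep only candidates that are actually present in the count matrix.
    kept = [gene for gene in candidate_genes if gene in column_of]
    # Total counts per sample, restricted to the kept genes (1-D array).
    totals = counts[:, [column_of[gene] for gene in kept]].sum(axis=1)
    # Row indices of samples that fall below the threshold.
    low = np.flatnonzero(totals < min_total)
    return kept, totals, low

# Toy data (made up): 3 samples x 4 genes.
gene_names = ["G1", "G2", "G3", "G4"]
counts = np.array([
    [5, 0, 2, 9],
    [0, 0, 0, 7],
    [3, 1, 0, 4],
])

kept, totals, low = check_gene_list(counts, gene_names, ["G1", "G2", "G3", "GX"])
print("genes kept:", kept)
print("per-sample totals:", totals.tolist())
print("low-count sample rows:", low.tolist())
```

If no samples are flagged, the kept genes could be written to a text file, one gene per line, and handed to `cnmf prepare` through the `genes_file` argument that appears in the traceback (I believe the command-line flag is `--genes-file`, but please check `cnmf prepare --help`). If some samples are flagged, either extend the gene list or drop those samples first.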