CostaLab / scopen

scOpen: single-cell open chromatin analysis via NMF modelling
GNU General Public License v3.0

fitting error #20

Closed (yamajackr closed 1 year ago)

yamajackr commented 1 year ago

Hi

Thank you for the great software. I'm trying to create the imputed matrix. I ran scopen, but it failed with a fitting error when n_components was 10 or 30. I also tried rank estimation, but it reported a fitting error for every value of n_components. I would appreciate any advice.

Namespace(alpha=1.0, binary=False, binary_quantile=0.5, estimate_rank=False, init='nndsvd', input='../../output/Objects/peaks_22Sep21_10x_like', input_format='10X', max_iter=500, max_n_components=30, min_n_components=2, n_components=10, nc=8, no_impute=False, output_dir='./scOpen', output_format='10X', output_prefix='signac_counts_22Sep21', random_state=42, step_n_components=1, verbose=0)
09/22/2022 06:43:06, detected 8 cpus, 8 of them are used.
09/22/2022 06:43:06, loading data...
09/22/2022 06:43:53, number of peaks: 118423; number of cells 12487
09/22/2022 06:43:53, number of non-zeros before imputation: 37617757
09/22/2022 06:43:53, sparsity: 0.9745610766847623
09/22/2022 06:43:53, running tf-idf transformation...
09/22/2022 06:43:55, running NMF...
09/22/2022 06:48:31, ranks: 10, fitting error: 109.01095069301478
Traceback (most recent call last):
  File "/usr/local/anaconda3/bin/scopen", line 8, in <module>
    sys.exit(main())
  File "/usr/local/anaconda3/lib/python3.8/site-packages/scopen/Main.py", line 206, in main
    df = pd.DataFrame(data=w_hat, index=peaks)
  File "/usr/local/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 672, in __init__
    mgr = ndarray_to_mgr(
  File "/usr/local/anaconda3/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 324, in ndarray_to_mgr
    _check_values_indices_shape_match(values, index, columns)
  File "/usr/local/anaconda3/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 393, in _check_values_indices_shape_match
    raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")
ValueError: Shape of passed values is (118423, 10), indices imply (118424, 10)

lzj1769 commented 1 year ago

Hi,

The error message says that the number of peaks does not match when scopen writes the results: the fitted matrix has 118423 rows, but the peak index has 118424 entries.

May I ask what input format you used?

Best, Zhijian

yamajackr commented 1 year ago

Thank you for the quick reply, Zhijian @lzj1769. I used the 10X format (matrix.mtx, barcodes.txt, peaks.bed). I noticed that peaks.bed contained a header line with the column names (chromosome start end), so I removed it. I'm still working on it, but it seems to be working properly now. Also, I noticed that the fitting error message itself is normal, as written in your paper. I'm sorry to bother you.

Thank you, Ryosuke
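
For anyone hitting the same ValueError, the off-by-one between the matrix and peaks.bed can be checked before running scopen. The following is a minimal sketch, not part of scopen: the helper names (mtx_shape, looks_like_header, read_peaks) are hypothetical, it assumes that the rows of matrix.mtx correspond to peaks, and it assumes that a header is any first line whose second and third fields are not integers. The demo runs on tiny synthetic files, not on the data from this thread.

```python
import tempfile
from pathlib import Path


def mtx_shape(mtx_path):
    """Return (rows, cols) from the size line of a Matrix Market file."""
    with open(mtx_path) as fh:
        for line in fh:
            # Skip the banner and comment lines (they start with '%') and blanks.
            if line.startswith("%") or not line.strip():
                continue
            rows, cols = line.split()[:2]
            return int(rows), int(cols)
    raise ValueError(f"no size line found in {mtx_path}")


def looks_like_header(line):
    """Heuristic (an assumption): a BED data line has integer start and end."""
    fields = line.split()
    if len(fields) < 3:
        return True
    return not (fields[1].isdigit() and fields[2].isdigit())


def read_peaks(bed_path):
    """Read non-empty lines of a BED file, dropping a leading header if present."""
    with open(bed_path) as fh:
        lines = [ln.rstrip("\n") for ln in fh if ln.strip()]
    if lines and looks_like_header(lines[0]):
        lines = lines[1:]
    return lines


# Demo: a 2-peak matrix and a peaks.bed that carries a column-name header.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "matrix.mtx").write_text(
        "%%MatrixMarket matrix coordinate integer general\n2 3 2\n1 1 1\n2 3 1\n"
    )
    (tmp / "peaks.bed").write_text(
        "chromosome\tstart\tend\nchr1\t100\t200\nchr1\t300\t400\n"
    )
    n_rows, _ = mtx_shape(tmp / "matrix.mtx")
    raw_lines = sum(1 for ln in open(tmp / "peaks.bed") if ln.strip())
    peaks = read_peaks(tmp / "peaks.bed")

# Matrix rows, raw lines in peaks.bed, peaks after dropping the header.
print(n_rows, raw_lines, len(peaks))  # 2 3 2
```

If the raw line count is one larger than the number of matrix rows and the first line looks like a header, writing the cleaned peak list back to peaks.bed should resolve the shape mismatch, which is the same fix described above.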