CostaLab / scopen

scOpen: single-cell open chromatin analysis via NMF modelling
GNU General Public License v3.0

IndexError: list index out of range #17

Closed Telogen closed 2 years ago

Telogen commented 2 years ago

Hi developer, I'm trying to use scOpen from the command line:

scopen --input_format sparse --input /local/txm/txmdata/test/aSM.txt --output_prefix scopen_out.txt --nc 20 --n_components 50 

My input file is a sparse matrix generated by the Matrix::writeMM() function in R:

Matrix::writeMM(pbmc.ATAC@assays$ATAC@counts,file='/local/txm/txmdata/test/aSM.txt')

Here are the first few lines of aSM.txt:

%%MatrixMarket matrix coordinate integer general                                                                    
108377 10412 81271603                                                                                               
13 1 1                                                                                                              
17 1 2                                                                                                              
26 1 2                                                                                                              
28 1 2                                                                                                              
29 1 2                                                                                                              
31 1 2                                                                                                              
35 1 4                                                                                                              
38 1 6       

However, I got the following error:

Namespace(alpha=1.0, binary=False, binary_quantile=0.5, estimate_rank=False, init='nndsvd', input='/local/txm/txmdata/test/aSM.txt', input_format='sparse', max_iter=500, max_n_components=30, min_n_components=2, n_components=50, nc=20, no_impute=False, output_dir='/mdshare/node8/txmdata/test', output_format='dense', output_prefix='scopen_out.txt', random_state=42, step_n_components=1, verbose=0)
03/10/2022 16:18:34, detected 40 cpus, 20 of them are used.                                                         
03/10/2022 16:18:34, loading data...                                                                                
Traceback (most recent call last):                                                                                  
  File "/local/txm/anaconda3/envs/singlecell/bin/scopen", line 8, in <module>                                       
    sys.exit(main())                                                                                                
  File "/local/txm/anaconda3/envs/singlecell/lib/python3.8/site-packages/scopen/Main.py", line 173, in main         
    data, barcodes, peaks = load_data(args=args)                                                                    
  File "/local/txm/anaconda3/envs/singlecell/lib/python3.8/site-packages/scopen/Utils.py", line 19, in load_data    
    data, barcodes, peaks = get_data_from_sparse_file(filename=args.input)                                          
  File "/local/txm/anaconda3/envs/singlecell/lib/python3.8/site-packages/scopen/Utils.py", line 40, in get_data_from_sparse_file
    barcodes.append(ll[1])                                                                                          
IndexError: list index out of range 

Looking forward to your help!

lzj1769 commented 2 years ago

Hi @Telogen,

Thanks for your feedback.

To use --input_format sparse, you need to remove the first two lines of the file, i.e. the MatrixMarket header line and the dimensions line:

%%MatrixMarket matrix coordinate integer general
108377 10412 81271603
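
For anyone hitting the same error, here is a minimal Python sketch of that header-stripping step. It skips every leading line that starts with % plus the dimensions line that follows, which for a file written by Matrix::writeMM() should be exactly the first two lines. The function name strip_matrix_market_header and the output file name in the usage note are made up for illustration; nothing here is part of scOpen.

```python
def strip_matrix_market_header(src_path, dst_path):
    """Copy src_path to dst_path without the MatrixMarket header.

    Skips all leading lines that start with '%' (banner and comments)
    and the first non-comment line (rows, columns, number of entries).
    Every remaining line is copied unchanged.
    """
    with open(src_path) as src, open(dst_path, "w") as dst:
        size_line_skipped = False
        for line in src:
            if not size_line_skipped:
                if line.startswith("%"):
                    continue  # banner or comment line
                size_line_skipped = True  # this is the dimensions line
                continue
            dst.write(line)
```

For example, strip_matrix_market_header("/local/txm/txmdata/test/aSM.txt", "aSM_noheader.txt") writes a copy without the header, and the new file can then be passed to --input. This thread does not say whether scOpen expects anything else of the remaining lines (for instance a particular delimiter), so treat this only as the step described above.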

I will implement a function to support this sparse matrix format later.

Meanwhile, to keep compatibility with other tools (e.g., Signac or episcanpy), you can save the matrix from R as a standard 10x folder:

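library(Matrix)  # provides writeMM()
# Create the output folder first; writeMM() and write.table() do not create it
dir.create("./filtered_peak_bc_matrix", showWarnings = FALSE)
writeMM(counts, file = "./filtered_peak_bc_matrix/matrix.mtx")
barcodes <- as.data.frame(colnames(counts))
# Split each peak name on "_" into three columns (chromosome, start, end)
peaks <- as.data.frame(stringr::str_split_fixed(rownames(counts), "_", 3))
write.table(barcodes, file = "./filtered_peak_bc_matrix/barcodes.tsv", 
            sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
write.table(peaks, file = "./filtered_peak_bc_matrix/peaks.bed", 
            sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)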
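
As a quick sanity check on the folder written by the R snippet above, the following Python sketch compares the dimensions line of matrix.mtx with the number of lines in peaks.bed and barcodes.tsv. It assumes the three file names used above and that rows are peaks and columns are barcodes, as in the R code. The function name check_10x_folder is hypothetical and is not part of scOpen or any other tool mentioned here.

```python
import os


def check_10x_folder(folder):
    """Return (n_peaks, n_barcodes) if the three files agree, else raise ValueError."""
    # The first line that does not start with '%' holds: rows, columns, entries
    with open(os.path.join(folder, "matrix.mtx")) as f:
        for line in f:
            if not line.startswith("%"):
                n_rows, n_cols = (int(x) for x in line.split()[:2])
                break
        else:
            raise ValueError("matrix.mtx has no dimensions line")

    # One peak per line in peaks.bed, one barcode per line in barcodes.tsv
    with open(os.path.join(folder, "peaks.bed")) as f:
        n_peaks = sum(1 for _ in f)
    with open(os.path.join(folder, "barcodes.tsv")) as f:
        n_barcodes = sum(1 for _ in f)

    if (n_rows, n_cols) != (n_peaks, n_barcodes):
        raise ValueError(
            f"matrix is {n_rows} x {n_cols}, but found "
            f"{n_peaks} peaks and {n_barcodes} barcodes"
        )
    return n_peaks, n_barcodes
```

Calling check_10x_folder("./filtered_peak_bc_matrix") should return the peak and barcode counts when the files are consistent, and raise a ValueError otherwise.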

best, Li