MSingerLab / COMETSC

COMET Single-Cell Marker Detection tool
BSD 3-Clause "New" or "Revised" License

cell # limit? #16

Open ktyssowski opened 2 years ago

ktyssowski commented 2 years ago

Hello, I'm trying to run COMET on a large dataset (~500K cells), but I'm running into errors that look like they may be caused by the dataset being too large. Is that the case, or is something else going on? Here's my stdout file:

Started on 2021-12-20T16:32:55.311333
Reading data...
Generating complement data...
########
# Processing cluster 1...
########
2 gene combinations
Running t test on singletons...
Calculating fold change
Running XL-mHG on singletons...
X = 75
L = 1000
Cluster size 500
XLMHG error
('List is too long. The maximum length supported is  65536.', 'occurred at index TNS1')
q-val error
local variable 'xlmhg' referenced before assignment
error in sliding values
local variable 'xlmhg' referenced before assignment
Creating discrete expression matrix...
discrete matrix construction failed
local variable 'cutoff_value' referenced before assignment

And here's my stderr file:

Traceback (most recent call last):
  File "/n/holylfs03/LABS/hoekstra_lab/Users/kelsey/comet/bin/Comet", line 8, in <module>
    sys.exit(main())
  File "/n/holylfs03/LABS/hoekstra_lab/Users/kelsey/comet/lib/python3.6/site-packages/Comet/__main__.py", line 866, in main
    process(cls,X,L,plot_pages,cls_ser,tsne,marker_exp,gene_file,csv_path,vis_path,pickle_path,cluster_number,K,Abbrev,cluster_overall,Trim,count_data,skipvis)
  File "/n/holylfs03/LABS/hoekstra_lab/Users/kelsey/comet/lib/python3.6/site-packages/Comet/__main__.py", line 361, in process
    discrete_exp_full = discrete_exp.copy()
UnboundLocalError: local variable 'discrete_exp' referenced before assignment

oshahid commented 2 years ago

Hi ktyssowski,

It does look like there are either too many cells or too many genes; the error is being thrown by the xlmhg package. I can't remember whether this particular function takes a list of cells or a list of genes, so I would have to look deeper into the source code. Since COMET runs on a cluster-by-cluster basis, I'm not entirely sure the issue is the number of cells. How many genes are you testing? And was this run on a filtered dataset?
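
If the limit turns out to apply to the number of cells, one possible workaround is to subsample the cells before running COMET so that the total stays at or below the 65536 quoted in the error message. The snippet below is only a minimal sketch of that idea, not part of COMET or xlmhg: `downsample_cells` and `cluster_of` are hypothetical names, and it assumes you can build a mapping from cell ID to cluster label from your cluster file.

```python
import random
from collections import defaultdict

# Limit quoted in the XLMHG error message above. Assumption: it applies to
# the number of cells in the ranked list, not to the number of genes.
XLMHG_MAX_LENGTH = 65536


def downsample_cells(cluster_of, max_cells=XLMHG_MAX_LENGTH, seed=0):
    """Return a sorted list of at most max_cells cell IDs.

    cluster_of is a hypothetical dict mapping cell ID -> cluster label.
    Each cluster is sampled at the same fraction, so the cluster
    proportions stay roughly the same.
    """
    cells = sorted(cluster_of)
    if len(cells) <= max_cells:
        # Already under the limit: keep every cell.
        return cells

    # Group the cell IDs by cluster label.
    by_cluster = defaultdict(list)
    for cell in cells:
        by_cluster[cluster_of[cell]].append(cell)

    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    fraction = max_cells / len(cells)

    kept = []
    for cluster in sorted(by_cluster):
        members = by_cluster[cluster]
        # Rounding down keeps the total at or below max_cells. Note that
        # a very small cluster can end up with zero cells this way.
        n_keep = int(len(members) * fraction)
        kept.extend(rng.sample(members, n_keep))
    return sorted(kept)
```

The returned IDs would then be used to subset the expression matrix, the cluster assignments, and the t-SNE coordinates before rerunning. Whether subsampling is acceptable depends on the analysis, and it only helps if the cell count really is what exceeds the limit.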