Hoohm / CITE-seq-Count

A tool that allows to get UMI counts from a single cell protein assay
https://hoohm.github.io/CITE-seq-Count/
MIT License

ValueError: columns cannot be a set #172

Open jamesboot opened 2 years ago

jamesboot commented 2 years ago

Hello,

I'm getting the following error during UMI correction, when running CITE-seq-Count 1.4.5, Python 3.8.

Correcting umis
Traceback (most recent call last):
  File "/data/home/hmy961/citeseq-counts-env/bin/CITE-seq-Count", line 8, in <module>
    sys.exit(main())
  File "/data/home/hmy961/citeseq-counts-env/lib/python3.8/site-packages/cite_seq_count/__main__.py", line 603, in main
    io.write_dense(
  File "/data/home/hmy961/citeseq-counts-env/lib/python3.8/site-packages/cite_seq_count/io.py", line 48, in write_dense
    pandas_dense = pd.DataFrame(sparse_matrix.todense(), columns=columns, index=index)
  File "/data/home/hmy961/citeseq-counts-env/lib/python3.8/site-packages/pandas/core/frame.py", line 639, in __init__
    raise ValueError("columns cannot be a set")
ValueError: columns cannot be a set

Command and options for running:

CITE-seq-Count -R1 $READ1 -R2 $READ2 -t $TAGS -cbf 1 -cbl 16 -umif 17 -umil 26 -trim 10 -T 4 --max-error 3 -cells $CELLS --whitelist $WHITELIST -o $OUTDIR

Also, may or may not be related, I am getting the following warning at the start of the processing. I've always used the options above so not sure why this warning is appearing now.

[WARNING] Read1 length is 28bp but you are using 26bp for Cell and UMI barcodes combined.
This might lead to wrong cell attribution and skewed umi counts.

Any help would be much appreciated!
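
For reference, the arithmetic behind the warning can be sketched as follows. This is only an illustration and not CITE-seq-Count's own code: the function names are hypothetical, and it assumes the -cbf/-cbl and -umif/-umil positions are 1-based and inclusive, which matches the 26bp total reported in the warning.

```python
def barcode_span(first, last):
    """Length of a 1-based, inclusive range such as -cbf 1 -cbl 16."""
    return last - first + 1


def read1_length_note(read1_length, cb_first, cb_last, umi_first, umi_last):
    """Compare the Read1 length with the bases claimed by barcode and UMI.

    Hypothetical helper for illustration; not part of CITE-seq-Count.
    """
    used = barcode_span(cb_first, cb_last) + barcode_span(umi_first, umi_last)
    if read1_length > used:
        unused = read1_length - used
        return f"Read1 is {read1_length}bp; barcode + UMI use {used}bp ({unused}bp unused)"
    if read1_length < used:
        return f"Read1 is {read1_length}bp, shorter than the {used}bp requested"
    return f"Read1 length matches barcode + UMI ({used}bp)"


# Values from the command above: -cbf 1 -cbl 16 -umif 17 -umil 26, Read1 = 28bp
print(read1_length_note(28, 1, 16, 17, 26))
```

With the options above, the cell barcode covers 16 bases and the UMI covers 10, so 2 of the 28 sequenced bases in Read1 are not used.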

Hoohm commented 2 years ago

Hello @jamesboot,

I think this might be happening because this version doesn't check for 0 counts. A few things I would test:

  1. Are you sure about the UMI stopping at 16bp? It's not super important, but it might give you a small increase in your UMI counts, depending on your library's diversity and size.
  2. Can you run the same command without the whitelist? I suspect there is no cell barcode overlap. This usually happens on SCv3 10x runs.

jamesboot commented 2 years ago

Hi Patrick,

Thanks very much for your quick response. I tried running without the whitelist (all other options the same as above) but still got the same error message:

Correcting umis
Traceback (most recent call last):
  File "/data/home/hmy961/citeseq-counts-env/bin/CITE-seq-Count", line 8, in <module>
    sys.exit(main())
  File "/data/home/hmy961/citeseq-counts-env/lib/python3.8/site-packages/cite_seq_count/__main__.py", line 603, in main
    io.write_dense(
  File "/data/home/hmy961/citeseq-counts-env/lib/python3.8/site-packages/cite_seq_count/io.py", line 48, in write_dense
    pandas_dense = pd.DataFrame(sparse_matrix.todense(), columns=columns, index=index)
  File "/data/home/hmy961/citeseq-counts-env/lib/python3.8/site-packages/pandas/core/frame.py", line 639, in __init__
    raise ValueError("columns cannot be a set")
ValueError: columns cannot be a set

I'm pretty sure the UMI stops at 16bp. We are using 10X 5' v2 chemistry if that helps.

Hoohm commented 2 years ago

I think this is a bug with the newest pandas version: https://github.com/facebook/Ax/issues/1153

Can you try to reinstall with pandas 1.4?

ymjzhang commented 2 years ago

> I think this is a bug with the newest pandas version: facebook/Ax#1153
>
> Can you try to reinstall with pandas 1.4?

This fixed the error for me. I think it's because pandas 1.5 no longer allows DataFrame columns to be passed as a set, and io.py has a line (in write_dense, per the traceback above) that creates a DataFrame with a set as its columns.
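
If downgrading pandas is not an option, one possible workaround is to convert the set to a list before it reaches the DataFrame constructor. The sketch below is not the project's actual fix: labels_as_list and dense_frame are hypothetical names, and dense_frame only mirrors the pd.DataFrame call shown in the traceback. It also assumes the matrix columns were filled by iterating over the same, unmodified set, so that list() returns the labels in the matching order.

```python
def labels_as_list(labels):
    """Convert a set of labels to a list, leaving other inputs untouched.

    list() keeps the set's current iteration order, so the labels still line
    up with a matrix whose columns were filled by iterating the same,
    unmodified set (an assumption about the calling code).
    """
    if isinstance(labels, (set, frozenset)):
        return list(labels)
    return labels


def dense_frame(matrix, columns, index):
    """Hypothetical stand-in for the DataFrame call shown in the traceback."""
    import pandas as pd  # imported here so labels_as_list works without pandas

    return pd.DataFrame(
        matrix,
        columns=labels_as_list(columns),
        index=labels_as_list(index),
    )
```

Sorting the labels instead would give a reproducible column order, but it could misalign the labels with the matrix, so this sketch deliberately avoids it.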

jamesboot commented 2 years ago

Sorry for taking some time to come back to this. Running with pandas 1.4 fixed the problem for me too! Thanks for your help!
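
For anyone landing here later, a quick way to check whether an environment is likely affected is to look at the installed pandas version. The snippet below is a rough sketch: the function names are hypothetical, and the 1.5 cutoff is taken from the discussion above rather than verified against every pandas release.

```python
import re


def pandas_rejects_set_columns(version):
    """Return True if the version string is 1.5 or newer.

    The 1.5 cutoff is an assumption based on the discussion in this thread.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (1, 5)


def installed_pandas_version():
    """Look up the installed pandas version without importing pandas itself."""
    from importlib.metadata import version  # available from Python 3.8

    return version("pandas")
```

Calling pandas_rejects_set_columns(installed_pandas_version()) inside the CITE-seq-Count environment should return False once pandas has been pinned to a 1.4 release.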