aertslab / pycisTopic

pycisTopic is a Python module to simultaneously identify cell states and cis-regulatory topics from single-cell epigenomics data.

[BUG] cistopic_obj has NaN values in the sample and leiden columns after adding cell_data #168

Closed: yojetsharma closed this issue 2 weeks ago

yojetsharma commented 1 month ago

Describe the bug: I used cell_data from my snRNA-seq data (processed independently with cellranger, not cellranger-arc) and added it to the cistopic_obj, which was generated from the ATAC-seq data (processed with cellranger-arc). The barcodes in cistopic_obj seem to be mapped correctly, but the sample and leiden columns contain only NaNs:

                                             barcode  sample  leiden  \
GCCTTACTCTAGCTAA-1-d149___d149    GCCTTACTCTAGCTAA-1     NaN     NaN   
GGACTAAAGCCGCTTT-1-d149___d149    GGACTAAAGCCGCTTT-1     NaN     NaN   
GCCCGTTGTAATGACT-1-d149___d149    GCCCGTTGTAATGACT-1     NaN     NaN   
GTGGATGCATAAACCT-1-d149___d149    GTGGATGCATAAACCT-1     NaN     NaN   
GGTCAGGAGGAACACA-1-d149___d149    GGTCAGGAGGAACACA-1     NaN     NaN   
...                                              ...     ...     ...   
TAGTGAGAGGGATTAG-1-ls003___ls003  TAGTGAGAGGGATTAG-1     NaN     NaN   
GGAGCGATCAATTGGC-1-ls003___ls003  GGAGCGATCAATTGGC-1     NaN     NaN   
GATAAAGGTGGGAACA-1-ls003___ls003  GATAAAGGTGGGAACA-1     NaN     NaN   
GGAAGTATCGTGCTTC-1-ls003___ls003  GGAAGTATCGTGCTTC-1     NaN     NaN   
TGAAACTGTGTTTGCT-1-ls003___ls003  TGAAACTGTGTTTGCT-1     NaN     NaN   
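Columns that are entirely NaN are consistent with the metadata index not matching the cell names at all. The sketch below does not use pycisTopic and is not its internal code; it only shows, with plain pandas and hypothetical metadata (the `rna_cell_data` frame and its values are made up for illustration), how aligning a DataFrame on labels it does not contain silently yields NaN, and how to count overlapping barcodes before adding metadata.

```python
import pandas as pd

# Cell names as they appear in the cisTopic object (copied from the table above).
cistopic_cells = [
    "GCCTTACTCTAGCTAA-1-d149___d149",
    "TAGTGAGAGGGATTAG-1-ls003___ls003",
]

# Hypothetical scRNA-seq metadata whose index uses a different barcode style.
rna_cell_data = pd.DataFrame(
    {"sample": ["d149", "ls003"], "leiden": ["3", "7"]},
    index=["GCCTTACTCTAGCTAA", "TAGTGAGAGGGATTAG"],
)

# Aligning on the cisTopic cell names: labels absent from the metadata
# index come back as all-NaN rows instead of raising an error.
aligned = rna_cell_data.reindex(cistopic_cells)
print(aligned)

# Quick diagnostic: how many cisTopic cells have a matching metadata row?
n_matched = pd.Index(cistopic_cells).isin(rna_cell_data.index).sum()
print(f"{n_matched} of {len(cistopic_cells)} cells matched")
```

A match count of zero (or close to it) points at a barcode-format mismatch rather than at the metadata values themselves.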


SeppeDeWinter commented 1 month ago

Hi @yojetsharma

Can you share what the cell_data of your scRNA-seq data looks like?

All the best,

Seppe

yojetsharma commented 1 month ago

I solved this by changing the barcode format to BARCODE-1:SAMPLE_ID and then using ":" as the split pattern.
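A minimal pandas sketch of that barcode change, assuming the original cell_data index used a BARCODE-1-SAMPLE_ID style like the cell names shown in the table above (the thread does not show the original index, so this is an assumption). `to_colon_style` is a hypothetical helper, not part of pycisTopic.

```python
import pandas as pd


def to_colon_style(barcode: str) -> str:
    """Turn 'BARCODE-1-SAMPLE' into 'BARCODE-1:SAMPLE'.

    Splits on the last '-', so this assumes the sample ID itself
    contains no '-' (an ID like 'ls-003' would be split incorrectly).
    """
    cell_barcode, sample_id = barcode.rsplit("-", 1)
    return f"{cell_barcode}:{sample_id}"


# Hypothetical cell_data index in the 'BARCODE-1-SAMPLE' style.
cell_data = pd.DataFrame(
    {"leiden": ["3", "7"]},
    index=["GCCTTACTCTAGCTAA-1-d149", "TAGTGAGAGGGATTAG-1-ls003"],
)
cell_data.index = cell_data.index.map(to_colon_style)
print(list(cell_data.index))
# ['GCCTTACTCTAGCTAA-1:d149', 'TAGTGAGAGGGATTAG-1:ls003']
```

Using a separator that never occurs inside the barcode itself keeps the "-1" suffix attached to the barcode when the string is later split.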

coderstark18 commented 1 week ago

Hey, do you mean you changed your cell_data just before creating the cisTopic object, or did you restart the analysis from peak calling?

yojetsharma commented 1 week ago

There is a split-pattern option, which I had set to '-'. Because of this, the code read only the part of the barcode before '-1', which caused a mismatch. Replacing '-' with ':' solved this for me.
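To see why '-' truncates the barcode, here is a hypothetical helper that keeps the text before the first occurrence of the split pattern. It only mimics plain string splitting; it is not pycisTopic's code, and how the library applies its split pattern internally is not shown here.

```python
def barcode_part(cell_name: str, split_pattern: str) -> str:
    """Return the text before the first occurrence of split_pattern."""
    return cell_name.split(split_pattern)[0]


# With '-' as the pattern, the '-1' suffix is cut off as well,
# so the result no longer matches 'GCCTTACTCTAGCTAA-1'.
print(barcode_part("GCCTTACTCTAGCTAA-1-d149", "-"))  # GCCTTACTCTAGCTAA

# With ':' as the pattern, the full 'BARCODE-1' part survives.
print(barcode_part("GCCTTACTCTAGCTAA-1:d149", ":"))  # GCCTTACTCTAGCTAA-1
```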

coderstark18 commented 1 week ago

Got it, thank you so much!