aertslab / pycisTopic

pycisTopic is a Python module to simultaneously identify cell states and cis-regulatory topics from single cell epigenomics data.

NaN values appear after running add_cell_data on a cisTopic object #187

Closed Melody-cell closed 4 days ago

Melody-cell commented 1 week ago

Hello, I get NaN values when I run this: cistopic_obj.add_cell_data(cell_data, split_pattern='__')

The cell_data looks like this: image

There are a lot of NaNs after the call. Does anyone know how to solve this?

SeppeDeWinter commented 1 week ago

Hi @Melody-cell

Can you show the output of the following?


cell_data.head()
cistopic_obj.cell_names[0:20]

All the best,

Seppe

Melody-cell commented 1 week ago

Hi, it is like this: image image

SeppeDeWinter commented 1 week ago

@Melody-cell

Can you also show the output of the following?


len(cistopic_obj.cell_names)
len(set(cistopic_obj.cell_names))

len(cell_data.index)
len(set(cell_data.index))

len(
   set([f"{bc}-{sample}___{sample}" for bc, sample in zip(cell_data.index, cell_data["sample"])])
   & set(cistopic_obj.cell_names))

Melody-cell commented 1 week ago

@SeppeDeWinter It's like this: image

SeppeDeWinter commented 1 week ago

Hi @Melody-cell

That looks all right. The reason for your issue is that pycisTopic assumes the following layout for barcodes: [ACGT]*-[0-9]+-, which your barcodes do not follow.

(For example, standard 10x barcodes fit this pattern: ACTGTAGCTAG-1.)

You can either reformat your barcodes to fit this pattern, or manually add the annotation like this (this is only valid if you have no duplicate barcodes, which is the case for you):
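To see quickly which barcodes deviate from that layout, something like the sketch below might help. The regular expression is only my approximation of the 10x-style form shown in the example (bases, a hyphen, a number); it is not taken from the pycisTopic source, and `split_by_format` is a hypothetical helper name.

```python
import re

# Approximation of the 10x-style barcode layout (e.g. ACTGTAGCTAG-1).
# Assumption: this is NOT the exact pattern pycisTopic uses internally.
BARCODE_RE = re.compile(r"[ACGT]+-[0-9]+")


def split_by_format(barcodes):
    """Return (matching, non_matching) barcodes, preserving input order."""
    matching = [bc for bc in barcodes if BARCODE_RE.fullmatch(bc)]
    non_matching = [bc for bc in barcodes if not BARCODE_RE.fullmatch(bc)]
    return matching, non_matching
```

Calling `split_by_format(cell_data.index)` and looking at the second list should show which barcodes would need reformatting.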


import pandas as pd

# Build cell names in the same layout as cistopic_obj.cell_names
cell_data["cell_names_formatted"] = [
   f"{bc}-{sample}___{sample}" for bc, sample in zip(cell_data.index, cell_data["sample"])
]

cistopic_obj.cell_data = pd.merge(
   left=cistopic_obj.cell_data,
   right=cell_data,
   left_index=True,  # the index of cistopic_obj.cell_data holds the cell names
   right_on="cell_names_formatted",  # this should correspond to cistopic_obj.cell_data.index
   how="left",  # only add annotations for cells in cistopic_obj.cell_data; cells that are in cistopic_obj.cell_data but not in cell_data will get NaN as annotation
)
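As far as I understand pandas, combining `left_index=True` with `right_on` can make the result take its index from the right dataframe, so it is worth checking `cistopic_obj.cell_data.index` after the merge. A variant that keeps the cell names as the index is sketched below with `DataFrame.join`. The two dataframes are toy stand-ins with made-up values, not real pycisTopic output, and `join` raises an error on overlapping column names unless you pass `lsuffix`/`rsuffix`.

```python
import pandas as pd

# Toy stand-in for cistopic_obj.cell_data (hypothetical values).
obj_cell_data = pd.DataFrame(
    {"n_fragments": [100, 200, 300]},
    index=["AAAC-s1___s1", "AAAG-s1___s1", "AAAT-s2___s2"],
)

# Toy stand-in for the user's annotation table, indexed by raw barcode.
cell_data = pd.DataFrame(
    {"sample": ["s1", "s2"], "cell_type": ["B", "T"]},
    index=["AAAC", "AAAT"],
)

# Re-index the annotation by the formatted cell names.
annot = cell_data.copy()
annot.index = [
    f"{bc}-{sample}___{sample}"
    for bc, sample in zip(cell_data.index, cell_data["sample"])
]

# Left join on the index: all cells of obj_cell_data are kept, in order,
# and cells without an annotation get NaN.
merged = obj_cell_data.join(annot, how="left")
```

Here `merged` keeps the three cell names as its index, and the cell without an entry in `cell_data` has NaN in the annotation columns.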

I hope this helps?

All the best,

Seppe

Melody-cell commented 6 days ago

Hi @SeppeDeWinter, I followed your steps and it looks better: image. Is this normal?

SeppeDeWinter commented 4 days ago

Hi @Melody-cell

This looks OK. The _x and _y suffixes appear because some column names occur in both dataframes; pandas appends these suffixes to tell the overlapping columns apart.

See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge.html
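A small illustration of where the suffixes come from, and one possible way to avoid them by merging only the columns that are not already present. The dataframes are toy examples with made-up names and values.

```python
import pandas as pd

# Both frames contain a "sample" column (toy data).
left = pd.DataFrame({"sample": ["s1", "s2"]}, index=["c1", "c2"])
right = pd.DataFrame(
    {"sample": ["s1", "s2"], "cell_type": ["B", "T"]},
    index=["c1", "c2"],
)

# Overlapping column names get the default suffixes "_x" (left) and "_y" (right).
with_suffixes = pd.merge(left, right, left_index=True, right_index=True, how="left")

# Keep only the columns of `right` that `left` does not already have.
new_cols = right.columns.difference(left.columns)
without_suffixes = pd.merge(
    left, right[new_cols], left_index=True, right_index=True, how="left"
)
```

The first result has the columns `sample_x`, `sample_y` and `cell_type`; the second has only `sample` and `cell_type`. Dropping the duplicates is only safe if the overlapping columns really hold the same information in both dataframes.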

All the best,

Seppe

Melody-cell commented 4 days ago

@SeppeDeWinter OK, thank you for your patient reply.