Closed Melody-cell closed 4 days ago
Hi @Melody-cell
Can you show the output of the following?
cell_data.head()
cistopic_obj.cell_names[0:20]
All the best,
Seppe
Hi, it is like this:
@Melody-cell
Can you also show the output of the following?
len(cistopic_obj.cell_names)
len(set(cistopic_obj.cell_names))
len(cell_data.index)
len(set(cell_data.index))
len(
    set([f"{bc}-{sample}___{sample}" for bc, sample in zip(cell_data.index, cell_data["sample"])])
    & set(cistopic_obj.cell_names)
)
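For anyone following along without the original objects, here is a minimal sketch of what the last check counts. The barcodes, the sample name, and the `cell_names` list are made up for illustration; `cell_names` only stands in for `cistopic_obj.cell_names`.

```python
import pandas as pd

# Hypothetical annotation table indexed by raw barcode (toy data).
cell_data = pd.DataFrame(
    {"sample": ["s1", "s1", "s1"]},
    index=["AAAC", "CCCG", "GGGT"],
)

# Hypothetical stand-in for cistopic_obj.cell_names.
cell_names = ["AAAC-s1___s1", "CCCG-s1___s1", "TTTA-s1___s1"]

# Rebuild the expected cell names from barcode and sample.
rebuilt = {
    f"{bc}-{sample}___{sample}"
    for bc, sample in zip(cell_data.index, cell_data["sample"])
}

# Count the names present in both sets.
n_shared = len(rebuilt & set(cell_names))
print(n_shared)
```

If this count is close to the number of cells, the rebuilt names match the object's cell names, so a mismatch in naming is probably not the cause of the missing annotations.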
@SeppeDeWinter It's like this:
Hi @Melody-cell
That looks all right.
The reason for your issue is that pycisTopic assumes the following layout for barcodes: [ACGT]*-[0-9]+-, which is not the case for you (for example, standard 10x barcodes fit this pattern: ACTGTAGCTAG-1).
You can either reformat your barcodes to fit this pattern or manually add the annotation like this (this is only valid if you don't have duplicate barcodes, as is the case for you):
import pandas as pd

# Build the full cell name for each barcode so it matches cistopic_obj.cell_data.index
cell_data["cell_names_formatted"] = [
    f"{bc}-{sample}___{sample}" for bc, sample in zip(cell_data.index, cell_data["sample"])
]
cistopic_obj.cell_data = pd.merge(
    left=cistopic_obj.cell_data,
    right=cell_data,
    left_index=True,                  # index of cistopic_obj.cell_data are the cell names
    right_on="cell_names_formatted",  # this should correspond to cistopic_obj.cell_data.index
    # only add annotations for cells in cistopic_obj.cell_data; cells that are in
    # cistopic_obj.cell_data but not in cell_data will get NaN as annotation
    how="left",
)
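The same merge can be tried on toy data first. This is only a sketch: `obj_cell_data` is a hypothetical stand-in for `cistopic_obj.cell_data`, and the barcodes, column names, and values are invented.

```python
import pandas as pd

# Hypothetical stand-in for cistopic_obj.cell_data, indexed by full cell name.
obj_cell_data = pd.DataFrame(
    {"n_fragments": [100, 200, 300]},
    index=["AAAC-s1___s1", "CCCG-s1___s1", "TTTA-s1___s1"],
)

# Hypothetical annotation table indexed by raw barcode; TTTA has no annotation.
cell_data = pd.DataFrame(
    {"sample": ["s1", "s1"], "cell_type": ["T", "B"]},
    index=["AAAC", "CCCG"],
)

# Build the key column that should match the index of obj_cell_data.
cell_data["cell_names_formatted"] = [
    f"{bc}-{sample}___{sample}"
    for bc, sample in zip(cell_data.index, cell_data["sample"])
]

# Left merge: every row of obj_cell_data is kept, unmatched rows get NaN.
merged = pd.merge(
    left=obj_cell_data,
    right=cell_data,
    left_index=True,
    right_on="cell_names_formatted",
    how="left",
)

print(merged[["n_fragments", "cell_type"]])
print(merged.index)  # worth inspecting before assigning the result back
```

Printing `merged.index` is a quick way to confirm that the row labels of the result are what you expect before overwriting `cistopic_obj.cell_data`.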
I hope this helps?
All the best,
Seppe
Hi @SeppeDeWinter, I followed your steps and it looks better. Is this normal?
Hi @Melody-cell
This looks OK. The _x and _y suffixes appear because you have overlapping column names in both dataframes.
See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge.html
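As a small illustration of the suffix behaviour (toy dataframes with made-up column names), pandas appends `_x` and `_y` by default to non-key columns that exist in both inputs, and the `suffixes` parameter of `pd.merge` lets you pick other labels:

```python
import pandas as pd

# Both frames carry a "sample" column in addition to the merge key.
left = pd.DataFrame({"key": ["a", "b"], "sample": ["s1", "s1"]})
right = pd.DataFrame(
    {"key": ["a", "b"], "sample": ["s1", "s1"], "cell_type": ["T", "B"]}
)

# Default behaviour: overlapping columns become sample_x and sample_y.
merged = pd.merge(left, right, on="key", how="left")
print(list(merged.columns))

# Custom suffixes: keep the left name unchanged and tag the right one.
renamed = pd.merge(left, right, on="key", how="left", suffixes=("", "_annot"))
print(list(renamed.columns))
```

Dropping the duplicated columns from one dataframe before merging avoids the suffixes altogether.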
All the best,
Seppe
@SeppeDeWinter OK, thank you for your patient reply.
Hello, when I run this: cistopic_obj.add_cell_data(cell_data, split_pattern='__')
The cell_data looks like this:
There are so many NaNs. Does anyone know how to solve this?