Issues with preprocessing: cell_type

ncborcherding commented 1 year ago

Dear Dr. Song,

First off - very exciting approach and really nice system.

I saw your preprint the other day and was interested in applying it to some of the example NanoString data that you link to. I am running into an issue during preprocessing. At first, I struggled to identify the file 'matched_annotation_all.csv' in the NanoString data download. Based on the column headers used in the processing code, I believe this is now the per-FOV "X_metadata_file.csv" file. However, I can't find the variable cell_type anywhere.

Example of my outputs:

Loading the meta data

anno = pd.read_csv(annor)
anno_f1 = anno[anno['fov'] == int(fov)]
anno_f1.columns

Index(['fov', 'Area', 'AspectRatio', 'CenterX_local_px', 'CenterY_local_px', 'CenterX_global_px', 'CenterY_global_px', 'Width', 'Height', 'Mean.MembraneStain', 'Max.MembraneStain', 'Mean.PanCK', 'Max.PanCK', 'Mean.CD45', 'Max.CD45', 'Mean.CD3', 'Max.CD3', 'Mean.DAPI', 'Max.DAPI'], dtype='object')

1.8 get center of each cell

for i, row in anno_f1.iterrows():
    cx, cy = float(anno_f1['CenterX_local_px'][i]), float(anno_f1['CenterY_local_px'][i])
    anno_f1['CenterY_local_px'][i] = float(anno_f1['Height'][i]) - float(anno_f1['CenterY_local_px'][i])
    if cx - w < 0 or cx + w > width or cy - h < 0 or cy + h > height:
        anno_f1['cell_type'][i] = np.nan

KeyError: 'cell_type'

Am I using the correct file for the anno object? Or how/where is the cell_type variable generated?

Thanks, Nick

sebastianbirk commented 1 year ago

Did you find something out about this in the meantime? I would also be interested in the cell type annotations.

ncborcherding commented 1 year ago

@sebastianbirk - No sorry I have not.

frinkleko commented 8 months ago

@ncborcherding @sebastianbirk Hi, I successfully reproduced the figures shown in the paper. Here are some data preprocessing details.

The "cell_type" column holds the cell labels, which are only provided in the "Processed Giotto Object". You need to manually extract the cell_type column from this object using R and match it with the CSV files you are currently working with.
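As a rough sketch of the matching step in Python: this assumes the cell labels have already been exported from the Giotto object in R into a table, and that both tables share key columns, here hypothetically named `fov` and `cell_ID` (check your own files, the real key names may differ). The function name `attach_cell_types` and the toy values are illustrative only, not part of SiGra.

```python
import pandas as pd

def attach_cell_types(anno, labels, keys=("fov", "cell_ID")):
    """Left-join a cell_type column onto the per-cell metadata table.

    `keys` are assumed join columns; adjust them to match your files.
    """
    keys = list(keys)
    # Keep only the join keys and the label column from the exported table.
    labels = labels[keys + ["cell_type"]]
    # validate="m:1" raises if any cell has more than one label row.
    return anno.merge(labels, on=keys, how="left", validate="m:1")

# Toy stand-ins for the real tables (values are made up for illustration).
anno = pd.DataFrame({"fov": [1, 1, 2], "cell_ID": [1, 2, 1], "Area": [100, 150, 90]})
labels = pd.DataFrame(
    {"fov": [1, 2], "cell_ID": [1, 1], "cell_type": ["macrophage", "tumor 5"]}
)

merged = attach_cell_types(anno, labels)
print(merged)
```

Cells without a matching label end up with a missing `cell_type`, so it is worth counting them before going further.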

The dict used to merge cell types into 8 major types actually has some problems: https://github.com/QSong-github/SiGra/blob/4786b2e4e33cb2b2436145e0ca23c9255ad2611e/SiGra_model/processing_nanostring.py#L104C1-L132C39 It should be:

# follow the type merge as sigra
# https://github.com/QSong-github/SiGra/blob/main/Tutorials/SiGra_preprocess.ipynb
dicts = {}
dicts['T CD8 memory'] = 'lymphocyte'
dicts['T CD8 naive'] = 'lymphocyte'
dicts['T CD4 naive'] = 'lymphocyte'
dicts['T CD4 memory'] = 'lymphocyte'
dicts['Treg'] = 'lymphocyte'
dicts['B-cell'] = 'lymphocyte'
dicts['plasmablast'] = 'lymphocyte'
dicts['NK'] = 'lymphocyte'
dicts['monocyte'] = 'Mcell' # Mcell is myeloid cell
dicts['macrophage'] = 'Mcell' 
dicts['mDC'] = 'Mcell'
dicts['pDC'] = 'Mcell'
# SiGra's original mapping does not handle the 'tumor X' types correctly
dicts['tumor 9'] = 'tumors'
dicts['tumor 5'] = 'tumors'
dicts['tumor 6'] = 'tumors'
dicts['tumor 12'] = 'tumors'
dicts['tumor 13'] = 'tumors'
dicts['epithelial'] = 'epithelial'
dicts['mast'] = 'mast'
dicts['endothelial'] = 'endothelial'
dicts['fibroblast'] = 'fibroblast'
dicts['neutrophil'] = 'neutrophil'
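A minimal sketch of applying such a mapping to a `cell_type` column with pandas, and listing any labels the mapping does not cover. Only a short excerpt of the mapping is repeated here to keep the example self-contained; the helper name `merge_cell_types` and the label 'tumor 99' are hypothetical and serve only to show an unmapped case.

```python
import pandas as pd

def merge_cell_types(cell_types, mapping):
    """Map fine-grained labels to major types and list labels without a mapping."""
    merged = cell_types.map(mapping)
    # Labels that were present in the input but missing from the mapping.
    missing = merged.isna() & cell_types.notna()
    unmapped = sorted(cell_types[missing].unique())
    return merged, unmapped

# Excerpt of the mapping above; use the full dict in practice.
mapping_excerpt = {
    "T CD8 memory": "lymphocyte",
    "macrophage": "Mcell",
    "tumor 5": "tumors",
    "mast": "mast",
}

# 'tumor 99' is a made-up label that is not in the mapping.
cell_types = pd.Series(["T CD8 memory", "tumor 5", "macrophage", "tumor 99", None])

merged, unmapped = merge_cell_types(cell_types, mapping_excerpt)
print(merged.tolist())
print("unmapped labels:", unmapped)
```

Checking the unmapped list is a cheap way to catch labels like the 'tumor X' types that would otherwise silently become missing values.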

Let me know if you still have any questions.

ncborcherding commented 7 months ago

Awesome thanks for the follow up!!
