Hi, thanks for your question. It looks like the code cannot find the cell type for a given cell ID. It could be related to the model.preannotate() step. What is the number of nuclei in nuclei_cell_type.h5 and nuclei.tif?
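If it helps, something like this gives a rough count from nuclei.tif, assuming it is a labelled mask where each nucleus has its own integer ID (a sketch; the path is a placeholder):

import numpy as np
import tifffile

# Placeholder path to the nuclei segmentation output
nuclei = tifffile.imread("nuclei.tif")

# Assuming 0 is background and each nucleus has a unique integer label,
# the number of distinct non-zero labels is the nuclei count
labels = np.unique(nuclei)
print(nuclei.shape, len(labels[labels != 0]))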
Hi,
I'm not exactly sure how to check that, but I'm wondering if it could be due to my altering the initial code. Since the introduction of multimodal staining for 10x Xenium, there are now four TIFF files in the output:
"The improved algorithm for generating 2D focus images now outputs files in multi-file OME-TIFF format, instead of the morphology_focus.ome.tif and morphology_mip.ome.tif files. The new morphology_focus/ directory contains the 2D focus morphology_focus_xxxx.ome.tif files. For DAPI-only datasets, the directory contains morphology_focus/morphology_focus_0000.ome.tif. For cell segmentation staining workflow datasets, there are four ome.tif files, one per stain image in this directory."
I selected the 0000.ome.tif file as input, but it seems like this added an extra dimension on top of h and w: the reported h is now 4 and the reported w is what h should be. So I changed the tifffile.imread call, setting is_ome to False, which fixed the dimensions and didn't produce an error until now at the training step. Could this be what is causing the error, and if so, is there a way I can run BIDCell with the new Xenium output format? I attached a snippet containing the single line I altered in the segment_nuclei function below.
def segment_nuclei(config: Config):
    dir_dataset = config.files.data_dir

    print("Reading DAPI image")

    if config.files.fp_dapi is None:
        fp_dapi = os.path.join(dir_dataset, "dapi_stitched.tif")
    else:
        fp_dapi = config.files.fp_dapi
    print(fp_dapi)

    # dapi = tifffile.imread(fp_dapi)
    dapi = tifffile.imread(fp_dapi, is_ome=False, level=0)
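For reference, a shape check along these lines shows the difference between the two reads (the path is a placeholder for my dataset):

import tifffile

# Placeholder path to the first morphology focus image
fp = "morphology_focus/morphology_focus_0000.ome.tif"

# Reading with the OME metadata pulls in the other stain channels,
# so the first axis comes back as 4, as described above
print(tifffile.imread(fp).shape)

# Reading the file as a plain TIFF (first pyramid level) gives the 2D DAPI plane
print(tifffile.imread(fp, is_ome=False, level=0).shape)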
The altered line looks fine to me, and the 0000.ome.tif file should be DAPI. I think it's worth taking a look at nuclei.tif (e.g. with ImageJ) to see if the nuclei look reasonable.
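If ImageJ is awkward to use on a file this large, a quick crop view in Python works too (a sketch; adjust the path and the crop window to your image):

import matplotlib.pyplot as plt
import tifffile

# Placeholder path to the nuclei segmentation output
nuclei = tifffile.imread("nuclei.tif")

# Look at a small crop so individual nuclei are visible
plt.imshow(nuclei[2000:3000, 2000:3000] > 0, cmap="gray")
plt.title("nuclei.tif (crop)")
plt.show()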
Thanks for your help so far. Based on ImageJ, the nuclei look reasonable to me. ImageJ counted 98417 nuclei, but the actual number might be higher since not all nuclei could be separated. I attached images of how they look.
Thanks for the nuclei images, they look OK to me too. I'm suspecting that something may have gone wrong during the nuclei annotation step, which could be during model.make_cell_gene_mat(is_cell=False) or model.preannotate(). The cell_gene_matrices/nuclei/expr_mat.csv file from the make_cell_gene_mat step should have 98417 rows. For preannotate, nuclei_cell_type.h5 is expected to contain 98417 nuclei as well.
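A quick row count for expr_mat.csv could look like this (a sketch, assuming the default cell_gene_matrices/nuclei/ output location and that the first column is the cell ID index, as in preannotate.py):

import pandas as pd

# Placeholder path; adjust to your output directory
df_cells = pd.read_csv("cell_gene_matrices/nuclei/expr_mat.csv", index_col=0)
print(df_cells.shape[0])  # should equal the number of nuclei

And for nuclei_cell_type.h5: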
import h5py
h5f = h5py.File("nuclei_cell_type.h5", "r")
types_idx = list(h5f["data"][:])
cell_ids = list(h5f["ids"][:])
print(len(types_idx), len(cell_ids))  # both should equal the number of nuclei
h5f.close()
Please let me know if something doesn't look right in these outputs.
Hi, the expr_mat.csv file has 124571 rows and the h5 file has 108998 nuclei.
Hi, sorry for the late reply. It seems like the code didn't compute the preannotation for all nuclei for some reason. Could you try the following in preannotate.py: directly under

df_cells = pd.read_csv(os.path.join(expr_dir, config.files.fp_expr), index_col=0)

add

df_cells = df_cells.iloc[:60000, :]

and rename the output to nuclei_cell_type_1.h5. Then repeat with the rest of the nuclei,

df_cells = df_cells.iloc[60000:, :]

and rename the output to nuclei_cell_type_2.h5, and then run this script (after updating the directory name):
import numpy as np
import h5py

# please check file names
fp_a = "./your_dir/nuclei_cell_type_1.h5"
fp_b = "./your_dir/nuclei_cell_type_2.h5"

def load_data(fp):
    h5f = h5py.File(fp, "r")
    nuclei_types_idx = h5f["data"][:]
    nuclei_types_ids = h5f["ids"][:]
    h5f.close()
    return nuclei_types_idx, nuclei_types_ids

idx_a, ids_a = load_data(fp_a)
idx_b, ids_b = load_data(fp_b)

print(idx_a.shape, ids_a.shape)
print(idx_b.shape, ids_b.shape)

idx = np.concatenate((idx_a, idx_b))
ids = np.concatenate((ids_a, ids_b))
print(idx.shape, ids.shape)

# please check file names
h5f = h5py.File("./your_dir/nuclei_cell_type.h5", "w")
h5f.create_dataset("data", data=idx)
h5f.create_dataset("ids", data=ids)
h5f.close()
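As a sanity check (assuming the same directory), the merged file should then report one entry per nucleus:

import h5py

h5f = h5py.File("./your_dir/nuclei_cell_type.h5", "r")
print(len(h5f["data"]), len(h5f["ids"]))  # both should equal the total number of nuclei
h5f.close()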
Hi, I am getting an error during the model.train() step and have no idea what could be going wrong. Here is the full error:
It is a different index every time; the first time I ran it, it was ValueError: 6321.0 is not in list.