SydneyBioX / BIDCell

Biologically-informed deep learning for cell segmentation of subcellular spatial transcriptomics data

Training Step ValueError #17

Open · meaksu opened 2 weeks ago

meaksu commented 2 weeks ago

Hi, I am getting an error during the model.train() step and have no idea what could be going wrong. Here is the full error:

2024-06-12 22:47:29,009 INFO Initialising model
Number of genes: 260
2024-06-12 22:47:29,614 INFO Preparing data
Loaded nuclei
(12283, 11464)
6477 patches available
Cell types: ['Arterial', 'Astrocyte', 'Capillary', 'Macrophage/Microglia', 'Meningeal Fibroblast', 'Neurons', 'Oligo', 'OPCs', 'Pericyte', 'Perivascular Fibroblast', 'SMC', 'Venous']
2024-06-12 22:47:34,006 INFO Total number of training examples: 6477
2024-06-12 22:47:34,022 INFO Begin training

Epoch = 1  lr = 1e-05
2024-06-12 22:47:34,636 INFO Model saved: /dmpi/analysis/SGregory/Spatial/Xenium/Wang_0017/Xenium_Files/output-XETG00054__0034155__182__20240429__173122/model_outputs/2024_06_12_22_47_28/models/epoch_1_step_0.pth

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[9], line 1
----> 1 model.train()

File ~/miniconda3/envs/bidcell/lib/python3.10/site-packages/bidcell/BIDCellModel.py:114, in BIDCellModel.train(self)
    111 def train(self) -> None:
    112     """Train the model.
    113     """
--> 114     train(self.config)

File ~/miniconda3/envs/bidcell/lib/python3.10/site-packages/bidcell/model/train.py:170, in train(config)
    167 cur_lr = optimizer.param_groups[0]["lr"]
    168 print("\nEpoch =", (epoch + 1), " lr =", cur_lr)
--> 170 for step_epoch, (
    171     batch_x313,
    172     batch_n,
    173     batch_sa,
    174     batch_pos,
    175     batch_neg,
    176     coords_h1,
    177     coords_w1,
    178     nucl_aug,
    179     expr_aug_sum,
    180 ) in enumerate(train_loader):
    181     # Permute channels axis to batch axis
    182     # torch.Size([1, patch_size, patch_size, 313, n_cells]) to [n_cells, 313, patch_size, patch_size]
    183     batch_x313 = batch_x313[0, :, :, :, :].permute(3, 2, 0, 1)
    184     batch_sa = batch_sa.permute(3, 0, 1, 2)

File ~/miniconda3/envs/bidcell/lib/python3.10/site-packages/torch/utils/data/dataloader.py:631, in _BaseDataLoaderIter.__next__(self)
    628 if self._sampler_iter is None:
    629     # TODO(https://github.com/pytorch/pytorch/issues/76750)
    630     self._reset()  # type: ignore[call-arg]
--> 631 data = self._next_data()
    632 self._num_yielded += 1
    633 if self._dataset_kind == _DatasetKind.Iterable and \
    634         self._IterableDataset_len_called is not None and \
    635         self._num_yielded > self._IterableDataset_len_called:

File ~/miniconda3/envs/bidcell/lib/python3.10/site-packages/torch/utils/data/dataloader.py:675, in _SingleProcessDataLoaderIter._next_data(self)
    673 def _next_data(self):
    674     index = self._next_index()  # may raise StopIteration
--> 675     data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    676     if self._pin_memory:
    677         data = _utils.pin_memory.pin_memory(data, self._pin_memory_device)

File ~/miniconda3/envs/bidcell/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py:51, in _MapDatasetFetcher.fetch(self, possibly_batched_index)
     49         data = self.dataset.__getitems__(possibly_batched_index)
     50     else:
---> 51         data = [self.dataset[idx] for idx in possibly_batched_index]
     52 else:
     53     data = self.dataset[possibly_batched_index]

File ~/miniconda3/envs/bidcell/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py:51, in <listcomp>(.0)
     49         data = self.dataset.__getitems__(possibly_batched_index)
     50     else:
---> 51         data = [self.dataset[idx] for idx in possibly_batched_index]
     52 else:
     53     data = self.dataset[possibly_batched_index]

File ~/miniconda3/envs/bidcell/lib/python3.10/site-packages/bidcell/model/dataio/dataset_input.py:297, in DataProcessing.__getitem__(self, index)
    292     except Exception:
    293         search_areas[:, :, i_cell] = cv2.dilate(
    294             nucl_split[:, :, i_cell], kernel, iterations=1
    295         )
--> 297 ct_nucleus = int(self.nuclei_types_idx[self.nuclei_types_ids.index(c_id)])
    298 ct_nucleus_name = self.type_names[ct_nucleus]
    300 # Markers with dilation
    301 # ct_pos = np.expand_dims(np.expand_dims(self.pos_markers[ct_nucleus,:], 0),0)*expr_aug

ValueError: 1303.0 is not in list

It is a different index every time; the first time I ran it, the error was ValueError: 6321.0 is not in list.

xhelenfu commented 2 weeks ago

Hi, thanks for your question. It looks like the code cannot find the cell type for a given cell ID. It could be related to the model.preannotate() step. What is the number of nuclei in nuclei_cell_type.h5 and nuclei.tif?
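
Something along these lines should give the count from nuclei.tif, assuming it is the labelled nuclei mask where each nucleus has its own integer ID and 0 is background:

import numpy as np
import tifffile

nuclei = tifffile.imread("nuclei.tif")  # assumed to be a labelled mask: one integer ID per nucleus, 0 = background
print(len(np.unique(nuclei)) - 1)       # number of nuclei, excluding the background label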

meaksu commented 2 weeks ago

Hi,

I'm not exactly sure how to check that, but I'm wondering if it could be due to a change I made to the original code. Since the introduction of multimodal staining for 10x Xenium, there are now four TIFF files in the output:

"The improved algorithm for generating 2D focus images now outputs files in multi-file OME-TIFF format, instead of the morphology_focus.ome.tif and morphology_mip.ome.tif files. The new morphology_focus/ directory contains the 2D focus morphology_focus_xxxx.ome.tif files. For DAPI-only datasets, the directory contains morphology_focus/morphology_focus_0000.ome.tif. For cell segmentation staining workflow datasets, there are four ome.tif files, one per stain image in this directory."

I selected the 0000.ome.tif file as input, but it seems like this added an extra dimension on top of h and w: h is now 4, and w is what h should be. So I changed the tifffile.imread call, setting is_ome to False, which fixed the dimensions and didn't produce any errors until this training step. Could this be what is causing the error, and if so, is there a way I can run BIDCell with the new Xenium output format? I attached a snippet containing the single line I altered in the segment_nuclei function below.

def segment_nuclei(config: Config):
    dir_dataset = config.files.data_dir

    print("Reading DAPI image")
    if config.files.fp_dapi is None:
        fp_dapi = os.path.join(dir_dataset, "dapi_stitched.tif")
    else:
        fp_dapi = config.files.fp_dapi
    print(fp_dapi)
    #dapi = tifffile.imread(fp_dapi)
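    # altered line: read as a plain (non-OME) TIFF and take the full-resolution level only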
    dapi = tifffile.imread(fp_dapi, is_ome=False, level=0)
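
For reference, a quick shape check along these lines (the path is a placeholder) shows the difference between the two read modes:

import tifffile

fp = "morphology_focus/morphology_focus_0000.ome.tif"   # placeholder path

img_ome = tifffile.imread(fp)                            # default read follows the OME metadata
img_plain = tifffile.imread(fp, is_ome=False, level=0)   # plain TIFF read, full-resolution level only

print(img_ome.shape)    # 3D with a leading 4 in my case
print(img_plain.shape)  # 2D (h, w)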

xhelenfu commented 2 weeks ago

The altered line looks fine to me, and the 0000.ome.tif file should be DAPI. I think it's worth taking a look at nuclei.tif (e.g. with ImageJ) to see if the nuclei look reasonable.

meaksu commented 2 weeks ago

Thanks for your help so far. Based on ImageJ, the nuclei look reasonable to me. ImageJ counted 98417 nuclei, but the actual number might be higher since not all nuclei could be separated. I attached images of how they look.

xhelenfu commented 2 weeks ago

Thanks for the nuclei images, they look OK to me too. I suspect something may have gone wrong during the nuclei annotation step, which could be during model.make_cell_gene_mat(is_cell=False) or model.preannotate(). The cell_gene_matrices/nuclei/expr_mat.csv file from the make_cell_gene_mat step should have 98417 rows. For preannotate, nuclei_cell_type.h5 is expected to contain 98417 nuclei as well.

import h5py
h5f = h5py.File("nuclei_cell_type.h5", "r")
types_idx = list(h5f["data"][:])
cell_ids = list(h5f["ids"][:])
print(len(types_idx), len(cell_ids))  # both should equal the number of nuclei
h5f.close()
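
And something like this should give the row count of expr_mat.csv (adjust the path to your output directory):

import pandas as pd

df_cells = pd.read_csv("cell_gene_matrices/nuclei/expr_mat.csv", index_col=0)
print(df_cells.shape[0])  # should also equal the number of nuclei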

Please let me know if something doesn't look right in these outputs.

meaksu commented 1 week ago

Hi, the expr_mat.csv file has 124571 rows, and the h5 file has 108998 nuclei.

xhelenfu commented 8 hours ago

Hi, sorry for the late reply. It seems like the code didn't compute the preannotation for all nuclei for some reason. In preannotate.py, could you try adding this under df_cells = pd.read_csv(os.path.join(expr_dir, config.files.fp_expr), index_col=0):

df_cells = df_cells.iloc[:60000,:] and then renaming the output to nuclei_cell_type_1.h5.

Then repeat with the rest of the nuclei, df_cells = df_cells.iloc[60000:,:], and rename that output to nuclei_cell_type_2.h5.
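
In context, the change would look something like this (a sketch; the surrounding lines in preannotate.py may differ slightly):

df_cells = pd.read_csv(os.path.join(expr_dir, config.files.fp_expr), index_col=0)

# first pass: keep only the first 60000 nuclei, then rename the resulting nuclei_cell_type.h5 to nuclei_cell_type_1.h5
df_cells = df_cells.iloc[:60000, :]

# second pass: comment out the line above and use the remaining nuclei instead,
# then rename the new output to nuclei_cell_type_2.h5
# df_cells = df_cells.iloc[60000:, :]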

Then run this script to combine the two files (after updating the directory names):

import numpy as np 
import h5py 

# please check file names
fp_a = "./your_dir/nuclei_cell_type_1.h5"
fp_b = "./your_dir/nuclei_cell_type_2.h5"

def load_data(fp):
    h5f = h5py.File(fp, "r")
    nuclei_types_idx = h5f["data"][:]
    nuclei_types_ids = h5f["ids"][:]
    h5f.close()
    return nuclei_types_idx, nuclei_types_ids

idx_a, ids_a = load_data(fp_a)
idx_b, ids_b = load_data(fp_b)

print(idx_a.shape, ids_a.shape)
print(idx_b.shape, ids_b.shape)

idx = np.concatenate((idx_a, idx_b))
ids = np.concatenate((ids_a, ids_b))

print(idx.shape, ids.shape)

# please check file names
h5f = h5py.File("./your_dir/nuclei_cell_type.h5", "w")
h5f.create_dataset("data", data=idx)
h5f.create_dataset("ids", data=ids)
h5f.close()