huayingq1996 commented 1 year ago

Hi,

I was trying out CellSighter earlier today and I encountered the bug as mentioned in the title of this issue:

utils.py load_image(image_path=image_path[0]... list index out of range

In your README, you wrote that the image and segmentation files should be tif files. So when I stacked the individual images for each channel to generate the 3D tif, I saved the file as .tif, and did the same for the segmentation files. However, when I looked at the source code for load_image in utils.py, I found that the program actually expects .tiff rather than .tif. Please make clear in your README which extension is actually required, for other users' convenience.
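For anyone checking their own data folder for this kind of mismatch, a minimal sketch follows. `count_suffixes` is a hypothetical helper, not part of CellSighter; it only reports which extensions a folder actually contains so they can be compared with what the loader expects.

```python
from collections import Counter
from pathlib import Path


def count_suffixes(directory):
    """Count the files in a directory by lower-cased extension."""
    return Counter(
        p.suffix.lower() for p in Path(directory).iterdir() if p.is_file()
    )
```

Calling `count_suffixes` on the image folder and seeing only `.tif` entries, when the loader looks for `.tiff`, would point to the problem described above.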

yaelAmitay commented 1 year ago

Yes, you are right. I changed the README to "tiff". Thanks!

mikemcka commented 4 months ago

Hi there, I am having the same issue at the moment with the same error.

The full error is:

(CellSightenv) [mckay.m@gpu-a30-n03 CellSighter]$ python train.py --base_path=/vast/imaging/MikeyM/CellSighter/NSCLC_1/cell_classification
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37]
Load training data...
Traceback (most recent call last):
  File "train.py", line 86, in <module>
    train_crops, val_crops = load_crops(config["root_dir"],
  File "/vast/imaging/MikeyM/CellSighter/data/utils.py", line 171, in load_crops
    train_crops = load_samples(images_dir=data_dir, cells_dir=cells_dir, cells2labels_dir=cells2labels_dir,
  File "/vast/imaging/MikeyM/CellSighter/data/utils.py", line 116, in load_samples
    image, cells, cl2lbl = load_image(image_path=image_path[0],
IndexError: list index out of range

I am working with a MIBI dataset with 13 FOVs and 38 channels. From QuPath I have exported the images as .tif files, and the cell segmentations (with unique IDs for each cell) as jpg files, following the scripts Export_tiff and Export_indexed_cell_mask. I have also exported my cell labels from QuPath and generated a csv file for each FOV in my data processing notebook (Process_cellsighter_files.ipynb). All of my files are converted to .npz files during data preprocessing and look similar to the example files provided and to the files from the datasets listed in the paper. The environment is not a problem, and the model trains fine with the example data. All files referenced are located at my GitHub link: (https://github.com/mikemcka/NSCLC_Cellsighter/blob/main/Process_cellsighter_files.ipynb)

Key concerns are:

My FOVs are not of consistent HxW size; however, for each FOV the cell mask and the .tif have the same HxW size. I have confirmed this holds across my dataset.
I have not added any padding to make every file size a multiple of the crop size (128 px). Is this a problem, or does the code work with any file size?
The config file: I have filled it out to the best of my knowledge, but I am new to cell phenotyping tools.

Any help would be greatly appreciated. CellSighter looks like a really great tool and I would very much like to see it working with our dataset.

Kind Regards

Mike

yaelAmitay commented 4 months ago

Hi, could it be that the path to the images in your config.json file is missing a leading slash? You have: "root_dir": "vast/imaging/MikeyM/CellSighter/NSCLC_1/cell_classification/" instead of: "root_dir": "/vast/imaging/MikeyM/CellSighter/NSCLC_1/cell_classification/"
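A path without the leading slash is relative, so it is resolved against the current working directory rather than the filesystem root. A quick check along these lines can catch that before training starts. This is only a sketch: `check_root_dir` is a hypothetical helper, not part of CellSighter, and the only thing it assumes about the config is the "root_dir" key quoted above.

```python
import json
from pathlib import Path


def check_root_dir(config_path):
    """Load a JSON config and verify root_dir is an absolute, existing directory."""
    with open(config_path) as f:
        config = json.load(f)
    root = Path(config["root_dir"])
    if not root.is_absolute():
        # A relative path is interpreted relative to the working directory.
        raise ValueError(
            f"root_dir is relative; it currently resolves to {root.resolve()}"
        )
    if not root.is_dir():
        raise FileNotFoundError(f"root_dir does not exist: {root}")
    return root
```

Running `check_root_dir("config.json")` from the directory where training is launched would raise immediately for the path without the leading slash.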

mikemcka commented 4 months ago

Thanks for finding that. I made the change and am still getting the same error with both padded and unpadded versions of the data. Some of the FOVs only have one specific cell type labelled, and I have changed the labels for all the other cells in these FOVs. Having a lot of extra cells outside the validation set labelled as -1 wouldn't be a problem, would it?

yaelAmitay commented 4 months ago

Hi, I see that you deleted your GitHub repository, so I can't look into it. However, the error above means that it can't find your images, so I would start by checking that the paths are correct (print image_path from data/utils.py line 110).
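To make the failure easier to read: glob.glob returns an empty list when no file matches, so indexing the result with [0] raises IndexError. The sketch below shows one way to turn that into an explicit message. `first_match` is a hypothetical helper, not CellSighter code, and it assumes the loader builds its paths with glob.glob as in the traceback's load_samples function.

```python
import glob
from pathlib import Path


def first_match(directory, image_id, suffixes):
    """Return the first file named <image_id><suffix>, or raise a readable error."""
    directory = Path(directory)
    matches = []
    for suffix in suffixes:
        matches += glob.glob(str(directory / f"{image_id}{suffix}"))
    if not matches:
        # An empty list here is what leads to "list index out of range".
        raise FileNotFoundError(
            f"No file for image id {image_id!r} in {directory} "
            f"(tried suffixes {list(suffixes)})"
        )
    return matches[0]
```

With a check like this, the error names the directory and file name that were searched for, which is the information the printout suggested above would reveal.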

Regarding your questions, I don't think there's a problem with some images having only one cell type, and the error you are having is unrelated to the cell types.

(Please open a new issue if you have further questions)

mikemcka commented 4 months ago

Hi Yael,

We managed to get it running by using the os package to load the data in. I will send you the updated version of the utils.py file if you want to take a look.

mikemcka commented 4 months ago

Here is the modified code from the utils.py file that worked for us. The primary issue was that the file suffix of our loaded data was still being kept. The modified section of code starts at line 104:

import os  # added to use the splitext function

# Added path variables and the splitext function to remove the unnecessary
# file suffix that was being loaded, e.g. (FOV1.tif -> FOV1.tif.npz).
# Also changed the file suffix in the glob.glob calls to .tif to match our
# filenames; might not be a problem, but just to be safe.
images_dir = Path(images_dir)
cells_dir = Path(cells_dir)
cells2labels_dir = Path(cells2labels_dir)
crops = []
for image_id in images_names:
    # Keep only the base name without its extension, e.g. "FOV1.tif" -> "FOV1"
    image_id = os.path.splitext(os.path.basename(image_id))[0]
    image_path = glob.glob(str(images_dir / f"{image_id}.npz")) + \
                 glob.glob(str(images_dir / f"{image_id}.tif"))
    cells_path = glob.glob(str(cells_dir / f"{image_id}.npz")) + \
                 glob.glob(str(cells_dir / f"{image_id}.tif"))
    cells2labels_path = glob.glob(str(cells2labels_dir / f"{image_id}.npz")) + \
                        glob.glob(str(cells2labels_dir / f"{image_id}.txt"))
    # Print statements for sanity checks
    print(image_path)
    print(cells_path)
    print(cells2labels_path)
    print(channels, to_pad, crop_size)
    image, cells, cl2lbl = load_image(image_path=image_path[0],
                                      cells_path=cells_path[0],
                                      cells2labels_path=cells2labels_path[0],
                                      channels=channels,
                                      to_pad=to_pad,
                                      crop_size=crop_size)
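The effect of the splitext step can be seen in isolation with a short example. `strip_suffix` is a hypothetical name used only for illustration, and "FOV1.tif" is the example name from the comment above.

```python
import os


def strip_suffix(image_id):
    """Drop any directory part and the final extension of an image id."""
    return os.path.splitext(os.path.basename(image_id))[0]


name = "FOV1.tif"
print(f"{name}.npz")                # FOV1.tif.npz (suffix kept, then .npz appended)
print(f"{strip_suffix(name)}.npz")  # FOV1.npz
```

Note that os.path.splitext removes only the last extension, so an id such as "FOV1.ome.tif" would become "FOV1.ome", and an id that legitimately contains a dot would be truncated at its last dot.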

KerenLab / CellSighter

bug report: utils.py load_image(image_path=image_path[0]... list index out of range #2
