angelolab / ark-analysis

Integrated pipeline for multiplexed image analysis
https://ark-analysis.readthedocs.io/en/latest/
MIT License

No images found in designated folder for `2_Pixie_Cluster_Pixels.ipynb` #1148

Closed mezwick closed 1 month ago

mezwick commented 1 month ago


Describe the bug: Running 2_Pixie_Cluster_Pixels.ipynb on the example data downloaded to the example data directory, I get an error from pixie_preprocessing.create_pixel_matrix() saying that no images were found in the designated folder:


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[35], line 2
      1 # run pixel data preprocessing
----> 2 pixie_preprocessing.create_pixel_matrix(
      3     fovs,
      4     channels,
      5     base_dir,
      6     tiff_dir,
      7     pixie_seg_dir,
      8     img_sub_folder=img_sub_folder,
      9     seg_suffix=seg_suffix,
     10     pixel_output_dir=pixel_output_dir,
     11     data_dir=pixel_data_dir,
     12     subset_dir=pixel_subset_dir,
     13     norm_vals_name_post_rownorm=norm_vals_name,
     14     blur_factor=blur_factor,
     15     subset_proportion=subset_proportion,
     16     multiprocess=multiprocess,
     17     batch_size=batch_size
     18 )

File ~\Miniconda3\envs\ark_env\lib\site-packages\ark\phenotyping\pixie_preprocessing.py:345, in create_pixel_matrix(fovs, channels, base_dir, tiff_dir, seg_dir, img_sub_folder, seg_suffix, pixel_output_dir, data_dir, subset_dir, norm_vals_name_pre_rownorm, norm_vals_name_post_rownorm, pixel_thresh_name, channel_percentile_pre_rownorm, channel_percentile_post_rownorm, is_mibitiff, blur_factor, subset_proportion, seed, multiprocess, batch_size)
    342 # load existing channel_norm_pre_path if exists, otherwise generate
    343 if not os.path.exists(channel_norm_pre_rownorm_path):
    344     # compute channel percentiles
--> 345     channel_norm_pre_rownorm_df = pixel_cluster_utils.calculate_channel_percentiles(
    346         tiff_dir=tiff_dir,
    347         fovs=fovs,
    348         channels=channels,
    349         img_sub_folder=img_sub_folder,
    350         percentile=channel_percentile_pre_rownorm
    351     )
    352     # save output
    353     feather.write_dataframe(
    354         channel_norm_pre_rownorm_df, channel_norm_pre_rownorm_path, compression='uncompressed'
    355     )

File ~\Miniconda3\envs\ark_env\lib\site-packages\ark\phenotyping\pixel_cluster_utils.py:45, in calculate_channel_percentiles(tiff_dir, fovs, channels, img_sub_folder, percentile)
     42 percentile_list = []
     43 for fov in fovs:
     44     # load image data and remove 0 valued pixels
---> 45     img = load_utils.load_imgs_from_tree(data_dir=tiff_dir, img_sub_folder=img_sub_folder,
     46                                          channels=[channel], fovs=[fov]).values[0, :, :, 0]
     47     img = img[img > 0]
     49     # record and store percentile, skip if no non-zero pixels

File ~\Miniconda3\envs\ark_env\lib\site-packages\alpineer\load_utils.py:166, in load_imgs_from_tree(data_dir, img_sub_folder, fovs, channels, max_image_size)
    163     channels = [chan for _, chan in sorted(zip(channels_indices, all_channels))]
    165 if len(channels) == 0:
--> 166     raise ValueError(f"No images found in designated folder, {os.path.join(data_dir, fovs[0])}")
    168 test_img = io.imread(os.path.join(data_dir, fovs[0], img_sub_folder, channels[0]))
    170 # The dtype is always the type of the image being loaded in.

ValueError: No images found in designated folder, I:\example_dataset\image_data\fov0

Expected behavior: I expected the images, which I can see in the named directory, to be found and the pixel matrix to be created.

To Reproduce: I did not edit the example notebook beyond specifying the base directory.

I manually downloaded the example_dataset from Hugging Face.

I set up the environment via conda with the environment.yml file I cloned from the ark-analysis repository.

cliu72 commented 1 month ago

Hi @mezwick, this error indicates that the image files aren't located in the correct place. As noted in the error message, it expects the images to be located at "I:\example_dataset\image_data." Since you manually downloaded the example dataset, make sure that you move them to the correct place.

Based on what you provided, your directory structure should look something like this:

I:\example_dataset\image_data
│ 
├── fov0
│   ├── CD3.tiff
│   ├── CD4.tiff
│   ├── CD8.tiff
│   ├── ...
├── fov1
│   ├── CD3.tiff
│   ├── CD4.tiff
│   ├── CD8.tiff
│   ├── ...
├── ...

Alternatively, you can also change the path at tiff_dir in the notebook to point to the location of the images.
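To compare the layout on disk against the tree above, a minimal sketch along these lines may help. It assumes one sub-folder per FOV directly under tiff_dir and treats file stems as channel names; channels_per_fov is a hypothetical helper, not part of ark-analysis or alpineer.

```python
from pathlib import Path

TIFF_SUFFIXES = {".tif", ".tiff"}


def channels_per_fov(tiff_dir, img_sub_folder=None):
    """Map each FOV folder name to the sorted channel names (file stems) found in it.

    Hypothetical helper: assumes every sub-folder of tiff_dir is one FOV.
    """
    root = Path(tiff_dir)
    if not root.is_dir():
        # Fail loudly instead of silently returning an empty layout.
        raise FileNotFoundError(f"tiff_dir does not exist: {root}")

    layout = {}
    for fov_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Images may sit in an optional sub-folder inside each FOV folder.
        img_dir = fov_dir / img_sub_folder if img_sub_folder else fov_dir
        if img_dir.is_dir():
            layout[fov_dir.name] = sorted(
                f.stem
                for f in img_dir.iterdir()
                if f.is_file() and f.suffix.lower() in TIFF_SUFFIXES
            )
        else:
            layout[fov_dir.name] = []
    return layout
```

Calling channels_per_fov(tiff_dir) and printing the result shows, per FOV, which channel names actually have an image file on disk.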

mezwick commented 1 month ago

Hi.

Thanks for getting back to me :).

I can confirm that the directory structure reflects this and the images are contained in that directory.

I am running the ark_env environment, set up with the environment.yml file cloned from the repo. The only changes I have made to the notebook are to specify base_dir = r'C:\example_dataset'

and to set segmentation_dir to None, as I do not have segmentation masks and am only interested in running the pixel clustering part of the pipeline: segmentation_dir = None

I have also run os.path.exists(base_dir) and os.path.exists(tiff_dir) to confirm the directories exist; both return True.

Nevertheless, when I run

# run pixel data preprocessing
pixie_preprocessing.create_pixel_matrix(
    fovs,
    channels,
    base_dir,
    tiff_dir,
    pixie_seg_dir,
    img_sub_folder=img_sub_folder,
    seg_suffix=seg_suffix,
    pixel_output_dir=pixel_output_dir,
    data_dir=pixel_data_dir,
    subset_dir=pixel_subset_dir,
    norm_vals_name_post_rownorm=norm_vals_name,
    blur_factor=blur_factor,
    subset_proportion=subset_proportion,
    multiprocess=multiprocess,
    batch_size=batch_size
)

This still returns the error ValueError: No images found in designated folder, C:\example_dataset\image_data\fov0

But if I list the contents of that folder from Python, I do find the images in that directory:

test_path = r'C:\example_dataset\image_data\fov0'
os.listdir(test_path)

Returns

['CD14.tiff',
 'CD163.tiff',
 'CD20.tiff',
 'CD3.tiff',
 'CD31.tiff',
 'CD4.tiff',
 'CD45.tiff',
 'CD68.tiff',
 'CD8.tiff',
 'CK17.tiff',
 'Collagen1.tiff',
 'ECAD.tiff',
 'ECAD_smoothed.tiff',
 'Fibronectin.tiff',
 'GLUT1.tiff',
 'H3K27me3.tiff',
 'H3K9ac.tiff',
 'HLADR.tiff',
 'IDO.tiff',
 'Ki67.tiff',
 'PD1.tiff',
 'SMA.tiff',
 'Vim.tiff']

mezwick commented 1 month ago

I should say, the above example was run on a Windows workstation.

But I have now also tested on a Linux workstation, again creating the env from the environment.yml file cloned from the repo. Additionally, this time I downloaded the example data via the code cell in the example notebook.

In this case the error again reports that the designated images cannot be found, but this time it got as far as fov4. Complete error copied below.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[16], line 2
      1 # run pixel data preprocessing
----> 2 pixie_preprocessing.create_pixel_matrix(
      3     fovs,
      4     channels,
      5     base_dir,
      6     tiff_dir,
      7     pixie_seg_dir,
      8     img_sub_folder=img_sub_folder,
      9     seg_suffix=seg_suffix,
     10     pixel_output_dir=pixel_output_dir,
     11     data_dir=pixel_data_dir,
     12     subset_dir=pixel_subset_dir,
     13     norm_vals_name_post_rownorm=norm_vals_name,
     14     blur_factor=blur_factor,
     15     subset_proportion=subset_proportion,
     16     multiprocess=multiprocess,
     17     batch_size=batch_size
     18 )

File ~/anaconda3/envs/ark_env/lib/python3.10/site-packages/ark/phenotyping/pixie_preprocessing.py:345, in create_pixel_matrix(fovs, channels, base_dir, tiff_dir, seg_dir, img_sub_folder, seg_suffix, pixel_output_dir, data_dir, subset_dir, norm_vals_name_pre_rownorm, norm_vals_name_post_rownorm, pixel_thresh_name, channel_percentile_pre_rownorm, channel_percentile_post_rownorm, is_mibitiff, blur_factor, subset_proportion, seed, multiprocess, batch_size)
    342 # load existing channel_norm_pre_path if exists, otherwise generate
    343 if not os.path.exists(channel_norm_pre_rownorm_path):
    344     # compute channel percentiles
--> 345     channel_norm_pre_rownorm_df = pixel_cluster_utils.calculate_channel_percentiles(
    346         tiff_dir=tiff_dir,
    347         fovs=fovs,
    348         channels=channels,
    349         img_sub_folder=img_sub_folder,
    350         percentile=channel_percentile_pre_rownorm
    351     )
    352     # save output
    353     feather.write_dataframe(
    354         channel_norm_pre_rownorm_df, channel_norm_pre_rownorm_path, compression='uncompressed'
    355     )

File ~/anaconda3/envs/ark_env/lib/python3.10/site-packages/ark/phenotyping/pixel_cluster_utils.py:45, in calculate_channel_percentiles(tiff_dir, fovs, channels, img_sub_folder, percentile)
     42 percentile_list = []
     43 for fov in fovs:
     44     # load image data and remove 0 valued pixels
---> 45     img = load_utils.load_imgs_from_tree(data_dir=tiff_dir, img_sub_folder=img_sub_folder,
     46                                          channels=[channel], fovs=[fov]).values[0, :, :, 0]
     47     img = img[img > 0]
     49     # record and store percentile, skip if no non-zero pixels

File ~/anaconda3/envs/ark_env/lib/python3.10/site-packages/alpineer/load_utils.py:166, in load_imgs_from_tree(data_dir, img_sub_folder, fovs, channels, max_image_size)
    163     channels = [chan for _, chan in sorted(zip(channels_indices, all_channels))]
    165 if len(channels) == 0:
--> 166     raise ValueError(f"No images found in designated folder, {os.path.join(data_dir, fovs[0])}")
    168 test_img = io.imread(os.path.join(data_dir, fovs[0], img_sub_folder, channels[0]))
    170 # The dtype is always the type of the image being loaded in.

ValueError: No images found in designated folder, ../../../data/example_dataset/image_data/fov4

Again, I have checked, and the fov directories do contain the images. I checked with the following code just to be sure...

import os

# Specify the directory path
directory_path = ('/').join([tiff_dir, 'image_data'])

# Function to check for .tiff files in each subdirectory
def check_tiff_in_subdirs(directory_path):
    for subdir, dirs, files in os.walk(directory_path):
        # Check if there is any .tiff file in the current subdir
        if not any(file.endswith('.tiff') or file.endswith('.tif') for file in files):
            # If no .tiff files are found in the current subdir, return False
            return False
    # If all subdirs have at least one .tiff file, return True
    return True

# Function to find directories without .tiff files
def find_dirs_without_tiff(directory_path):
    dirs_without_tiff = []
    for subdir, dirs, files in os.walk(directory_path):
        # Check if there is any .tiff file in the current subdir
        if not any(file.endswith('.tiff') or file.endswith('.tif') for file in files):
            # If no .tiff files are found in the current subdir, add it to the list
            dirs_without_tiff.append(subdir)
    return dirs_without_tiff

# Call the function and print the result
result = check_tiff_in_subdirs(directory_path)
print(f"Every subdirectory contains a .tiff file: {result}")

# Call the function and store the result
directories_without_tiff = find_dirs_without_tiff(directory_path)

# Print the list of directories without .tiff files
print("Directories without .tiff files:")
for directory in directories_without_tiff:
    print(directory)

Which returns

Every subdirectory contains a .tiff file: True
Directories without .tiff files:
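One caveat with a walk-based check like this: os.walk yields nothing for a path that does not exist, so both functions can report success without inspecting any folder. A minimal stricter sketch, assuming one sub-folder per FOV directly under the image directory (fovs_without_tiffs is a hypothetical helper name):

```python
import os


def fovs_without_tiffs(image_dir):
    """Return the names of FOV sub-folders that contain no .tif/.tiff file.

    Hypothetical helper: raises if image_dir itself is missing, so a wrong
    path cannot pass the check vacuously.
    """
    if not os.path.isdir(image_dir):
        raise FileNotFoundError(f"Image directory does not exist: {image_dir}")

    empty = []
    for name in sorted(os.listdir(image_dir)):
        fov_path = os.path.join(image_dir, name)
        if not os.path.isdir(fov_path):
            continue  # only sub-folders are treated as FOVs
        files = os.listdir(fov_path)
        # str.endswith accepts a tuple of suffixes
        if not any(f.lower().endswith((".tif", ".tiff")) for f in files):
            empty.append(name)
    return empty
```

An empty list then means every FOV folder was actually visited and holds at least one image; it says nothing about whether each requested channel is present.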

mezwick commented 1 month ago

Ah! I have solved the issue.

It was because I had not generated the nuclear image specified as channel CD163_nuc_exclude, so no file for that channel existed in the fov folders.

Now that I have removed it from the channels list, everything appears to run :).

Sorry for the hassle!
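For anyone hitting the same error, a minimal sketch of a pre-flight check that compares the channels list against the files in each fov folder. It assumes a channel matches a file when the file stem equals the channel name, which may not be exactly how alpineer matches files; find_missing_channels is a hypothetical helper, not part of the library.

```python
from pathlib import Path


def find_missing_channels(tiff_dir, fovs, channels, img_sub_folder=None):
    """Return {fov: [channels with no matching .tif/.tiff file]} for FOVs with gaps.

    Hypothetical helper: matches channels to files by file stem (an assumption).
    """
    missing = {}
    for fov in fovs:
        img_dir = Path(tiff_dir) / fov
        if img_sub_folder:
            img_dir = img_dir / img_sub_folder

        available = set()
        if img_dir.is_dir():
            available = {
                f.stem
                for f in img_dir.iterdir()
                if f.suffix.lower() in (".tif", ".tiff")
            }

        # Keep the order of the requested channel list in the report.
        absent = [c for c in channels if c not in available]
        if absent:
            missing[fov] = absent
    return missing
```

Running find_missing_channels(tiff_dir, fovs, channels) before create_pixel_matrix should list a channel such as CD163_nuc_exclude under every fov where its image has not been generated; an empty dict means every requested channel has a file.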

cliu72 commented 1 month ago

Glad you worked it out! This could be helpful for future users who run into the same error, so thanks!