drieslab / Giotto

Spatial omics analysis toolbox
https://drieslab.github.io/Giotto_website/

failing to load specific fov #493

Closed Pointillomic closed 1 year ago

Pointillomic commented 1 year ago

Hello,

First of all, thank you for the amazing tool! It has made spatial transcriptomic analysis with the CosMX so much easier!

I have had an issue with the last two versions of (commits to) Giotto Suite when loading specific FOVs from the public lung CosMx dataset. Specifically, I get the error "Error: [$] feat_ID is not a variable name in x" when loading FOVs 31 and 32 for Lung5_Rep1, whereas the other FOVs for that sample load fine. Because the failure is so FOV-specific, I re-downloaded the data and checked the md5sum, but the files seem to be fine.

The specific function that has the issue is 'createGiottoObjectSubcellular' and I included the output just before the error below.

Any help with this would be greatly appreciated, and thanks again for the great tool!


$mask_method
[1] "guess"

$flip_vertical
[1] TRUE

$flip_horizontal
[1] FALSE

$shift_horizontal_step
[1] FALSE

Selecting col "geom" as poly_ID column
Selecting cols "x" and "y" as x and y respectively
[1] 0
[1] 3648

  1. Finished extracting polygon information
  2. Add centroid / spatial locations if available
  3. Finish adding centroid / spatial locations
  4. Start extracting spatial feature information
     pointslist is a named list [ rna ]
     Process point info...
     Selecting col "gene_id" as feat_ID column
     Selecting cols "x" and "y" as x and y respectively
  5. Finished extracting spatial feature information

Error: [$] feat_ID is not a variable name in x

jiajic commented 1 year ago

Hi @Pointillomic,

We're very glad to hear that Giotto is helpful!

I took a look at this issue and was able to reproduce it for FOVs 31 and 32 in Lung5_Rep1, but it's a little strange. The information for FOVs 31 and 32 seems to be absent from metadata_file.csv and tx_file.csv, even though there are entries for them in fov_positions_file.csv and images for them in all the image folders. This lack of data produces unexpectedly empty data.tables, which in turn trigger the error message that you found.

I looked for hints in the feature detections file, in case the detections for 31 and 32 had accidentally been assigned to a different FOV, but I didn't find anything out of the ordinary.

tx_path = '/path/to/Lung5_Rep1_tx_file.csv'
tx_coord_all = data.table::fread(input = tx_path)

fov_counts = tx_coord_all[, table(fov)]
barplot(fov_counts)

[image: barplot of detections per FOV] Nothing really stands out: no other FOV appears to have picked up additional detections that should have been in 31 and 32.

local_max = tx_coord_all[, lapply(.SD, max), by = fov, .SDcols = c('x_local_px', 'y_local_px')]
plot(x = local_max$x_local_px, y = local_max$y_local_px, xlim = c(5100, 5900), ylim = c(3200, 4000))

[image: scatter plot of local max x and y per FOV] The local max x and y values (maximum detection coordinates relative to each FOV) are all very similar across FOVs.

# mean global position of the detections in each FOV
global_mean = tx_coord_all[, lapply(.SD, mean), by = fov, .SDcols = c('x_global_px', 'y_global_px')]
plot(x = global_mean$x_global_px, y = global_mean$y_global_px)

[image: scatter plot of mean global x and y per FOV] The mean detection locations are regularly spaced for all 30 FOVs present in the data.

meta_path = '/path/to/Lung5_Rep1_metadata_file.csv'
meta = data.table::fread(meta_path)
meta[, unique(fov)]
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
[17] 17 18 19 20 21 22 23 24 25 26 27 28 29 30

The data for FOVs 31 and 32 also seem to be missing from the aggregate (metadata) information.
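
The checks above can be rolled into one comparison of FOV IDs across the files. This is only a rough sketch on toy tables standing in for the real files: `missing_fovs` is a hypothetical helper (not part of Giotto), and it assumes the FOV column in fov_positions_file.csv is also named `fov`, as it is in the tx and metadata files.

```r
library(data.table)

# Toy stand-ins for the three files; the real ones would be read with data.table::fread().
# Assumption: the positions file also names its FOV column `fov`.
fov_positions = data.table(fov = 1:32)
tx_coord_all  = data.table(fov = rep(1:30, each = 2))
meta          = data.table(fov = 1:30)

# Hypothetical helper: for each extra table, list the FOVs that appear in the
# positions table but have no rows in that table.
missing_fovs = function(positions, ...) {
  others = list(...)
  lapply(others, function(dt) setdiff(positions$fov, unique(dt$fov)))
}

missing = missing_fovs(fov_positions, tx = tx_coord_all, meta = meta)
print(missing)
```

Any FOV reported here has a position entry but no transcript or metadata rows, which is the situation that leads to the empty data.tables.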

Best, Jiaji

Pointillomic commented 1 year ago

Hi Jiaji,

Thank you so much for looking into this so quickly!

This is super helpful, since it indicates a problem with the data rather than the package, so I can go ahead with the analysis and exclude those problematic FOVs.
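
For anyone else who hits this, one way to exclude the empty FOVs up front might look like the sketch below. It uses a toy transcript table rather than the real tx file; `requested_fovs` and the `target` column are illustrative assumptions, and the filtered selection would then be passed on to whatever loading call you already use.

```r
library(data.table)

# Toy transcript table standing in for fread('/path/to/Lung5_Rep1_tx_file.csv');
# the `target` column is only illustrative.
tx_coord_all = data.table(fov = c(1L, 1L, 2L, 30L), target = c('A', 'B', 'A', 'C'))

# Hypothetical selection of FOVs to analyse
requested_fovs = c(1L, 30L, 31L, 32L)

# Keep only the requested FOVs that have at least one detection
usable_fovs = intersect(requested_fovs, tx_coord_all[, unique(fov)])
skipped = setdiff(requested_fovs, usable_fovs)
if (length(skipped) > 0) {
  message('Skipping FOVs with no detections: ', paste(skipped, collapse = ', '))
}

# Transcripts restricted to the usable FOVs
tx_subset = tx_coord_all[fov %in% usable_fovs]
```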

Thanks again for the great package!