Teichlab / bin2cell

Join subcellular Visium HD bins into cells
MIT License
42 stars 1 forks source link

IndexError: index (13489) out of range #3

Closed Rafael-Silva-Oliveira closed 3 months ago

Rafael-Silva-Oliveira commented 3 months ago

I'm trying to follow the notebook demos to create the bin_count metadata, and I'm getting this error on this step:

b2c.insert_labels(adata, 
                  labels_npz_path="./he.npz", 
                  basis="spatial", 
                  spatial_key="spatial_cropped",
                  mpp=mpp, 
                  labels_key="labels_he"
                 )

Adata has this shape:

AnnData object with n_obs × n_vars = 9555564 × 17131
    obs: 'in_tissue', 'array_row', 'array_col', 'n_counts', 'destripe_factor', 'n_counts_adjusted'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells'
    uns: 'spatial', 'bin2cell'
    obsm: 'spatial', 'spatial_cropped'

Getting this error:

IndexError                                Traceback (most recent call last)
Cell In[39], [line 1](vscode-notebook-cell:?execution_count=39&line=1)
----> [1](vscode-notebook-cell:?execution_count=39&line=1) b2c.insert_labels(adata, 
      [2](vscode-notebook-cell:?execution_count=39&line=2)                   labels_npz_path="[./he.npz](https://vscode-remote+ssh-002dremote-002blcg.vscode-resource.vscode-cdn.net/mnt/work/RO_src/Pipelines/STAnalysis/notebooks/experimental/B2C/he.npz)", 
      [3](vscode-notebook-cell:?execution_count=39&line=3)                   basis="spatial", 
      [4](vscode-notebook-cell:?execution_count=39&line=4)                   spatial_key="spatial_cropped",
      [5](vscode-notebook-cell:?execution_count=39&line=5)                   mpp=mpp, 
      [6](vscode-notebook-cell:?execution_count=39&line=6)                   labels_key="labels_he"
      [7](vscode-notebook-cell:?execution_count=39&line=7)                  )

File /mnt/work/RO_src/venv/bin2cell_venv/lib/python3.10/site-packages/bin2cell/bin2cell.py:735, in insert_labels(adata, labels_npz_path, basis, spatial_key, mpp, labels_key)
    [732](https://vscode-remote+ssh-002dremote-002blcg.vscode-resource.vscode-cdn.net/mnt/work/RO_src/venv/bin2cell_venv/lib/python3.10/site-packages/bin2cell/bin2cell.py:732) coords = get_mpp_coords(adata, basis=basis, spatial_key=spatial_key, mpp=mpp)
    [733](https://vscode-remote+ssh-002dremote-002blcg.vscode-resource.vscode-cdn.net/mnt/work/RO_src/venv/bin2cell_venv/lib/python3.10/site-packages/bin2cell/bin2cell.py:733) #pull out the cell labels for the coordinates, can just index the sparse matrix with them
    [734](https://vscode-remote+ssh-002dremote-002blcg.vscode-resource.vscode-cdn.net/mnt/work/RO_src/venv/bin2cell_venv/lib/python3.10/site-packages/bin2cell/bin2cell.py:734) #insert into bin object, need to turn it into a 1d numpy array from a 1d numpy matrix first
--> [735](https://vscode-remote+ssh-002dremote-002blcg.vscode-resource.vscode-cdn.net/mnt/work/RO_src/venv/bin2cell_venv/lib/python3.10/site-packages/bin2cell/bin2cell.py:735) adata.obs[labels_key] = np.asarray(labels_sparse[coords[:,0], coords[:,1]]).flatten()

File /mnt/work/RO_src/venv/bin2cell_venv/lib/python3.10/site-packages/scipy/sparse/_csr.py:24, in _csr_base.__getitem__(self, key)
     [22](https://vscode-remote+ssh-002dremote-002blcg.vscode-resource.vscode-cdn.net/mnt/work/RO_src/venv/bin2cell_venv/lib/python3.10/site-packages/scipy/sparse/_csr.py:22) def __getitem__(self, key):
     [23](https://vscode-remote+ssh-002dremote-002blcg.vscode-resource.vscode-cdn.net/mnt/work/RO_src/venv/bin2cell_venv/lib/python3.10/site-packages/scipy/sparse/_csr.py:23)     if self.ndim == 2:
---> [24](https://vscode-remote+ssh-002dremote-002blcg.vscode-resource.vscode-cdn.net/mnt/work/RO_src/venv/bin2cell_venv/lib/python3.10/site-packages/scipy/sparse/_csr.py:24)         return super().__getitem__(key)
     [26](https://vscode-remote+ssh-002dremote-002blcg.vscode-resource.vscode-cdn.net/mnt/work/RO_src/venv/bin2cell_venv/lib/python3.10/site-packages/scipy/sparse/_csr.py:26)     if isinstance(key, tuple) and len(key) == 1:
     [27](https://vscode-remote+ssh-002dremote-002blcg.vscode-resource.vscode-cdn.net/mnt/work/RO_src/venv/bin2cell_venv/lib/python3.10/site-packages/scipy/sparse/_csr.py:27)         key = key[0]

File /mnt/work/RO_src/venv/bin2cell_venv/lib/python3.10/site-packages/scipy/sparse/_index.py:52, in IndexMixin.__getitem__(self, key)
     [51](https://vscode-remote+ssh-002dremote-002blcg.vscode-resource.vscode-cdn.net/mnt/work/RO_src/venv/bin2cell_venv/lib/python3.10/site-packages/scipy/sparse/_index.py:51) def __getitem__(self, key):
---> [52](https://vscode-remote+ssh-002dremote-002blcg.vscode-resource.vscode-cdn.net/mnt/work/RO_src/venv/bin2cell_venv/lib/python3.10/site-packages/scipy/sparse/_index.py:52)     row, col = self._validate_indices(key)
     [54](https://vscode-remote+ssh-002dremote-002blcg.vscode-resource.vscode-cdn.net/mnt/work/RO_src/venv/bin2cell_venv/lib/python3.10/site-packages/scipy/sparse/_index.py:54)     # Dispatch to specialized methods.
     [55](https://vscode-remote+ssh-002dremote-002blcg.vscode-resource.vscode-cdn.net/mnt/work/RO_src/venv/bin2cell_venv/lib/python3.10/site-packages/scipy/sparse/_index.py:55)     if isinstance(row, INT_TYPES):

File /mnt/work/RO_src/venv/bin2cell_venv/lib/python3.10/site-packages/scipy/sparse/_index.py:186, in IndexMixin._validate_indices(self, key)
    [184](https://vscode-remote+ssh-002dremote-002blcg.vscode-resource.vscode-cdn.net/mnt/work/RO_src/venv/bin2cell_venv/lib/python3.10/site-packages/scipy/sparse/_index.py:184)     row = _validate_bool_idx(bool_row, M, "row")
    [185](https://vscode-remote+ssh-002dremote-002blcg.vscode-resource.vscode-cdn.net/mnt/work/RO_src/venv/bin2cell_venv/lib/python3.10/site-packages/scipy/sparse/_index.py:185) elif not isinstance(row, slice):
--> [186](https://vscode-remote+ssh-002dremote-002blcg.vscode-resource.vscode-cdn.net/mnt/work/RO_src/venv/bin2cell_venv/lib/python3.10/site-packages/scipy/sparse/_index.py:186)     row = self._asindices(row, M)
    [188](https://vscode-remote+ssh-002dremote-002blcg.vscode-resource.vscode-cdn.net/mnt/work/RO_src/venv/bin2cell_venv/lib/python3.10/site-packages/scipy/sparse/_index.py:188) if isintlike(col):
    [189](https://vscode-remote+ssh-002dremote-002blcg.vscode-resource.vscode-cdn.net/mnt/work/RO_src/venv/bin2cell_venv/lib/python3.10/site-packages/scipy/sparse/_index.py:189)     col = int(col)

File /mnt/work/RO_src/venv/bin2cell_venv/lib/python3.10/site-packages/scipy/sparse/_index.py:220, in IndexMixin._asindices(self, idx, length)
    [218](https://vscode-remote+ssh-002dremote-002blcg.vscode-resource.vscode-cdn.net/mnt/work/RO_src/venv/bin2cell_venv/lib/python3.10/site-packages/scipy/sparse/_index.py:218) max_indx = x.max()
    [219](https://vscode-remote+ssh-002dremote-002blcg.vscode-resource.vscode-cdn.net/mnt/work/RO_src/venv/bin2cell_venv/lib/python3.10/site-packages/scipy/sparse/_index.py:219) if max_indx >= length:
--> [220](https://vscode-remote+ssh-002dremote-002blcg.vscode-resource.vscode-cdn.net/mnt/work/RO_src/venv/bin2cell_venv/lib/python3.10/site-packages/scipy/sparse/_index.py:220)     raise IndexError('index (%d) out of range' % max_indx)
    [222](https://vscode-remote+ssh-002dremote-002blcg.vscode-resource.vscode-cdn.net/mnt/work/RO_src/venv/bin2cell_venv/lib/python3.10/site-packages/scipy/sparse/_index.py:222) min_indx = x.min()
    [223](https://vscode-remote+ssh-002dremote-002blcg.vscode-resource.vscode-cdn.net/mnt/work/RO_src/venv/bin2cell_venv/lib/python3.10/site-packages/scipy/sparse/_index.py:223) if min_indx < 0:

IndexError: index (13489) out of range

Does anyone know why this is happening?

Rafael-Silva-Oliveira commented 3 months ago

On this, printing the max of coords and labels_sparse.shape results in:

print(coords.max())
print(labels_sparse.shape)
13552
(12869, 13463)
Rafael-Silva-Oliveira commented 3 months ago

It seems when I use a mask, the error doesn't happen, but if I use the whole dataset, I get the error. Not entirely sure why that is happening (maybe some bins on the extremes that are not being counted?)

I'm using the Visium HD lung dataset

Rafael-Silva-Oliveira commented 3 months ago

Doesn't seem to work very well with the Visium HD lung dataset (misses a lot of cells), perhaps because it's an IF image and not HE. I wonder if Stardist has a model for IF images? I can see there's one available (2D_versatile_fluo), but I'll test it out and see if the results improve (though from what I was seeing, it seems that it doesn't really fit the IF imaging type as seen below)

image

Edit with the fluo model, still not good enough:

image

ktpolanski commented 3 months ago

Thanks for the heads up - apparently this particular dataset has spots that extend past the tissue image, as evidenced by comparing the dimensions of Visium_HD_Human_Lung_Cancer_tissue_image.tif and the maximum values of adata.obsm["spatial"]. Also we worked with H&E demo data when developing the package, and the IF image behaves differently. Your idea to go with the fluorescence model is a good one, and we're getting promising results by using the third channel of the TIF. There's some stuff to iron out though, we'll keep you posted!

Rafael-Silva-Oliveira commented 3 months ago

Thanks for the heads up - apparently this particular dataset has spots that extend past the tissue image, as evidenced by comparing the dimensions of Visium_HD_Human_Lung_Cancer_tissue_image.tif and the maximum values of adata.obsm["spatial"]. Also we worked with H&E demo data when developing the package, and the IF image behaves differently. Your idea to go with the fluorescence model is a good one, and we're getting promising results by using the third channel of the TIF. There's some stuff to iron out though, we'll keep you posted!

That's perfect thank you :) The actual data for the project I'm working on is Visium HD with H&E so the default model should be just fine, but I could only find this Lung dataset with Visium HD as an IF image; Will have a try once I have my data ready!

ktpolanski commented 3 months ago

I modified insert_labels() to skip out of bounds bins when importing segmentation results. This way we get to keep the strange bins that fall outside the image as they might still be useful, e.g. when doing the GEX part of the analysis. Haven't cut a new version for now, but it should be installable from GitHub.

When confronted with the IF image, cv2 only reads its first channel, while the values of interest reside in the third. I've proposed a scaled_if_image() that can then be used with StarDist's fluorescence model, and it's now with Nadav for testing.

ktpolanski commented 3 months ago

And b2c.scaled_if_image() is now on GitHub as well!

mpp = 0.35
b2c.scaled_if_image(adata, channel=2, mpp=mpp, save_path="stardist/if.tiff")
b2c.stardist(image_path="stardist/if.tiff", 
             labels_npz_path="stardist/if.npz", 
             stardist_model="2D_versatile_fluo", 
             prob_thresh=0.1
            )

This yields a good looking segmentation:

image

image

This is the following subset:

mask = ((adata.obs["array_row"]>500) & 
        (adata.obs["array_row"]<530) & 
        (adata.obs["array_col"]>1000) & 
        (adata.obs["array_col"]<1030)
       )

Not making a new version quite yet as I'll need to touch up the docs in a few places to account for the new addition.

Rafael-Silva-Oliveira commented 3 months ago

And b2c.scaled_if_image() is now on GitHub as well!

mpp = 0.35
b2c.scaled_if_image(adata, channel=2, mpp=mpp, save_path="stardist/if.tiff")
b2c.stardist(image_path="stardist/if.tiff", 
             labels_npz_path="stardist/if.npz", 
             stardist_model="2D_versatile_fluo", 
             prob_thresh=0.1
            )

This yields a good looking segmentation:

image

image

This is the following subset:

mask = ((adata.obs["array_row"]>500) & 
        (adata.obs["array_row"]<530) & 
        (adata.obs["array_col"]>1000) & 
        (adata.obs["array_col"]<1030)
       )

Not making a new version quite yet as I'll need to touch up the docs in a few places to account for the new addition.

You guys rock! Blazing speed as well - Will test it in a bit :)

Rafael-Silva-Oliveira commented 3 months ago

I tried it and it seems to be much better!

output

With cell typist predictions:

image

image

Although I'm not sure what the workflow should look like, since everytime I do seconday labels, I always get 0?

Here's what I'm doing (might be long, apologies for that!)

mpp = 0.35

b2c.scaled_if_image(adata, channel=2,mpp=mpp, save_path="./if.tiff")
b2c.destripe(adata)

######### IF component ##########
#define a mask to easily pull out this region of the object in the future
mask = ((adata.obs["array_row"]>500) & 
        (adata.obs["array_row"]<530) & 
        (adata.obs["array_col"]>1000) & 
        (adata.obs["array_col"]<1030)
       )

bdata = adata[mask]

b2c.stardist(image_path="./if.tiff", 
             labels_npz_path="./if.npz", 
             stardist_model="2D_versatile_fluo", 
             prob_thresh=0.1
            )

b2c.insert_labels(adata=bdata, 
                  labels_npz_path="./if.npz", 
                  basis="spatial", 
                  spatial_key="spatial_cropped",
                  mpp=mpp, 
                  labels_key="labels_if"
                 )

bdata.obs["labels_if"] = bdata.obs["labels_if"].astype(int)
bdata = bdata[bdata.obs['labels_if']>0]
bdata.obs['labels_if'] = bdata.obs['labels_if'].astype(str)

b2c.expand_labels(bdata, 
                  labels_key='labels_if', 
                  expanded_labels_key="labels_if_expanded"
                 )

bdata.obs["labels_if_expanded"] = bdata.obs["labels_if_expanded"].astype(int)
bdata = bdata[bdata.obs['labels_if_expanded']>0]
bdata.obs['labels_if_expanded'] = bdata.obs['labels_if_expanded'].astype(str)

b2c.grid_image(bdata, "n_counts_adjusted", mpp=mpp, sigma=5, save_path="./if_gex.tiff")

b2c.stardist(image_path="./if_gex.tiff", 
             labels_npz_path="./if_gex.npz", 
             stardist_model="2D_versatile_fluo", 
             prob_thresh=0.05, 
             nms_thresh=0.5
            )

b2c.insert_labels(bdata, 
                  labels_npz_path="./if_gex.npz", 
                  basis="array", 
                  mpp=mpp, 
                  labels_key="labels_if_gex"
                 )

bdata.obs["labels_if_gex"] = bdata.obs["labels_if_gex"].astype(int)

bdata = bdata[bdata.obs['labels_if_gex']>0]
bdata.obs['labels_if_gex'] = bdata.obs['labels_if_gex'].astype(str)

bdata.obs["labels_if_expanded"] = bdata.obs["labels_if_expanded"].astype(int)

### This one always returns 0s
b2c.salvage_secondary_labels(bdata, 
                             primary_label="labels_if_expanded", 
                             secondary_label="labels_if_gex", 
                             labels_key="labels_if_joint"
                            )

bdata = bdata[bdata.obs['labels_if_joint']>0]
bdata.obs['labels_if_joint'] = bdata.obs['labels_if_joint'].astype(str)

cdata = b2c.bin_to_cell(bdata, labels_key="labels_if_joint", spatial_keys=["spatial", "spatial_cropped"])

cell_mask = ((cdata.obs['array_row'] >= 1450) & 
             (cdata.obs['array_row'] <= 1550) & 
             (cdata.obs['array_col'] >= 250) & 
             (cdata.obs['array_col'] <= 450)
            )

ddata = cdata[cell_mask]

sc.pl.spatial(adata=ddata, color=["bin_count", "labels_if_joint_source"],  basis="spatial_cropped")

This outputs the first image I showed;

Then, for CellTypist I'm essentially following the tutorial you guys provide (I also train a healthy model + tumor model and do as you guys do). Do we need to do all the steps to get the "bin_count" variable?

ktpolanski commented 3 months ago

In the demo notebook, all the major operations (i.e. segmentations, label insertion etc.) happen on the main adata that is loaded at the start. The bdata is just generated to visualise how the various steps are doing along the way. Meanwhile in the snippet above you seem to be using the bdata for pretty much everything once you generate it? I guess in this particular little bit of tissue the nuclei are well captured and the secondary GEX labels are not necessary.

Rafael-Silva-Oliveira commented 3 months ago

In the demo notebook, all the major operations (i.e. segmentations, label insertion etc.) happen on the main adata that is loaded at the start. The bdata is just generated to visualise how the various steps are doing along the way. Meanwhile in the snippet above you seem to be using the bdata for pretty much everything once you generate it? I guess in this particular little bit of tissue the nuclei are well captured and the secondary GEX labels are not necessary.

Yeah I wanted to run things just for that mask, not the entire dataset (just for quick testing)

Looks nice though! I will probably go with this approach (Bin2Cell + CellTypist) for cell type inference using a reference scRNA dataset once I get my HE dataset. Thanks for the support, looking forward to see other work on this tool! :)

ktpolanski commented 3 months ago

I ran a simple GEX segmentation and then performed the label salvage. I'm seeing 551904 IF labels and 7747 secondary GEX-derived ones. The DAPI seems to be doing a good job of recovering the nuclei. As such it's entirely feasible that the selected region just doesn't have any.

Rafael-Silva-Oliveira commented 3 months ago

I ran a simple GEX segmentation and then performed the label salvage. I'm seeing 551904 IF labels and 7747 secondary GEX-derived ones. The DAPI seems to be doing a good job of recovering the nuclei. As such it's entirely feasible that the selected region just doesn't have any.

100%, I think that's the reason too! I'll put things nicely in a wrapper function to run everything at once so it's easier to test, but so far so good :) Thanks again!

Rafael-Silva-Oliveira commented 2 months ago

I ran a simple GEX segmentation and then performed the label salvage. I'm seeing 551904 IF labels and 7747 secondary GEX-derived ones. The DAPI seems to be doing a good job of recovering the nuclei. As such it's entirely feasible that the selected region just doesn't have any.

I'm actually not able to get the extra secondary labels, do you have a notebook I could have a look? I get 543621 IF labels but no secondary GEX for some reason (could be some different step I'm doing wrong)

ktpolanski commented 2 months ago

Untitled4-Copy1.ipynb.txt

I had to add .txt so it would just let me attach it here.

Rafael-Silva-Oliveira commented 2 months ago

Untitled4-Copy1.ipynb.txt

I had to add .txt so it would just let me attach it here.

That's perfect, thank you very much!! :)

Rafael-Silva-Oliveira commented 2 months ago

Looking like a pretty neat visualization :) Great tool, definitely solving something huge and very much needed with the new HD technologies! Will follow closely new releases

image