Closed Rafael-Silva-Oliveira closed 3 months ago
On this, printing the max of coords and labels_sparse.shape results in:
print(coords.max())
print(labels_sparse.shape)
13552
(12869, 13463)
It seems when I use a mask, the error doesn't happen, but if I use the whole dataset, I get the error. Not entirely sure why that is happening (maybe some bins on the extremes that are not being counted?)
I'm using the Visium HD lung dataset
Doesn't seem to work very well with the Visium HD lung dataset (misses a lot of cells), perhaps because it's an IF image and not HE. I wonder if Stardist has a model for IF images? I can see there's one available (2D_versatile_fluo), but I'll test it out and see if the results improve (though from what I was seeing, it seems that it doesn't really fit the IF imaging type as seen below)
Edit with the fluo model, still not good enough:
Thanks for the heads up - apparently this particular dataset has spots that extend past the tissue image, as evidenced by comparing the dimensions of Visium_HD_Human_Lung_Cancer_tissue_image.tif
and the maximum values of adata.obsm["spatial"]
. Also we worked with H&E demo data when developing the package, and the IF image behaves differently. Your idea to go with the fluorescence model is a good one, and we're getting promising results by using the third channel of the TIF. There's some stuff to iron out though, we'll keep you posted!
Thanks for the heads up - apparently this particular dataset has spots that extend past the tissue image, as evidenced by comparing the dimensions of
Visium_HD_Human_Lung_Cancer_tissue_image.tif
and the maximum values ofadata.obsm["spatial"]
. Also we worked with H&E demo data when developing the package, and the IF image behaves differently. Your idea to go with the fluorescence model is a good one, and we're getting promising results by using the third channel of the TIF. There's some stuff to iron out though, we'll keep you posted!
That's perfect thank you :) The actual data for the project I'm working on is Visium HD with H&E so the default model should be just fine, but I could only find this Lung dataset with Visium HD as an IF image; Will have a try once I have my data ready!
I modified insert_labels()
to skip out of bounds bins when importing segmentation results. This way we get to keep the strange bins that fall outside the image as they might still be useful, e.g. when doing the GEX part of the analysis. Haven't cut a new version for now, but it should be installable from GitHub.
When confronted with the IF image, cv2 only reads its first channel, while the values of interest reside in the third. I've proposed a scaled_if_image()
that can then be used with StarDist's fluorescence model, and it's now with Nadav for testing.
And b2c.scaled_if_image()
is now on GitHub as well!
mpp = 0.35
b2c.scaled_if_image(adata, channel=2, mpp=mpp, save_path="stardist/if.tiff")
b2c.stardist(image_path="stardist/if.tiff",
labels_npz_path="stardist/if.npz",
stardist_model="2D_versatile_fluo",
prob_thresh=0.1
)
This yields a good looking segmentation:
This is the following subset:
mask = ((adata.obs["array_row"]>500) &
(adata.obs["array_row"]<530) &
(adata.obs["array_col"]>1000) &
(adata.obs["array_col"]<1030)
)
Not making a new version quite yet as I'll need to touch up the docs in a few places to account for the new addition.
And
b2c.scaled_if_image()
is now on GitHub as well!mpp = 0.35 b2c.scaled_if_image(adata, channel=2, mpp=mpp, save_path="stardist/if.tiff") b2c.stardist(image_path="stardist/if.tiff", labels_npz_path="stardist/if.npz", stardist_model="2D_versatile_fluo", prob_thresh=0.1 )
This yields a good looking segmentation:
This is the following subset:
mask = ((adata.obs["array_row"]>500) & (adata.obs["array_row"]<530) & (adata.obs["array_col"]>1000) & (adata.obs["array_col"]<1030) )
Not making a new version quite yet as I'll need to touch up the docs in a few places to account for the new addition.
You guys rock! Blazing speed as well - Will test it in a bit :)
I tried it and it seems to be much better!
With cell typist predictions:
Although I'm not sure what the workflow should look like, since everytime I do seconday labels, I always get 0?
Here's what I'm doing (might be long, apologies for that!)
mpp = 0.35
b2c.scaled_if_image(adata, channel=2,mpp=mpp, save_path="./if.tiff")
b2c.destripe(adata)
######### IF component ##########
#define a mask to easily pull out this region of the object in the future
mask = ((adata.obs["array_row"]>500) &
(adata.obs["array_row"]<530) &
(adata.obs["array_col"]>1000) &
(adata.obs["array_col"]<1030)
)
bdata = adata[mask]
b2c.stardist(image_path="./if.tiff",
labels_npz_path="./if.npz",
stardist_model="2D_versatile_fluo",
prob_thresh=0.1
)
b2c.insert_labels(adata=bdata,
labels_npz_path="./if.npz",
basis="spatial",
spatial_key="spatial_cropped",
mpp=mpp,
labels_key="labels_if"
)
bdata.obs["labels_if"] = bdata.obs["labels_if"].astype(int)
bdata = bdata[bdata.obs['labels_if']>0]
bdata.obs['labels_if'] = bdata.obs['labels_if'].astype(str)
b2c.expand_labels(bdata,
labels_key='labels_if',
expanded_labels_key="labels_if_expanded"
)
bdata.obs["labels_if_expanded"] = bdata.obs["labels_if_expanded"].astype(int)
bdata = bdata[bdata.obs['labels_if_expanded']>0]
bdata.obs['labels_if_expanded'] = bdata.obs['labels_if_expanded'].astype(str)
b2c.grid_image(bdata, "n_counts_adjusted", mpp=mpp, sigma=5, save_path="./if_gex.tiff")
b2c.stardist(image_path="./if_gex.tiff",
labels_npz_path="./if_gex.npz",
stardist_model="2D_versatile_fluo",
prob_thresh=0.05,
nms_thresh=0.5
)
b2c.insert_labels(bdata,
labels_npz_path="./if_gex.npz",
basis="array",
mpp=mpp,
labels_key="labels_if_gex"
)
bdata.obs["labels_if_gex"] = bdata.obs["labels_if_gex"].astype(int)
bdata = bdata[bdata.obs['labels_if_gex']>0]
bdata.obs['labels_if_gex'] = bdata.obs['labels_if_gex'].astype(str)
bdata.obs["labels_if_expanded"] = bdata.obs["labels_if_expanded"].astype(int)
### This one always returns 0s
b2c.salvage_secondary_labels(bdata,
primary_label="labels_if_expanded",
secondary_label="labels_if_gex",
labels_key="labels_if_joint"
)
bdata = bdata[bdata.obs['labels_if_joint']>0]
bdata.obs['labels_if_joint'] = bdata.obs['labels_if_joint'].astype(str)
cdata = b2c.bin_to_cell(bdata, labels_key="labels_if_joint", spatial_keys=["spatial", "spatial_cropped"])
cell_mask = ((cdata.obs['array_row'] >= 1450) &
(cdata.obs['array_row'] <= 1550) &
(cdata.obs['array_col'] >= 250) &
(cdata.obs['array_col'] <= 450)
)
ddata = cdata[cell_mask]
sc.pl.spatial(adata=ddata, color=["bin_count", "labels_if_joint_source"], basis="spatial_cropped")
This outputs the first image I showed;
Then, for CellTypist I'm essentially following the tutorial you guys provide (I also train a healthy model + tumor model and do as you guys do). Do we need to do all the steps to get the "bin_count" variable?
In the demo notebook, all the major operations (i.e. segmentations, label insertion etc.) happen on the main adata
that is loaded at the start. The bdata
is just generated to visualise how the various steps are doing along the way. Meanwhile in the snippet above you seem to be using the bdata
for pretty much everything once you generate it? I guess in this particular little bit of tissue the nuclei are well captured and the secondary GEX labels are not necessary.
In the demo notebook, all the major operations (i.e. segmentations, label insertion etc.) happen on the main
adata
that is loaded at the start. Thebdata
is just generated to visualise how the various steps are doing along the way. Meanwhile in the snippet above you seem to be using thebdata
for pretty much everything once you generate it? I guess in this particular little bit of tissue the nuclei are well captured and the secondary GEX labels are not necessary.
Yeah I wanted to run things just for that mask, not the entire dataset (just for quick testing)
Looks nice though! I will probably go with this approach (Bin2Cell + CellTypist) for cell type inference using a reference scRNA dataset once I get my HE dataset. Thanks for the support, looking forward to see other work on this tool! :)
I ran a simple GEX segmentation and then performed the label salvage. I'm seeing 551904 IF labels and 7747 secondary GEX-derived ones. The DAPI seems to be doing a good job of recovering the nuclei. As such it's entirely feasible that the selected region just doesn't have any.
I ran a simple GEX segmentation and then performed the label salvage. I'm seeing 551904 IF labels and 7747 secondary GEX-derived ones. The DAPI seems to be doing a good job of recovering the nuclei. As such it's entirely feasible that the selected region just doesn't have any.
100%, I think that's the reason too! I'll put things nicely in a wrapper function to run everything at once so it's easier to test, but so far so good :) Thanks again!
I ran a simple GEX segmentation and then performed the label salvage. I'm seeing 551904 IF labels and 7747 secondary GEX-derived ones. The DAPI seems to be doing a good job of recovering the nuclei. As such it's entirely feasible that the selected region just doesn't have any.
I'm actually not able to get the extra secondary labels, do you have a notebook I could have a look? I get 543621 IF labels but no secondary GEX for some reason (could be some different step I'm doing wrong)
I had to add .txt
so it would just let me attach it here.
I had to add
.txt
so it would just let me attach it here.
That's perfect, thank you very much!! :)
Looking like a pretty neat visualization :) Great tool, definitely solving something huge and very much needed with the new HD technologies! Will follow closely new releases
I'm trying to follow the notebook demos to create the bin_count metadata, and I'm getting this error on this step:
Adata has this shape:
Getting this error:
Does anyone know why this is happening?