BiomedicalMachineLearning / stLearn

A novel machine learning pipeline to analyse spatial transcriptomics data

min_spots vs. the number of total spots in many different lib-size 10x Visium samples #283

Open cjhong opened 4 months ago

cjhong commented 4 months ago

Thank you for providing such a great spatial transcriptomics analysis tool! I have recently been using stLearn, and I am curious about the parameter min_spots in tl.cci.run(). The docstring in the source code says:

min_spots: int
Minimum number of spots with an LR score for an LR to be considered for further testing.

To understand how this parameter is used in the pipeline, I traced the source code. It seems that the total number of spots in the sample is never taken into account when filtering on lr_scores. In cci/analysis.py I found:

# Calculating the lr_scores across spots for the inputted lrs

lr_scores, lrs = get_lrs_scores(adata, lrs, neighbours, het_vals, min_expr)
lr_bool = (lr_scores > 0).sum(axis=0) > min_spots
lrs = lrs[lr_bool]
lr_scores = lr_scores[:, lr_bool]
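The filtering step quoted above can be reproduced on toy data to see exactly what min_spots counts. This is a minimal sketch, assuming lr_scores is a (spots x LR pairs) NumPy array and lrs is a NumPy array of pair names (so boolean indexing works). The helper filter_lrs, the pair names, and the score values are hypothetical and only mirror the four quoted lines; they are not part of stLearn.

```python
import numpy as np


def filter_lrs(lr_scores, lrs, min_spots):
    """Hypothetical re-implementation of the quoted min_spots filter.

    lr_scores: (n_spots, n_lrs) array of per-spot LR scores.
    lrs: array of LR pair names, one per column of lr_scores.
    Keeps an LR only if MORE than min_spots spots have a score > 0.
    """
    # For each LR pair (column), count the spots with a positive score
    n_positive = (lr_scores > 0).sum(axis=0)
    # Strict inequality, as in the quoted code: exactly min_spots spots is not enough
    lr_bool = n_positive > min_spots
    return lrs[lr_bool], lr_scores[:, lr_bool], n_positive


# Toy data: 6 spots x 3 LR pairs (made-up values for illustration only)
lr_scores = np.array([
    [0.0, 1.2, 0.3],
    [0.0, 0.8, 0.0],
    [0.5, 0.0, 0.1],
    [0.0, 2.1, 0.4],
    [0.0, 0.9, 0.0],
    [0.0, 0.0, 0.2],
])
lrs = np.array(["L1_R1", "L2_R2", "L3_R3"])

kept_lrs, kept_scores, n_positive = filter_lrs(lr_scores, lrs, min_spots=3)
print(n_positive)  # [1 4 4]
print(kept_lrs)    # ['L2_R2' 'L3_R3']
```

The count in n_positive is an absolute number of spots, so the same min_spots value is compared against it whether the sample has 100 spots or 4,500. Because the comparison is strict, setting min_spots=4 here would drop all three pairs.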

The tutorial sets this parameter to 20, while the default value in the code is 10. For the cell-type LR analysis in the tutorial, min_spots=3 was used in run_cci().

I have 50 10x Visium samples, with the number of spots per sample ranging from 100 to 4,500.

As I understand it, min_spots means that downstream analysis only keeps a ligand/receptor pair when the number of spots with a nonzero LR score exceeds min_spots. Since this threshold is an absolute count, I am not sure it is a good idea to use the same min_spots value for all 50 samples.
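To make the concern concrete, a fixed min_spots corresponds to very different fractions of a sample depending on its size. The sketch below computes that fraction and, purely for comparison, a hypothetical per-sample threshold that scales with spot count. The helpers min_spots_fraction and scaled_min_spots, the 0.5% fraction, and the floor of 3 are illustrative assumptions, not stLearn functions or recommended settings; whether scaling is appropriate at all is the open question in this thread.

```python
import math


def min_spots_fraction(min_spots, n_spots):
    """Fraction of a sample's spots implied by an absolute min_spots."""
    return min_spots / n_spots


def scaled_min_spots(n_spots, fraction, floor=3):
    """Hypothetical per-sample threshold: a fixed fraction of the spots,
    rounded up and never below `floor`. Not part of stLearn."""
    return max(floor, math.ceil(fraction * n_spots))


# Smallest and largest sample sizes mentioned above
for n_spots in (100, 4500):
    frac = min_spots_fraction(20, n_spots)
    print(n_spots, f"{frac:.2%}", scaled_min_spots(n_spots, fraction=0.005))
```

With the tutorial value of 20, an LR pair must be active in more than 20% of the spots in a 100-spot sample but in only about 0.44% of a 4,500-spot sample.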

I would appreciate your advice.

Thank you!

BradBalderson commented 3 months ago

I am curious about the reason for the very high variability in the number of spots between your samples. Is it because some slides carry multiple tissue sections while others carry only one? Or because some single tissues have a larger surface area?

I'm just thinking that the right answer may depend on the source of the variability in spot numbers between samples.