Same genes in LR interaction

ZheFrench commented 1 year ago

Hi,

In the LR interaction results, I sometimes get the same gene as both ligand and receptor in a pair. This is also the case here; look at epcam_epcam in the table. This is weird, no?

ZheFrench commented 1 year ago

I have a second question, about using st.tl.cci.run with distance=0 for within-spot mode.

How is the permutation test done, given that I also set up a configuration where all cells are in the same cluster/cell type? Are the ligand and receptor values for the background distribution sampled within a single spot, or across all the spots in the unique cluster?

From the paper :

CCI finds L-R co-expression between neighbouring spots. Permutation tests for the enrichment of L-R pairs between two cell types compared to the random null distribution are first performed using CellPhoneDB.

And finally, how is the LR score computed in this situation?

BradBalderson commented 1 year ago

Hey @ZheFrench,

This is from the NATMI database; I think it reflects the case where the product of the gene can interact with itself.

If all cell types occur in every spot, and the proportions are the same, then this is the same as a uniform random distribution and there will be no cell type-cell type interactions that are over-represented relative to random.

The LR score is calculated independently of cell type information, so cell type information has no effect on calling LR spatial co-expression hotspots.

Also, please note the current preprint is not up to date... hopefully a new one reflecting the changes will be up soon.

ZheFrench commented 1 year ago

OK for NATMI, and OK that annotation doesn't play any role. Still, the computation of the LR score remains fuzzy to me.

To be sure I understand the difference between distance=None and distance=0 detailed here: when distance is set to 0, the approach is to compute LR per spot (no spatial information involved). When distance is set to None, is it just looking at the surrounding spots? Can you explain a little more? How does it impact the result? Thanks.

Update: I would also like to retrieve the p-value used to filter the LRs contained in data.uns['lr_summary']. adata.obsm[p_vals] gives a huge matrix of p-values for every gene-spot, but how do you get one single p-value for the row of one LR in data.uns['lr_summary']?

BradBalderson commented 1 year ago

"When distance is set to 0, the approach is to compute LR per spot (no spatial information involved)." I think there is still spatial information, since different cell types still occur within a given spot (assuming Visium data). But if it's single-cell spatial data then yes, to me using distance=0 doesn't make much sense.

"When distance is set to None, is it just looking at the surrounding spots? Can you explain a little more?" Correct; distance set to None means we internally calculate the distance to the surrounding spots based on standard Visium. You can see how we do this in the source code here: https://github.com/BiomedicalMachineLearning/stLearn/blob/2892ef5e07d7bfb6e4aa06a46c0a27ff74b9af53/stlearn/tools/microenv/cci/base.py#L64 This was written by @jon-xu, who might provide some more detail if I'm missing something :)

"How does it impact the result?" The distance used will affect the interaction range. Different LRs might have different diffusion distances etc., so it will change which LRs come up as top, and as a result give different CCIs. I think this is a very interesting question which needs more research: how far different LRs can go, and how to incorporate that into new spatial CCI methods.

"retrieve the p-value used to filter the LRs contained in data.uns['lr_summary']" Sorry, this is a little cryptic; adata.obsm['p_vals'] has spots as rows and LRs as columns. The order of the LRs is the same as the order of the LRs stored in data.uns['lr_summary'], so you can convert the array to a dataframe:

import pandas as pd
# rows = spots, columns = LRs (same order as data.uns['lr_summary'])
lr_pvals_df = pd.DataFrame(data.obsm['p_vals'], index=data.obs_names.values,
                           columns=data.uns['lr_summary'].index.values)
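
Once the p-values are in a dataframe like this, pulling out one LR is a column lookup. The sketch below uses a made-up 4-spot, 2-LR array in place of data.obsm['p_vals'] and invented LR names; the 0.05 cutoff is only an illustrative threshold, not necessarily the one stLearn applies.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for data.obs_names and data.uns['lr_summary'].index
spot_names = ["spot_a", "spot_b", "spot_c", "spot_d"]
lr_names = ["L1_R1", "L2_R2"]

# Hypothetical stand-in for data.obsm['p_vals'] (rows = spots, columns = LRs)
p_vals = np.array([
    [0.01, 0.50],
    [0.20, 0.03],
    [0.04, 0.60],
    [0.70, 0.20],
])

lr_pvals_df = pd.DataFrame(p_vals, index=spot_names, columns=lr_names)

# Per-spot p-values for a single LR pair
one_lr_pvals = lr_pvals_df["L1_R1"]

# Number of spots below the illustrative cutoff, per LR pair
n_sig_per_lr = (lr_pvals_df < 0.05).sum(axis=0)
print(n_sig_per_lr)
```

Note that this gives one p-value per spot for each LR, so any single per-LR number has to be a summary over spots, such as a count of significant spots.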

Hopefully that helps @ZheFrench
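
On the distance question in the reply above, a toy sketch may help show what changes between within-spot mode and a neighbourhood mode. This is not stLearn's implementation: the 3x3 grid, the function name neighbours_within, and the radii are all made up for illustration.

```python
import numpy as np

def neighbours_within(coords, radius):
    """For each spot, return the indices of all spots within `radius` (the spot itself included)."""
    diffs = coords[:, None, :] - coords[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    return [np.flatnonzero(row <= radius) for row in dists]

# Hypothetical 3x3 grid of spots with unit spacing
xs, ys = np.meshgrid(np.arange(3), np.arange(3))
coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)

within_spot = neighbours_within(coords, radius=0.0)      # analogous to distance=0
with_neighbours = neighbours_within(coords, radius=1.0)  # spot plus adjacent spots

centre = 4  # the middle spot of the grid
print("radius 0:", within_spot[centre])
print("radius 1:", with_neighbours[centre])
```

With a radius of 0 each spot only sees itself, so ligand and receptor must be co-expressed in the same spot; with a larger radius the receptor can be expressed in a neighbouring spot, which is why the top LRs can change.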

ZheFrench commented 1 year ago

I'm reopening this because I have one more question on the same subject. Sorry to bother you :) I saw there is a use_label argument in st.tl.cci.run, so I imagine that in this mode cell type could have an impact on the way LRs are computed. But I don't understand how to set use_label correctly here or how it will impact the result.

For example, say I have a file with the ratios/probabilities from deconvolution and I add it via add_deconvolution first. I just wanted to give it a shot to see what the results are in this case. Any idea?

    st.add.add_deconvolution(data, annotation_path="proportions.tsv")
    # This works, I can plot them with st.pl.deconvolution_plot
    st.tl.cci.run(data, lrs, use_label=?)

Also, in the cci.run function:

    # Conduct with cell heterogeneity info if label_transfer provided #
    cell_het = type(use_label) != type(None) and use_label in adata.uns.keys()

The only value accepted for use_label seems to be spatial in order to get past this check? It doesn't seem that add_deconvolution added anything. Maybe it did, but I don't know where.

print(adata.uns.keys())
odict_keys(['spatial'])

I tested with use_label="spatial" but it raised an error. So I'm not sure it's fully implemented; if that's the case, don't worry, it doesn't matter, I can do without this mode, it's fine :) I was just wondering if I could make it work properly:

Calculating cell hetereogeneity...
Traceback (most recent call last):
  File "stlearn_benchmark.1.py", line 93, in <module>
    st.tl.cci.run(data, lrs, # use_label="spatial",
  File "/data/villemin/anaconda3/envs/stlearn/lib/python3.8/site-packages/stlearn/tools/microenv/cci/analysis.py", line 307, in run
    count(adata, distance=distance, use_label=use_label, use_het=use_label)
  File "/data/villemin/anaconda3/envs/stlearn/lib/python3.8/site-packages/stlearn/tools/microenv/cci/het.py", line 77, in count
    adata.obsm[use_het] = (adata.uns[use_label] > 0.2).sum(axis=1)
TypeError: '>' not supported between instances of 'dict' and 'float'
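
The failing line in the traceback compares adata.uns[use_label] against 0.2 and sums across columns, which only makes sense when that entry is a table of per-spot cell type scores. With use_label="spatial" the entry is a dict, hence the TypeError. A small sketch of both cases, using made-up proportions and spot names:

```python
import pandas as pd

# Hypothetical per-spot cell type proportions (rows = spots, columns = cell types)
spot_mixtures = pd.DataFrame(
    {
        "CAFs": [0.80, 0.45, 0.30],
        "T-cells": [0.10, 0.10, 0.35],
        "Myeloid": [0.10, 0.45, 0.35],
    },
    index=["spot_a", "spot_b", "spot_c"],
)

# Same expression as het.py line 77 in the traceback, applied to a DataFrame:
# counts how many cell types exceed 0.2 in each spot
het_counts = (spot_mixtures > 0.2).sum(axis=1)
print(het_counts)

# A dict (like adata.uns['spatial']) cannot be compared with a float
try:
    {"some_library_id": {}} > 0.2
    dict_comparison_failed = False
except TypeError as err:
    dict_comparison_failed = True
    print("TypeError:", err)
```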

duypham2108 commented 1 year ago

Yes, the function add_deconvolution was created for a different purpose. You can just follow this code to add the labels (change 'cell_type' to anything you want and set use_label to it):

# NOTE: using the same key in data.obs & data.uns
data.obs['cell_type'] = labels # Adding the dominant cell type labels per spot
data.obs['cell_type'] = data.obs['cell_type'].astype('category')
data.uns['cell_type'] = spot_mixtures # Adding the cell type scores
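
The snippet above assumes labels and spot_mixtures already exist. If only the deconvolution scores are available, one possible way to derive the dominant label is to take the highest-scoring cell type per spot. That choice is an assumption made here, not something stated in this thread, and the values below are made up:

```python
import pandas as pd

# Hypothetical deconvolution scores (rows = spots, columns = cell types)
spot_mixtures = pd.DataFrame(
    {
        "B-cells": [0.17, 0.10, 0.55],
        "CAFs": [0.42, 0.30, 0.25],
        "T-cells": [0.41, 0.60, 0.20],
    },
    index=["10x10", "10x11", "10x12"],
)

# Stand-in for data.obs_names from the AnnData object
obs_names = pd.Index(["10x10", "10x11", "10x12"])

# Assumption: the dominant cell type is the column with the highest score
labels = spot_mixtures.idxmax(axis=1).astype("category")

# The rows must be in the same order as the spots before assigning to data.obs / data.uns
order_ok = spot_mixtures.index.equals(obs_names)
print(labels)
print("Spot mixture order correct?:", order_ok)
```

If order_ok is False, spot_mixtures.reindex(obs_names) would put the rows in the matching order before assignment.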

BradBalderson commented 1 year ago

@ZheFrench specifying use_label in st.tl.cci.run weights the LR scores so that those in regions of higher cellular heterogeneity receive a higher score. Downstream, cell annotation information is also used in st.tl.cci.run_cci to determine which cell types interact via a given LR. I haven't extensively tested the effect of providing the cell type information at both points of the analysis, but my guess is that it will reduce interaction predictions in areas of the tissue with fewer cell types (i.e. lower heterogeneity). It would therefore likely reduce cases of cells interacting with cells of the same cell type.

Just something to keep in mind to help you with your analysis :)
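
To make the idea of heterogeneity weighting concrete, here is a deliberately simple sketch. The thread does not give the formula stLearn uses, so the plain multiplication below is only an assumed stand-in, and all numbers and names are made up:

```python
import numpy as np

# Hypothetical LR scores (rows = spots, columns = LR pairs)
lr_scores = np.array([
    [0.8, 0.0],
    [0.8, 0.4],
    [0.8, 0.4],
])

# Hypothetical heterogeneity per spot, e.g. number of cell types present
het_counts = np.array([1, 2, 3])

# Assumed weighting: scale each spot's LR scores by its heterogeneity
weighted_scores = lr_scores * het_counts[:, None]
print(weighted_scores)
```

Under a weighting of this kind, spots with identical raw scores end up ranked by how many cell types they contain, which matches the expectation that predictions shrink in low-heterogeneity regions.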

ZheFrench commented 1 year ago

Thank you both! I will give it a try and close the issue afterwards. :)

ZheFrench commented 1 year ago

Hummm, I got this error using the code below.

```
    print("dominant")
    print(annot_df["cell_type"].head())
    data.obs['cell_type'] = annot_df # Adding the dominant cell type labels per spot
    data.obs['cell_type'] = data.obs['cell_type'].astype('category')
    #df2=df2.reindex(data.obs_names)
    print("spot_mixtures")
    print(spot_mixtures.head())
    data.uns['cell_type'] = spot_mixtures # Adding the cell type scores

    # Running the analysis 
    st.tl.cci.run(data, lrs, use_label="cell_type",
                      #min_spots=20, #Filter out any LR pairs with no scores for less than min_spots
                      distance=0, # None defaults to spot+immediate neighbours; distance=0 for within-spot mode
                      n_pairs=1000, # Number of random pairs to generate; low as example, recommend ~10,000
                      n_cpus=CORE, # Number of CPUs for parallel. If None, detects & use all available.
                      verbose=True)
```

Output with error:

dominant
10x10    CAFs
10x11    CAFs
10x12    CAFs
10x13    CAFs
10x14    CAFs
Name: cell_type, dtype: object
spot_mixtures
        B-cells      CAFs  Endothelial    Epithelial   Myeloid  Plasma Cells       PVL   T-cells
10x10  0.166865  0.424140     0.061200  4.282359e-10  0.132628  8.024812e-02  0.042301  0.092618
10x11  0.101619  0.441774     0.025270  2.263477e-02  0.199885  4.011563e-10  0.052107  0.156711
10x12  0.100248  0.387403     0.017383  1.349624e-01  0.098616  1.138743e-01  0.022731  0.124782
10x13  0.083322  0.379910     0.014945  2.344817e-01  0.086662  4.508863e-02  0.040217  0.115374
10x14  0.126579  0.359950     0.027455  3.867542e-10  0.112865  1.865791e-01  0.075179  0.111393
Calculating neighbours...
0 spots with no neighbours, 1 median spot neighbours.
Spot neighbour indices stored in adata.obsm['spot_neighbours'] & adata.obsm['spot_neigh_bcs'].
Calculating cell hetereogeneity...
Counts for cluster (cell type) diversity stored into adata.uns['cell_type']
Traceback (most recent call last):
  File "stlearn_benchmark.1.py", line 93, in <module>
    st.tl.cci.run(data, lrs, use_label="cell_type",
  File "/data/villemin/anaconda3/envs/stlearn/lib/python3.8/site-packages/stlearn/tools/microenv/cci/analysis.py", line 318, in run
    lr_scores, lrs = get_lrs_scores(adata, lrs, neighbours, het_vals, min_expr)
  File "/data/villemin/anaconda3/envs/stlearn/lib/python3.8/site-packages/stlearn/tools/microenv/cci/base.py", line 131, in get_lrs_scores
    lr_scores = get_scores(
  File "/data/villemin/anaconda3/envs/stlearn/lib/python3.8/site-packages/numba/core/dispatcher.py", line 468, in _compile_for_args
    error_rewrite(e, 'typing')
  File "/data/villemin/anaconda3/envs/stlearn/lib/python3.8/site-packages/numba/core/dispatcher.py", line 409, in error_rewrite
    raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
non-precise type pyobject
During: typing of argument at /data/villemin/anaconda3/envs/stlearn/lib/python3.8/site-packages/stlearn/tools/microenv/cci/base.py (336)

File "../../../../../../../villemin/anaconda3/envs/stlearn/lib/python3.8/site-packages/stlearn/tools/microenv/cci/base.py", line 336:
def get_scores(
    <source elided>
    """
    spot_scores = np.zeros((len(spot_indices), spot_lr1s.shape[1] // 2), np.float64)
    ^ 

This error may have been caused by the following argument(s):
- argument 3: Cannot determine Numba type of <class 'pandas.core.series.Series'>
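
The last line of the error says argument 3 was a pandas Series, which numba's nopython mode cannot type. Whether this is what the maintainers ended up changing is not stated in this thread; as a generic sketch with hypothetical variable names, the usual way around this kind of error is to hand the compiled function a plain NumPy array instead:

```python
import numpy as np
import pandas as pd

# Hypothetical heterogeneity counts held as a pandas Series (one value per spot)
het_series = pd.Series([1, 2, 3], index=["spot_a", "spot_b", "spot_c"])

# Convert to a float64 NumPy array, a type that compiled numeric code can work with
het_vals = het_series.to_numpy(dtype=np.float64)

print(type(het_vals).__name__, het_vals.dtype, het_vals)
```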

BradBalderson commented 1 year ago

Hmm, I have not seen this error before. I have two ideas that might help.

First, can you try this to add the dominant cell types to data.obs?

data.obs['cell_type'] = annot_df['cell_type'].values.astype(str) # Adding the dominant cell type labels per spot
data.obs['cell_type'] = data.obs['cell_type'].astype('category')

Second, the numba version may be incorrect (like in #157); could you check whether it is numba v0.53.1?
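
A quick way to check which version is installed in the active environment, using only the standard library (the helper name installed_version is made up for this example):

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

print("numba:", installed_version("numba"))
```

Run this with the Python interpreter of the stlearn environment so that it reports the numba seen by stLearn.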

ZheFrench commented 1 year ago

It's 0.56.2 in the stlearn env. Should I downgrade? How?
/data/villemin/anaconda3/envs/stlearn/lib/python3.8/site-packages/numba

Still the same error with numba 0.56.2:

        data.obs['cell_type'] = annot_df["cell_type"].values.astype(str) # Adding the dominant cell type labels per spot
        data.obs['cell_type'] = data.obs['cell_type'].astype('category')
        data.uns['cell_type'] = spot_mixtures # Adding the cell type scores
        print('Spot mixture order correct?: ',np.all(spot_mixtures.index.values==data.obs_names.values)) # it returns true

        # Running the analysis 
        st.tl.cci.run(data, lrs, use_label="cell_type",...)

BradBalderson commented 1 year ago

Ah, got it, I see the issue now. Sorry, I have not extensively tested that part since it is an old feature from a previous version, and I had intended st.tl.cci.run_cci to replace adding in cell type information. Let me fix it and push to the master branch.

I don't have time to fix it today, but will do so when I get a chance later, since I will be doing a few other run-time optimisations on st.tl.cci.run_cci anyhow.

ZheFrench commented 1 year ago

OK, thank you, it's not urgent. Good luck with the bug fix.

BiomedicalMachineLearning / stLearn

Same genes in LR interaction #199