Closed mdmanurung closed 3 years ago
Hi @MikhaelManurung
Thank you for your interest in Milo! Have you tried adding the graph to the Milo object using the buildFromAdjacency
function? This is an example with one of Seurat's toy datasets:
library(Seurat)
library(miloR)
data("pbmc_small")
pbmc_small <- FindNeighbors(pbmc_small)
pbmc_small_sce <- as.SingleCellExperiment(pbmc_small)
pbmc_small_milo <- Milo(pbmc_small_sce)
miloR::graph(pbmc_small_milo) <- miloR::graph(buildFromAdjacency(pbmc_small@graphs$RNA_snn, k=10))
Hi @emdann,
Thanks for your fast response. Actually, those are the steps that I have managed to do so far. I am stuck at the makeNhoods()
step since I don't know which reduced dim that I should use.
Regards, Mikhael
Oh I misunderstood the query, apologies for that.
This is an interesting question for application to multi-modal datasets. We recognise that the algorithm we use for search of the neighbourhood indices limits it's application to "graph-based" integration methods such as WNN. In practice, it might be that using only one of the two modalities works just fine to exclude outliers, especially if you select the modality with the highest average weight on the embedding (the RNA in this case?).
To test this, I would recommend comparing (A) neighbourhoods constructed running makeNhoods
using the PCA on RNA with the option refined=TRUE
and (B) neighbourhoods constructed running makeNhoods
with the option refined=FALSE
, which just selects random cells as neighbourhoods indices (you can use whichever dimensionality reduction, as this is not used). The refinement algorithm works to reduce the total number of neighbourhoods and to obtain a more tight distribution of neighbourhood sizes (see Suppl. Fig. 1 in our preprint). If you see that the size and number of neighbourhoods refining on the PCA from RNA is not that different from the random selection, then you might have to use another dimensionality reduction.
Alternatively, you could consider using a "latent space-based" multi-modal integration method such as MOFA+ for the differential abundance analysis.
Hope this helps, I'd be interested to hear what works best for you in this scenario.
Hi Emma,
Thanks for the suggestions. I will try out these options and get back to you.
Regards, Mikhael
Hi @MikhaelManurung Did Emma's solution resolve your problem? I'll close this issue for now, but please re-open if there is more trouble-shooting required.
Hi all,
Not sure if this is the appropriate place to follow-up on this, but in addition to makeNhoods()
, does calcNhoodDistance()
require a reduced dimension space as well? So even setting makeNhoods(refined=FALSE)
(or attempting an alternative neighborhood refinement strategy that doesn't rely on reduced dimension values), the calcNhoodDistance()
step would still be a bottleneck?
Hi @diegoalexespi,
That is correct, calcNhoodDistance()
uses the reduced dimension space to calculate distance between the k-th nearest neighbor to the nhood index cell. However, in some initial implementation of Milo we tested using other graph-based connectivity metrics for the weighted FDR procedure, which you could store in the nhoodDistance()
slot to circumvent the use of calcNhoodDistance
. We dropped the use of these metrics mostly because they were less efficient than distance to the k-th neighbor, but the SpatialFDR correction results were comparable. @MikeDMorgan might be able to give more detailed info on this.
Hi @diegoalexespi In terms of time-complexity using either edge- or vertex-weighting was still much, much slower than computing the k-th neighbour distance. At the moment we don't have any graph-distance weighting implemented (these were removed a while ago), but it is something on our development roadmap for Milo.
Thanks @emdann and @MikeDMorgan for the prompt and informative responses! I appreciate it. I'll keep an eye out on development, and I'll do some digging into graph connectivity measures in the meantime as well.
Hi! So I was intrigued with the question of utilizing this method for graphs not directly derived from a single distance matrix, such as Seurat's WNN and BBKNN. More specifically, for makeNhoods
, I wanted to devise a graph-refinement scheme that depended solely on the kNN graph and NOT any distance metric or measure.
As a possible solution, I chose to refine the random vertex selection in makeNhoods
in the following way:
I am assuming that in an induced subgraph from a kNN, the vertex with the most in-edges is similar to what would be found form Milo's "refined" algorithm.
I tried replicating the algorithm evaluations in Supplementary Fig. 1A-B, and obtained the following (the above method is distance-independent, denoted as 'independent'). I think results are comparable in this manner. Happy to hear thoughts/discuss more and/or initiate a pull request if such a feature would be beneficial.
Thanks!
Hi @diegoalexespi
Wow, thank you for contributing this great work, it looks really promising! This has definitely been a frequently requested feature, so I think it would be a useful addition to the package. I have a few comments, that I am happy to look into myself if you start a pull request:
If this flies we will probably need to add a vignette for "graph-only" neighbourhood analysis.
Thanks @emdann !
I'll do some organizing of my code before implementing a pull request (I have to prune a bunch of my commented out code and my print("testing")
statements...), but I'll try to do that sooner than later. In the meantime, for some of the comments:
Thanks!
Re-opening this so we can keep tabs on things.
Hi @diegoalexespi
I believe for the SpatialFDR correction, the latent space is needed to determine the distance of the furthest neighbor from a given nhood index, correct?
that is correct. We are looking into comparing the overlap between neighbourhoods to k-th distance for SpatialFDR.
trim
option implemented in bbknn.bbknn
only the neighbors with top connectivities are retained (so the number of neighbors for each cell is not necessarily equal to k). This is frequently used to make sure the algorithm doesn't force similarity to a very distant set of cells present in a single batch. What I wonder is if these dataset-specific clusters might be treated as outliers by the refinement scheme, hence not sampled, hence ignored in DA analysisAdded some comparisons in the pull request #166, but question about the following:
I believe for the SpatialFDR correction, the latent space is needed to determine the distance of the furthest neighbor from a given nhood index, correct?
that is correct. We are looking into comparing the overlap between neighbourhoods to k-th distance for SpatialFDR.
Is there a specific way that you are evaluating whether the neighborhood overlap (or other measure) compares "well" to the k-th distance for SpatialFDR? I'd like to try evaluating some measures as well, but unsure how to decide whether a measure is a good replacement for the k-th distance at the moment. Thanks!
Closing for now as the discussion has moved to the PR #166
Dear authors,
Thank you for writing this package; I really enjoyed it.
Currently, I am trying to import Seurat's WNN results into Milo. I could import the pre-computed graph, however, I am unable to figure out the
makeNhoods()
step. IIUC, the function refines the neighborhoods in reduced dimensional space. As I can only pick one reduced dimension--either PCA on RNA or ADT--this becomes a bottleneck. Do you have any suggestion?Thanks in advance.
Regards, Mikhael