CamaraLab / STvEA

Spatially-resolved Transcriptomics via Epitope Anchoring
GNU General Public License v3.0

Error during 'stvea_object <- GetTransferMatrix(stvea_object)' #19

Closed: BokaiZhu closed this issue 3 years ago.

BokaiZhu commented 3 years ago

Hi STvEA team,

Thanks for providing a wonderful tool! I'm interested in matching CODEX cells to CITE-seq cells using the MapCODEXtoCITE and GetTransferMatrix functions. My goal is to get the nearest-neighbor (NN) matrix that shows which CODEX cell matches which CITE-seq cells. I ran the pipeline on a subset of the BALB/c CODEX dataset and a murine spleen CITE-seq dataset:

[Screenshot, 2021-08-09 12:16:07 PM]

(I did not run the UMAP/clustering steps, since I presume they are not related to matching.) During GetTransferMatrix I got the error ``Error in `[<-`(`*tmp*`, i, , value = idx) : subscript out of bounds``.

Also, the corrected CODEX data returned by MapCODEXtoCITE seems to be incomplete in my case:

[Screenshot, 2021-08-09 12:16:42 PM]

(My input has 5,000 CODEX cells.)

Thank you again for the wonderful tool, and please let me know where the problem might be.
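For context, `subscript out of bounds` is R's standard error when an assignment such as `x[i, ] <- idx` targets a row index larger than the matrix has. The minimal sketch below is hypothetical (it is not STvEA's actual GetTransferMatrix code, and `n_expected`/`short_mat` are made-up names); it only shows how a matrix with fewer rows than expected, such as an incomplete corrected CODEX matrix, could surface as this exact message.

```r
# Hypothetical illustration, not STvEA code: a loop expects one row per
# CODEX cell (5 here), but the target matrix only has 3 rows.
n_expected <- 5
short_mat <- matrix(0L, nrow = 3, ncol = 2)

msg <- tryCatch({
  for (i in seq_len(n_expected)) {
    idx <- c(i, i + 1L)
    short_mat[i, ] <- idx  # fails once i exceeds nrow(short_mat)
  }
  "no error"
}, error = function(e) conditionMessage(e))

print(msg)
```

If the real cause is a short input matrix, comparing `nrow()` of the matrices involved is usually a quicker check than reading the loop.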

govekk commented 3 years ago

Hello,

It's hard to tell exactly where this error is coming from without more information, so could you please tell me:

BokaiZhu commented 3 years ago

Hi,

Thanks for the fast reply! The output of `traceback()` after the error is:

[Screenshot, 2021-08-12 6:38:57 PM]

The dimensions of the related matrices (corrected_codex, codex_clean, and stvea_object@cite_clean[,colnames(stvea_object@corrected_codex)]) are:

[Screenshot, 2021-08-12 6:40:58 PM]

govekk commented 3 years ago

Thanks. It definitely looks like the problem is in MapCODEXtoCITE(), which creates the corrected_codex matrix. However, I'm having trouble replicating this issue without the same data.

If you run the code below, you should get a better sense of why corrected_codex comes out the wrong size (this chunk just manually runs the same functions as MapCODEXtoCITE()). If N is the number of anchors left after FilterAnchors(), the expected dimensions are:

- filteredAnchors and scoredAnchors: (N x 3)
- integration.matrix: (N x 27)
- weights: (N x 5000)
- corrected_data: (25000 x 27)
- stvea_object@corrected_codex: (5000 x 27)

It would also be good to check whether any of these matrices, including cite_clean and codex_clean, contain NA values.

```r
# Assumes STvEA is installed and stvea_object already holds the cleaned
# CITE-seq (cite_clean, cite_latent) and CODEX (codex_clean) data.
# If these internal helpers are not exported in your installed version,
# call them as STvEA:::RunCCA(), STvEA:::FindNNrna(), etc.
library(STvEA)

# Set parameters from internal functions
common_proteins <- colnames(stvea_object@cite_clean)[colnames(stvea_object@cite_clean) %in% colnames(stvea_object@codex_clean)]
ref_mat <- stvea_object@cite_clean[, common_proteins]    # CITE-seq protein (reference)
query_mat <- stvea_object@codex_clean[, common_proteins] # CODEX protein (query)
rna_mat <- stvea_object@cite_latent
cite_index <- 1
num.cc <- ncol(ref_mat) - 1
k.anchor <- 20
k.filter <- 100
k.score <- 80
k.weight <- 100
verbose <- TRUE

# Call the same functions MapCODEXtoCITE() calls
cca_matrix <- RunCCA(t(ref_mat), t(query_mat), standardize = TRUE, num.cc = num.cc)$ccv
neighbors <- FindNNrna(ref_emb = cca_matrix[1:nrow(ref_mat), ],
                       query_emb = cca_matrix[(nrow(ref_mat) + 1):nrow(cca_matrix), ],
                       rna_mat = rna_mat,
                       cite_index = cite_index,
                       k = max(k.anchor, k.score), verbose = verbose)
anchors <- FindAnchorPairs(neighbors, k.anchor = k.anchor)
filteredAnchors <- FilterAnchors(ref_mat, query_mat, anchors, k.filter = k.filter, verbose = verbose)
scoredAnchors <- ScoreAnchors(neighbors, filteredAnchors, nrow(ref_mat), nrow(query_mat), k.score = k.score, verbose = verbose)
integration.matrix <- FindIntegrationMatrix(ref_mat, query_mat, neighbors, scoredAnchors, verbose = verbose)
weights <- FindWeights(neighbors, scoredAnchors, query_mat, integration.matrix, k.weight = k.weight, verbose = verbose)
corrected_data <- TransformDataMatrix(ref_mat, query_mat, integration.matrix, weights, verbose = verbose)

# corrected_data is an rbind of ref_mat (cite_clean) and corrected_codex
stvea_object@corrected_codex <- corrected_data[(nrow(ref_mat) + 1):nrow(corrected_data), ]
```

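
The dimension and NA checks suggested above can be collected into one table. This is a minimal sketch: `check_matrices()` is a hypothetical helper, not part of STvEA, and the toy matrices only stand in for the real objects from the walkthrough.

```r
# Hypothetical helper (not part of STvEA): summarise the shape of each
# matrix and flag NA or non-finite (NA, NaN, Inf) values.
check_matrices <- function(mats) {
  data.frame(
    name       = names(mats),
    nrow       = vapply(mats, NROW, integer(1), USE.NAMES = FALSE),
    ncol       = vapply(mats, NCOL, integer(1), USE.NAMES = FALSE),
    has_na     = vapply(mats, function(m) anyNA(as.matrix(m)),
                        logical(1), USE.NAMES = FALSE),
    all_finite = vapply(mats, function(m) all(is.finite(as.matrix(m))),
                        logical(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# With the real objects from the walkthrough (needs your data), e.g.:
# check_matrices(list(ref_mat = ref_mat, query_mat = query_mat,
#                     scoredAnchors = scoredAnchors,
#                     integration.matrix = integration.matrix))
# stopifnot(nrow(stvea_object@corrected_codex) == nrow(query_mat))

# Toy example
toy <- list(good = matrix(1:6, nrow = 3),
            with_na = matrix(c(1, NA, 3, 4), nrow = 2))
res <- check_matrices(toy)
print(res)
```

A zero in the `nrow` column (for example, for integration.matrix) points directly at the step that went wrong.
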
BokaiZhu commented 3 years ago

Thanks for the detailed walkthrough!

I ran the code, and it raised an error at the step `weights <- FindWeights(neighbors, scoredAnchors, query_mat, integration.matrix, k.weight=k.weight, verbose=verbose)`.

The error occurs because integration.matrix is empty: dim(integration.matrix) is (0, 729). I suspect the FindIntegrationMatrix step is causing the trouble, since for some reason the number of columns is 27*27 = 729, with an odd sequence of column names:

[Screenshot, 2021-08-19 4:57:35 PM]

The previous steps look fine: dim(filteredAnchors) and dim(scoredAnchors) are both (23202, 3).

I also checked the earlier matrices (including cite_clean and codex_clean), and none of them seem to contain NA or Inf values. Thanks for the help, and please let me know what I'm doing wrong; this is really bizarre.

govekk commented 3 years ago

Hello, and thank you so much for your patience. Those were some very odd errors for what turned out to be a fairly mundane cause: the mapping code requires the protein matrices to have row names. I hadn't realized this before, and I will either add code to add them automatically or at least make it fail more gracefully. For now, you can fix these errors by setting unique row names on your data matrices (codex_protein, cite_protein, cite_latent, etc.).
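A minimal sketch of that workaround, assuming the inputs are plain R matrices. `ensure_rownames()` is a hypothetical helper, not an STvEA function. It also assumes that cite_protein and cite_latent describe the same cells in the same order, so they can share one set of names. Data frames always carry row names (by default "1", "2", ...), so the helper leaves those unchanged unless they are duplicated.

```r
# Hypothetical helper (not part of STvEA): ensure a matrix has unique row
# names, generating "<prefix>1", "<prefix>2", ... when it has none.
ensure_rownames <- function(m, prefix) {
  rn <- rownames(m)
  if (is.null(rn)) {
    rownames(m) <- paste0(prefix, seq_len(nrow(m)))
  } else if (anyDuplicated(rn) > 0) {
    rownames(m) <- make.unique(rn)  # e.g. "a", "a" -> "a", "a.1"
  }
  m
}

# Usage sketch with the matrices named above (needs your own data):
# codex_protein <- ensure_rownames(codex_protein, "codex_")
# cite_protein  <- ensure_rownames(cite_protein, "cite_")
# rownames(cite_latent) <- rownames(cite_protein)  # assumed same cells/order

# Toy check
m <- matrix(1:6, nrow = 3)
named <- ensure_rownames(m, "cell_")
print(rownames(named))
```

Using different prefixes for the CODEX and CITE-seq cells keeps the names distinct across the two datasets.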

BokaiZhu commented 3 years ago

Thank you for the update! It has been a while, so let me take some time to rerun the analysis. I will close the issue for now and update later. Thanks again for the help.