buenrostrolab / FigR

Functional Inference of Gene Regulation
https://buenrostrolab.github.io/FigR/
MIT License
32 stars 10 forks source link

Error in pairCells function #30

Open LalicJ opened 1 year ago

LalicJ commented 1 year ago

Hi, sorry to bother you. When I run pairCells(), I got this error as follow:

pairing <- pairCells(ATAC = ATAC_PCs,
                     RNA = RNA_PCs,
                     keepUnique = TRUE)
Running geodesic pairing in after chunking data ..

Error in Loadings(object = x)[i, j, drop = drop, ...] : subscript out of bounds.

The cca components was calculated as follow:

PR_CCA <- RunCCA(object1 = PR_ATAC,
                       object2 = PR_RNA, assay1 = 'RNA',assay2 = 'RNA')

cca = PR_CCA@reductions[["cca"]]
dim(cca) # ATAC + RNA (rows), n components (columns)
#[1] 185838     50

 head(rownames(cca)) # ATAC cells
#[1] "D19D013_Fovea#_TCCAGAACATCGGCCA-1" "D19D013_Fovea#_TTGGTCCAGGCCAGTA-1"
#[3] "D19D013_Fovea#_GGATGAGTCTGCGTCT-1" "D19D013_Fovea#_TCACCACTCTCATCCG-1"
#[5] "D19D013_Fovea#_GCAGATTCAGCAACCC-1" "D19D013_Fovea#_TAATTCCTCAACGTGT-1"
tail(rownames(cca)) # RNA cells
#[1] "TTTGGTTTCATGCGGC-1-16" "TTTGGTTTCTCTATGT-1-16" "TTTGTTGAGAGAAGGT-1-16"
#[4] "TTTGTTGCACTTCAAG-1-16" "TTTGTTGGTGCGACAA-1-16" "TTTGTTGGTGTGGACA-1-16"

isATAC <- grepl("#_",rownames(cca))
table(isATAC) # RNA vs ATAC
# FALSE   TRUE
#109568  76270

ATACcells <- rownames(cca)[isATAC]
ATAC_PCs <- subset(cca, ATACcells)
RNAcells <- rownames(cca)[!isATAC]
RNA_PCs <- subset(cca, RNAcells)

head(ATAC_PCs)[1:6,1:6]
             CC_1         CC_2      CC_3       CC_4       CC_5      CC_6
NLGN4X -175.87557  -69.1071538  28.36847 -13.187110  19.029639 117.37616
MID1    -70.56709    4.4671215  60.09425 -14.378456  -6.271517  65.65311
GPM6B   102.48711 -128.8414130 -49.13831  -3.038385  84.575660 147.98845
PIR      13.09029   -0.4235617   6.99728   9.343471  25.972613  41.82880
NHS    -105.87166  -27.0766033  57.83406 -39.044979 -19.644737  36.95551
RAI2   -140.40811    2.9464300  28.50148  14.144115 -60.866979  64.11776

head(RNA_PCs)[1:6,1:6]
             CC_1         CC_2      CC_3       CC_4       CC_5      CC_6
NLGN4X -175.87557  -69.1071538  28.36847 -13.187110  19.029639 117.37616
MID1    -70.56709    4.4671215  60.09425 -14.378456  -6.271517  65.65311
GPM6B   102.48711 -128.8414130 -49.13831  -3.038385  84.575660 147.98845
PIR      13.09029   -0.4235617   6.99728   9.343471  25.972613  41.82880
NHS    -105.87166  -27.0766033  57.83406 -39.044979 -19.644737  36.95551
RAI2   -140.40811    2.9464300  28.50148  14.144115 -60.866979  64.11776

I don't know exactly what went wrong, do you have any suggestions? @vkartha Any help would be greatly appreciated.

vkartha commented 1 year ago

Hi there - it looks like something funky happened when using the subset function, such that instead of cells x components you ended up with genes x components. This shouldn't really happen since the subset function is also applied correctly from what I can tell. Can you print the initial cca output?:

cca = PR_CCA@reductions[["cca"]]
class(cca)
dim(cca)
head(cca)

From your rownames/dim run, it appears it is correct (i.e. it's a cells x components matrix with both ATAC/RNA cells as you were able to fetch the barcode names correctly).

LalicJ commented 1 year ago

Yes, I don't know why. Here is the initial cca output.

> class(cca)
[1] "DimReduc"
attr(,"package")
[1] "SeuratObject"
> dim(cca)
[1] 185838     50
> head(cca)
             CC_1         CC_2      CC_3       CC_4       CC_5      CC_6
NLGN4X -175.87557  -69.1071538  28.36847 -13.187110  19.029639 117.37616
MID1    -70.56709    4.4671215  60.09425 -14.378456  -6.271517  65.65311
GPM6B   102.48711 -128.8414130 -49.13831  -3.038385  84.575660 147.98845
PIR      13.09029   -0.4235617   6.99728   9.343471  25.972613  41.82880
NHS    -105.87166  -27.0766033  57.83406 -39.044979 -19.644737  36.95551
RAI2   -140.40811    2.9464300  28.50148  14.144115 -60.866979  64.11776
              CC_7      CC_8       CC_9      CC_10      CC_11     CC_12
NLGN4X  11.9573408 -28.86680  -4.493107 -31.855822 37.7500152 17.570856
MID1   -12.4424792 -20.81513  -7.558554  -5.200529 23.2179418 15.953092
GPM6B  -11.2129937 -45.57179  11.131815 -38.897719 43.0154007 23.310797
PIR     18.0212130 -27.52639   2.099186 -17.633395  4.6052149  8.781837
NHS      8.8137932 -38.19210   2.593103  -4.989704 28.5980414 18.571440
RAI2    -0.8604781 -12.23152 -39.227841  21.077158 -0.6035299 -1.561188
            CC_13      CC_14     CC_15      CC_16    CC_17    CC_18      CC_19
NLGN4X -18.856155 -26.507412 23.689733  -4.443494 28.05954 25.60540 20.2890424
MID1    -4.814245  -9.342132 20.570563 -11.054530 33.73003 32.01375 19.4312204
GPM6B  -16.302809  -9.071197 26.063990 -24.296506 33.62399 33.66729 27.3420750
PIR    -12.878450  -5.102027  7.449413 -26.511379 36.14112 29.98860 22.3506079
NHS    -13.323982 -14.640687 24.727332 -32.848442 37.74529 39.71971 31.2397395
RAI2    -0.659221  -8.569657 17.811945  -6.276230 16.86507 10.65270  0.1340127
           CC_20     CC_21      CC_22    CC_23     CC_24       CC_25      CC_26
NLGN4X -1.983134 27.883625 33.2232983 55.82479 -49.29908  -5.2951149 -10.365470
MID1    2.384703 29.601429 10.1729030 57.13947 -41.19563  -3.4128501  -8.639101
GPM6B  10.441748 46.110170 35.4883586 62.14384 -47.03972   0.7777975   2.551255
PIR     4.054382 38.327409 -1.8813108 56.79130 -42.78739   8.6379586   1.922499
NHS    12.243104 50.699331  0.3253091 72.52380 -52.12074 -22.4695772 -11.761002
RAI2    1.478498  8.276096 -6.5202355 28.90956 -15.77897   0.7887253  -5.934125
          CC_27       CC_28       CC_29      CC_30      CC_31       CC_32
NLGN4X 50.80880   0.8454094  -2.0199584 -24.747096   8.484102  -5.0625776
MID1   57.47852  -5.0214876 -18.8020772 -23.497848  -3.948140  -0.7944409
GPM6B  18.04253 -18.1110210   0.6712343  -7.700965 -15.250903 -13.1586061
PIR    12.86016 -29.2195386  -6.3778616 -20.628159  -3.442443   3.1870338
NHS    60.58913   2.6344999 -23.9572639 -19.400838  -3.720430  -1.0672745
RAI2   22.89196  -5.9624713  -8.3789681  -7.188969  10.579768   1.9609960
             CC_33      CC_34     CC_35      CC_36     CC_37     CC_38
NLGN4X   1.7257754 44.4451443 -9.626847 -14.831342 17.882448 -7.379775
MID1    -0.7117021  7.3527930  6.908268 -17.934797 -5.632774 14.740645
GPM6B   15.6238366 27.8860378 -1.752459 -10.296515 13.731050 20.222309
PIR    -13.5858133 -0.3524885 -5.862412   9.727464 -2.230001 13.669385
NHS     16.2295116 19.5320756 -7.337682 -17.486711 -3.432934 13.936299
RAI2    -5.9997071  7.0293173  4.645492 -13.044811 -4.813648  3.446370
            CC_39     CC_40      CC_41      CC_42      CC_43      CC_44
NLGN4X   8.211748 25.830155  20.152715 -15.079576   2.124619  28.453037
MID1     4.843137  1.390995 -24.879273 -12.840269 -14.413698 -13.548004
GPM6B  -14.490552  7.126468 -11.405305   8.985858   4.924844   7.206481
PIR     -2.397024 -6.452931  -1.682569  -6.224321  -4.457405 -23.933984
NHS      5.827346  4.785578 -38.768441  -4.727424 -30.508757 -23.290311
RAI2    12.099638 10.137314 -16.286952  -2.858716  -3.472363  -5.400907
            CC_45       CC_46     CC_47      CC_48     CC_49      CC_50
NLGN4X -6.9342924  10.4382923 -4.256122 -13.224380 22.845974  -7.615940
MID1    0.7470869  -3.2981609 -7.790328  -5.986730  7.282175  -2.038706
GPM6B  16.6984587   1.8428580 10.249337   7.522833 16.213984 -21.863452
PIR    12.5656339   0.1864377 -1.824583 -10.163720 14.170049 -17.627980
NHS    13.9001566 -26.7620663  8.517330  34.915100 20.970032 -51.066742
RAI2    1.2253744  -4.4216735 -1.060189  11.693987 -1.489973 -18.912411
LalicJ commented 1 year ago

Sorry for this question, I've solved this problem. I used cca <- as.data.frame(Embeddings(PR_CCA, reduction = "cca")) to extract cca components. Now, this error no longer appears. Error in Loadings(object = x)[i, j, drop = drop, ...] : subscript out of bounds.

But I met a new error as follow:

pairing <- pairCells(ATAC = ATAC_PCs,
                     RNA = RNA_PCs,
                     keepUnique = TRUE)
Running geodesic pairing in after chunking data ..
Number of cells in bigger dataset:  17737
Number of cells in smaller dataset:  17327
Difference in cell count between 2 datasets:  410
Chunking larger dataset to match smaller datset ..
Chunk size n =  17327  cells
Total number of chunks:  2

Chunk #  1
No. cells in chunk:  17327

Attaching package: ‘igraph’

The following object is masked from ‘package:GenomicRanges’:

    union

The following object is masked from ‘package:IRanges’:

    union

The following object is masked from ‘package:S4Vectors’:

    union

The following objects are masked from ‘package:BiocGenerics’:

    normalize, path, union

The following objects are masked from ‘package:dplyr’:

    as_data_frame, groups, union

The following objects are masked from ‘package:stats’:

    decompose, spectrum

The following object is masked from ‘package:base’:

    union

Attaching package: ‘FNN’

The following object is masked from ‘package:igraph’:

    knn

Attaching package: ‘pracma’

The following objects are masked from ‘package:Matrix’:

    expm, lu, tril, triu

Constructing KNN graph for computing geodesic distance ..
Computing graph-based geodesic distance ..
# KNN subgraphs detected:
 3
Skipping subgraphs with either ATAC/RNA cells fewer than:  50  ..
Pairing cells for subgraph No. 1
Total ATAC cells in subgraph:  17211
Total RNA cells in subgraph:  16816
Subgraph size:  16816
Search threshold being used:  6727

Error in distmat(subgraph_ATAC_pcs, subgraph_RNA_pcs) : X and Y must be numeric vectors or matrices.

LalicJ commented 12 months ago

@vkartha Sorry to bother you, but do you have any suggestions for this error as above?

Sorry for this question, I've solved this problem. I used cca <- as.data.frame(Embeddings(PR_CCA, reduction = "cca")) to extract cca components. Now, this error no longer appears. Error in Loadings(object = x)[i, j, drop = drop, ...] : subscript out of bounds.

But I met a new error as follow:

pairing <- pairCells(ATAC = ATAC_PCs,
                     RNA = RNA_PCs,
                     keepUnique = TRUE)
Running geodesic pairing in after chunking data ..
Number of cells in bigger dataset:  17737
Number of cells in smaller dataset:  17327
Difference in cell count between 2 datasets:  410
Chunking larger dataset to match smaller datset ..
Chunk size n =  17327  cells
Total number of chunks:  2

Chunk #  1
No. cells in chunk:  17327

Attaching package: ‘igraph’

The following object is masked from ‘package:GenomicRanges’:

    union

The following object is masked from ‘package:IRanges’:

    union

The following object is masked from ‘package:S4Vectors’:

    union

The following objects are masked from ‘package:BiocGenerics’:

    normalize, path, union

The following objects are masked from ‘package:dplyr’:

    as_data_frame, groups, union

The following objects are masked from ‘package:stats’:

    decompose, spectrum

The following object is masked from ‘package:base’:

    union

Attaching package: ‘FNN’

The following object is masked from ‘package:igraph’:

    knn

Attaching package: ‘pracma’

The following objects are masked from ‘package:Matrix’:

    expm, lu, tril, triu

Constructing KNN graph for computing geodesic distance ..
Computing graph-based geodesic distance ..
# KNN subgraphs detected:
 3
Skipping subgraphs with either ATAC/RNA cells fewer than:  50  ..
Pairing cells for subgraph No. 1
Total ATAC cells in subgraph:  17211
Total RNA cells in subgraph:  16816
Subgraph size:  16816
Search threshold being used:  6727

Error in distmat(subgraph_ATAC_pcs, subgraph_RNA_pcs) : X and Y must be numeric vectors or matrices.