buenrostrolab / FigR

Functional Inference of Gene Regulation
https://buenrostrolab.github.io/FigR/
MIT License

stuck on "runGenePeakcorr" #46

Open ytrink opened 2 months ago

ytrink commented 2 months ago

Hi, I am running the tutorial. I have been stuck on the function runGenePeakcorr for several days.

This is the output.

Constructing KNN graph for computing geodesic distance ..
Computing graph-based geodesic distance ..
KNN subgraphs detected: 6
Skipping subgraphs with either ATAC/RNA cells fewer than: 50 ..
Pairing cells for subgraph No. 1
Total ATAC cells in subgraph: 2049
Total RNA cells in subgraph: 2281
Subgraph size: 2049
Search threshold being used: 820
[1] "Constructing KNN based on geodesic distance to reduce search pairing search space"
[1] "Number of cells being paired: 2049 ATAC and 2049 RNA cells"
Determing pairs through optimized bipartite matching ..
Assembling pair list ..
Finished!
Pairing cells for subgraph No. 2
Total ATAC cells in subgraph: 212
Total RNA cells in subgraph: 209
Subgraph size: 209
Search threshold being used: 84
[1] "Constructing KNN based on geodesic distance to reduce search pairing search space"
[1] "Number of cells being paired: 209 ATAC and 209 RNA cells"
Determing pairs through optimized bipartite matching ..
Assembling pair list ..
Finished!
Pairing cells for subgraph No. 3
Total ATAC cells in subgraph: 1014
Total RNA cells in subgraph: 760
Subgraph size: 760
Search threshold being used: 304
[1] "Constructing KNN based on geodesic distance to reduce search pairing search space"
[1] "Number of cells being paired: 760 ATAC and 760 RNA cells"
Determing pairs through optimized bipartite matching ..


This is a relatively small number of cells; I want to test the workflow before trying it on my own dataset, which is much larger. Do you have any advice on how to speed this up? Thanks, Yaron

vkartha commented 1 month ago

Hi there, sorry for the late response here. How many cells do you have in the full dataset? We have run this on ~70,000 cells, so technically it shouldn't get stuck in any way, and it should also work for the smaller set of cells you've tested. However, how quickly an optimal solution is found (done under the hood by the optmatch solver) depends on the subgraph size and the search range: if the search range is too big, it will take a really long time unnecessarily. So that might be something to modify, although I would rather tune it on the complete set to see if it runs than on the toy subset. You can do this by specifying the search_range parameter, which by default is set to 0.2 (20% of the cells in each subgraph). I would add it to your pairCells call and try reducing it to 0.1 or 0.05 to see if the pairing is solved faster. You can try this on the full cell set.
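To make the relationship between search_range and the "Search threshold being used" lines concrete, here is a small Python sketch (Python is used only for illustration; FigR itself is an R package). It assumes, based solely on the numbers printed in the logs and not on FigR's source, that the subgraph size is the smaller of the ATAC and RNA cell counts, that subgraphs with fewer than 50 cells of either type are skipped, and that the threshold is search_range times the subgraph size, rounded. Under that assumption the thresholds in the first log work out to about 40% of each subgraph size and those in the second log to about 20%. The function name `search_threshold` is hypothetical.

```python
MIN_CELLS = 50  # minimum ATAC/RNA cells per subgraph, as printed in the log


def search_threshold(n_atac, n_rna, search_range):
    """Approximate search threshold for one KNN subgraph, or None if it is skipped.

    Assumption (inferred from the logged numbers, not from FigR's code):
    size = min(n_atac, n_rna) and threshold = round(search_range * size).
    """
    if n_atac < MIN_CELLS or n_rna < MIN_CELLS:
        return None  # the log reports that such subgraphs are skipped
    size = min(n_atac, n_rna)
    return round(search_range * size)


# (ATAC cells, RNA cells) per subgraph, copied from the logs in this thread.
subgraphs = [(2049, 2281), (212, 209), (1014, 760), (87, 74)]

for search_range in (0.4, 0.2, 0.1, 0.05):
    thresholds = [search_threshold(a, r, search_range) for a, r in subgraphs]
    print(f"search_range={search_range}: thresholds {thresholds}")
```

Lowering search_range shrinks the threshold linearly, so each cell is offered fewer candidate partners, which makes the matching problem smaller but also more tightly constrained.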

ytrink commented 1 month ago

Hi @vkartha, thanks for the response. I tried running the tutorial with different values of search_range. With any value above 0.1, the program gets stuck as above; with any value below that, matches cannot be found and I get the following error:

Constructing KNN graph for computing geodesic distance ..
Computing graph-based geodesic distance ..
KNN subgraphs detected: 6
Skipping subgraphs with either ATAC/RNA cells fewer than: 50 ..
Pairing cells for subgraph No. 1
Total ATAC cells in subgraph: 2049
Total RNA cells in subgraph: 2281
Subgraph size: 2049
Search threshold being used: 410
[1] "Constructing KNN based on geodesic distance to reduce search pairing search space"
[1] "Number of cells being paired: 2049 ATAC and 2049 RNA cells"
Determing pairs through optimized bipartite matching ..
Assembling pair list ..
Finished!
Pairing cells for subgraph No. 2
Total ATAC cells in subgraph: 212
Total RNA cells in subgraph: 209
Subgraph size: 209
Search threshold being used: 42
[1] "Constructing KNN based on geodesic distance to reduce search pairing search space"
[1] "Number of cells being paired: 209 ATAC and 209 RNA cells"
Determing pairs through optimized bipartite matching ..
Assembling pair list ..
Finished!
Pairing cells for subgraph No. 3
Total ATAC cells in subgraph: 1014
Total RNA cells in subgraph: 760
Subgraph size: 760
Search threshold being used: 152
[1] "Constructing KNN based on geodesic distance to reduce search pairing search space"
[1] "Number of cells being paired: 760 ATAC and 760 RNA cells"
Determing pairs through optimized bipartite matching ..
Assembling pair list ..
Finished!
Pairing cells for subgraph No. 4
Total ATAC cells in subgraph: 87
Total RNA cells in subgraph: 74
Subgraph size: 74
Search threshold being used: 15
[1] "Constructing KNN based on geodesic distance to reduce search pairing search space"
[1] "Number of cells being paired: 74 ATAC and 74 RNA cells"
Determing pairs through optimized bipartite matching ..
Error in get_pair_list(cell_matches, rownames(subgraph_ATAC_pcs), rownames(subgraph_RNA_pcs)) :
  Matches could not be found .. Perhaps try adjusting the constraints to allow optimal matching to be solved?
Calls: pairCells -> cell_pairing -> get_pair_list
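The "Matches could not be found" error is consistent with the pairing problem becoming infeasible when the search window is too narrow. The sketch below does not reproduce FigR or the optmatch solver, which finds a cost-minimizing (optimal) matching. Instead it uses Kuhn's augmenting-path algorithm for maximum bipartite matching, only to check whether a complete one-to-one pairing exists at all. It assumes, as the log messages suggest, that the search threshold restricts each ATAC cell to its k nearest RNA cells. The 1-D cell positions, the value of k, and the function names are hypothetical and chosen for illustration.

```python
def knn_candidates(atac_pos, rna_pos, k):
    """For each ATAC cell, the indices of its k nearest RNA cells (ties broken by index)."""
    return [
        sorted(range(len(rna_pos)), key=lambda j: (abs(a - rna_pos[j]), j))[:k]
        for a in atac_pos
    ]


def max_bipartite_matching(candidates, n_rna):
    """Kuhn's augmenting-path algorithm: size of a maximum one-to-one ATAC->RNA matching."""
    match_of_rna = [-1] * n_rna  # match_of_rna[j] = ATAC index paired with RNA cell j, or -1

    def try_assign(i, seen):
        # Try to give ATAC cell i a partner, re-routing earlier pairs if needed.
        for j in candidates[i]:
            if j in seen:
                continue
            seen.add(j)
            if match_of_rna[j] == -1 or try_assign(match_of_rna[j], seen):
                match_of_rna[j] = i
                return True
        return False

    return sum(try_assign(i, set()) for i in range(len(candidates)))


# Hypothetical toy data: three ATAC cells crowd around a single RNA cell.
atac = [0, 1, 2, 50]
rna = [1, 48, 51, 60]

for k in (1, 2, 3):
    matched = max_bipartite_matching(knn_candidates(atac, rna, k), len(rna))
    status = "complete" if matched == len(atac) else "infeasible"
    print(f"k={k}: {matched}/{len(atac)} ATAC cells paired ({status})")
```

With k=1 or k=2 the three crowded ATAC cells compete for the same one or two RNA cells, so no complete pairing exists. Only at k=3 does every ATAC cell get a distinct partner. A larger window restores feasibility but enlarges the search, which fits the trade-off described in this thread: too large a search_range is slow, and too small a one can leave no valid pairing.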