Long computation time - Githubissues

Hi,

I am analyzing a scRNA-seq dataset with LARRY barcodes, composed of ~40k cells from three time points (T0, T1 and T2), with CoSpar.

Transition map inference with infer_Tmap_from_multitime_clones has been running for more than a week now.

Is this expected for this number of cells and time points?
How can I speed up the analysis? I have access to an HPC.
My dataset is transcriptionally heterogeneous. Could this increase the computation time by slowing down the optimization process?

Here is the output so far:

------Compute the full Similarity matrix if necessary------
--> Compute similarity matrix: computing new; beta=0.1
computing neighbors
    using 'X_pca' with n_pcs = 40
    finished: added to `.uns['neighbors']`
    `.obsp['distances']`, distances for each pair of neighbors
    `.obsp['connectivities']`, weighted adjacency matrix (0:00:08)
Smooth round: 1
--> Time elapsed: 0.7070024013519287
Smooth round: 2
--> Time elapsed: 3.1434545516967773
Smooth round: 3
--> Time elapsed: 47.9256865978241
--> Orignal sparsity=0.2689515813016907, Thresholding
--> Final sparsity=0.10062146256076271
similarity matrix truncated (Smooth round=3):  41.443312883377075
Smooth round: 4
--> Time elapsed: 113.89515376091003
--> Orignal sparsity=0.5739761021177003, Thresholding
--> Final sparsity=0.2296323676886241
similarity matrix truncated (Smooth round=4):  40.59416961669922
Smooth round: 5
--> Time elapsed: 158.23674654960632
--> Orignal sparsity=0.7449764044939262, Thresholding
--> Final sparsity=0.394037109210076
similarity matrix truncated (Smooth round=5):  44.37885856628418
--> Save the matrix at every 5 rounds
Smooth round: 6
--> Time elapsed: 320.0895211696625
--> Orignal sparsity=0.8438357182885877, Thresholding
--> Final sparsity=0.5561186782807795
similarity matrix truncated (Smooth round=6):  48.09257388114929
Smooth round: 7
--> Time elapsed: 298.9822506904602
--> Orignal sparsity=0.903221191170783, Thresholding
--> Final sparsity=0.6952270512513499
similarity matrix truncated (Smooth round=7):  48.33195114135742
Smooth round: 8
--> Time elapsed: 329.3474304676056
--> Orignal sparsity=0.9379274363082518, Thresholding
--> Final sparsity=0.7951572440994282
similarity matrix truncated (Smooth round=8):  45.52135133743286
Smooth round: 9
--> Time elapsed: 349.8222985267639
--> Orignal sparsity=0.9569717844888093, Thresholding
--> Final sparsity=0.8595253243552139
similarity matrix truncated (Smooth round=9):  46.918509006500244
Smooth round: 10
--> Time elapsed: 365.12111496925354
--> Orignal sparsity=0.9677025253102918, Thresholding
--> Final sparsity=0.8983767964685581
similarity matrix truncated (Smooth round=10):  44.01810026168823
--> Save the matrix at every 5 rounds
Smooth round: 11
--> Time elapsed: 385.83108925819397
--> Orignal sparsity=0.9742152450262845, Thresholding
--> Final sparsity=0.9215355438339919
similarity matrix truncated (Smooth round=11):  44.59958076477051
Smooth round: 12
--> Time elapsed: 379.00236654281616
--> Orignal sparsity=0.9785144106705874, Thresholding
--> Final sparsity=0.9358738337038306
similarity matrix truncated (Smooth round=12):  42.675896406173706
Smooth round: 13
--> Time elapsed: 379.267174243927
--> Orignal sparsity=0.9815413499688425, Thresholding
--> Final sparsity=0.9454461730431096
similarity matrix truncated (Smooth round=13):  43.95465707778931
Smooth round: 14
--> Time elapsed: 376.71188831329346
--> Orignal sparsity=0.9837796340626312, Thresholding
--> Final sparsity=0.952335418433685
similarity matrix truncated (Smooth round=14):  42.95647954940796
Smooth round: 15
--> Time elapsed: 377.63587737083435
--> Orignal sparsity=0.9854957397947618, Thresholding
--> Final sparsity=0.9576108088200888
similarity matrix truncated (Smooth round=15):  44.160155057907104
--> Save the matrix at every 5 rounds
----Infer transition map between neighboring time points-----
Step 1: Select time points
--> Clonal cell fraction (day T0-T1): 0.998391288206905
--> Clonal cell fraction (day T1-T2): 0.999024902490249
--> Clonal cell fraction (day T1-T0): 0.9975997599759976
--> Clonal cell fraction (day T2-T1): 0.998592568763069
--> Numer of cells that are clonally related -- day T0: 8068  and day T1: 13300
--> Numer of cells that are clonally related -- day T1: 13319  and day T2: 24833
Number of multi-time clones post selection: 99
Cell number=46225, Clone number=99
--> clonal_cell_id_t1: 21387
--> Tmap_cell_id_t1: 21387
Step 2: Optimize the transition map recursively
Load pre-computed similarity matrix
--> Load from hard disk--------
--> Compute similarity matrix: load existing data
--> Time elapsed:  9.888801336288452
--> Time elapsed:  24.041647911071777
--> Compute similarity matrix: load existing data
--> Time elapsed:  9.365047693252563
--> Time elapsed:  23.16232943534851
--> Compute similarity matrix: load existing data
--> Time elapsed:  4.958567142486572
--> Time elapsed:  13.653564691543579
Iteration 1, Use smooth_round=15
--> Clone normalization
--> Relative time point pair index: 0
--> Clone id: 0
--> Relative time point pair index: 1
--> Clone id: 0
--> Start to smooth the refined clonal map
--> Phase I: time elapsed --  3655.3733134269714
--> Phase II: time elapsed --  258255.4996459484
Iteration 2, Use smooth_round=10
--> Clone normalization
--> Relative time point pair index: 0
--> Clone id: 0
--> Relative time point pair index: 1
--> Clone id: 0
--> Start to smooth the refined clonal map
--> Phase I: time elapsed --  1105.2584946155548
--> Phase II: time elapsed --  267428.1649506092
Iteration 3, Use smooth_round=5
--> Clone normalization
--> Relative time point pair index: 0
--> Clone id: 0
--> Relative time point pair index: 1
--> Clone id: 0
--> Start to smooth the refined clonal map
--> Phase I: time elapsed --  426.9518623352051
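To see where the time goes, the Phase I and Phase II lines of the output above can be added up. This is a minimal sketch, assuming each reported value is the duration of that phase in seconds; the helper name `phase_totals` is made up for illustration and is not part of CoSpar.

```python
import re

# Phase timing lines copied from the output above.
LOG = """\
Iteration 1, Use smooth_round=15
--> Phase I: time elapsed --  3655.3733134269714
--> Phase II: time elapsed --  258255.4996459484
Iteration 2, Use smooth_round=10
--> Phase I: time elapsed --  1105.2584946155548
--> Phase II: time elapsed --  267428.1649506092
Iteration 3, Use smooth_round=5
--> Phase I: time elapsed --  426.9518623352051
"""

PHASE_RE = re.compile(r"Phase (I{1,2}): time elapsed --\s+([0-9.]+)")


def phase_totals(log_text):
    """Sum the reported seconds for each phase (hypothetical helper)."""
    totals = {}
    for phase, seconds in PHASE_RE.findall(log_text):
        totals[phase] = totals.get(phase, 0.0) + float(seconds)
    return totals


totals = phase_totals(LOG)
for phase, seconds in sorted(totals.items()):
    # 86400 seconds per day
    print(f"Phase {phase}: {seconds / 86400:.2f} days")
```

Under that assumption, Phase II of the first two iterations alone accounts for about six days, while Phase I takes under two hours in total.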

Thank you for your help and this tool.

Best,

Hi, it really should not take that long. The LARRY dataset analyzed in the CoSpar paper also has around 50K cells, and it finished in about 2 hours. I guess the issue might be a memory bottleneck. The computation involves large matrices and could consume a lot of memory, up to 250 GB in your case. So, you may try to run it on an HPC.
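For a rough sense of scale, here is a back-of-the-envelope sketch of the memory for one cell-by-cell matrix. It assumes the matrix is stored densely as 8-byte floats, which is an assumption and not something the log confirms; the function name `dense_matrix_gb` and the copy counts are only for illustration.

```python
def dense_matrix_gb(n_cells, bytes_per_value=8):
    """Memory of one dense n_cells x n_cells matrix, in GB (1 GB = 1e9 bytes)."""
    return n_cells * n_cells * bytes_per_value / 1e9


n_cells = 46225  # "Cell number=46225" in the log above
one_matrix = dense_matrix_gb(n_cells)
print(f"one dense float64 matrix: {one_matrix:.1f} GB")

# Hypothetical counts of same-sized matrices held in memory at once
# (inputs, intermediate products, results); the real number is not known here.
for copies in (3, 10, 15):
    print(f"{copies} matrices: {copies * one_matrix:.0f} GB")
```

A single matrix of this size is about 17 GB, so a memory footprint in the hundreds of GB is plausible once several such matrices and their products exist at the same time.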

Alternatively, you can remove the cells without LARRY barcodes from your dataset (since your data is large enough, you may not need so many cells). This reduces the cell number and memory requirements significantly. I would do this first, to make sure that you get a result, and then go back to running all cells.
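A minimal sketch of that filtering step, shown on a toy cell-by-clone matrix. It assumes the clone matrix is stored in `adata.obsm["X_clone"]`; check this against your own object. The helper name `barcoded_cell_mask` is made up for illustration.

```python
import numpy as np


def barcoded_cell_mask(clone_matrix):
    """Return a boolean mask that is True for cells assigned to at least one clone.

    clone_matrix is cells x clones; a NumPy array or a SciPy sparse matrix both work.
    """
    per_cell = np.asarray(clone_matrix.sum(axis=1)).ravel()
    return per_cell > 0


# Toy stand-in for a cell-by-clone matrix (4 cells, 3 clones).
toy = np.array([[1, 0, 0],
                [0, 0, 0],
                [0, 1, 0],
                [0, 0, 0]])
mask = barcoded_cell_mask(toy)
print(mask.sum(), "of", mask.size, "cells carry a barcode")

# With an AnnData object, assuming the clone matrix lives in adata.obsm["X_clone"]:
# adata_clonal = adata[barcoded_cell_mask(adata.obsm["X_clone"])].copy()
```

The `.copy()` in the commented line gives an independent object rather than a view, so later steps do not touch the full dataset.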

If you do so, please re-run everything in a separate folder; otherwise the pre-saved similarity matrix for the full dataset could interfere with the computation for the sub-sampled dataset.
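One way to keep the runs apart is to create fresh output folders per run and fail if they already exist. This is a sketch using only the standard library; the helper `make_run_dirs` and the folder names are hypothetical, and pointing CoSpar at the new folders through `cs.settings.data_path` and `cs.settings.figure_path` is an assumption to check against your installed version.

```python
import tempfile
from pathlib import Path


def make_run_dirs(root, run_name):
    """Create fresh data and figure folders for one run and return their paths."""
    run_dir = Path(root) / run_name
    data_dir = run_dir / "data"
    figure_dir = run_dir / "figure"
    # exist_ok=False: fail loudly instead of reusing a folder that may
    # still hold a similarity matrix saved by an earlier run.
    data_dir.mkdir(parents=True, exist_ok=False)
    figure_dir.mkdir(parents=True, exist_ok=False)
    return data_dir, figure_dir


# Demonstration in a temporary directory; use a real project path in practice.
with tempfile.TemporaryDirectory() as root:
    data_dir, figure_dir = make_run_dirs(root, "clonal_cells_only")
    print(data_dir.name, figure_dir.name)

# Assumed CoSpar usage (verify against your version):
# cs.settings.data_path = str(data_dir)
# cs.settings.figure_path = str(figure_dir)
```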

Let me know how it goes. Happy to help more.


From: PaulArthurM
Sent: Monday, October 17, 2022 6:54:31 AM
To: AllonKleinLab/cospar
Subject: [AllonKleinLab/cospar] Long computation time (Issue #21)

