friendsofstrandseq / mosaicatcher-pipeline

Integrated workflow for SV calling from single-cell Strand-seq data
https://friendsofstrandseq.github.io/mosaicatcher-docs/
MIT License
21 stars 10 forks source link

[Bug]: Single segment chromosomes break calculation of to/from values for BPS in plot-sv-calls_dev.R #40

Closed p-smirnov closed 1 year ago

p-smirnov commented 1 year ago

Contact Details

petr.smirnov@embl.de

What happened?

In the following line: https://github.com/friendsofstrandseq/mosaicatcher-pipeline/blob/3422078530b95fb2b403bfa22d164f5892bedff1/workflow/scripts/plotting/plot-sv-calls_dev.R#L375

Suppose you have a seg object where a chromosome has a single segment, for example (truncated to remove the chromosomes not affected by issue):

   chrom  k bps     N
 1: chr21  7  49  7974
 2: chr21  7  71  7974
 3: chr21  7  74  7974
 4: chr21  7  88  7974
 5: chr21  7  91  7974
 6: chr21  7 204  7974
 7: chr21  7 233  7974
 8: chr22  1 254  8208
 9:  chrX 55   6 14385
10:  chrX 55  25 14385
11:  chrX 55  27 14385
12:  chrX 55  33 14385
13:  chrX 55  35 14385
14:  chrX 55  39 14385
15:  chrX 55  47 14385
16:  chrX 55  49 14385
17:  chrX 55  52 14385
18:  chrX 55  83 14385
19:  chrX 55  85 14385
20:  chrX 55  86 14385
21:  chrX 55  87 14385
22:  chrX 55 161 14385
23:  chrX 55 172 14385
24:  chrX 55 174 14385
25:  chrX 55 176 14385
26:  chrX 55 212 14385
27:  chrX 55 213 14385
28:  chrX 55 215 14385
29:  chrX 55 224 14385
30:  chrX 55 230 14385
31:  chrX 55 231 14385
32:  chrX 55 244 14385
33:  chrX 55 247 14385
34:  chrX 55 261 14385
35:  chrX 55 268 14385
36:  chrX 55 365 14385
37:  chrX 55 366 14385
38:  chrX 55 373 14385
39:  chrX 55 400 14385
40:  chrX 55 401 14385
41:  chrX 55 409 14385
42:  chrX 55 410 14385
43:  chrX 55 443 14385
44:  chrX 55 454 14385
45:  chrX 55 455 14385
46:  chrX 55 556 14385
47:  chrX 55 557 14385
48:  chrX 55 641 14385
49:  chrX 55 642 14385
50:  chrX 55 645 14385
51:  chrX 55 685 14385
52:  chrX 55 700 14385
53:  chrX 55 701 14385
54:  chrX 55 702 14385
55:  chrX 55 703 14385
56:  chrX 55 705 14385
57:  chrX 55 713 14385
58:  chrX 55 718 14385
59:  chrX 55 725 14385
60:  chrX 55 747 14385
61:  chrX 55 748 14385
62:  chrX 55 749 14385
63:  chrX 55 780 14385
    chrom  k bps     N

Then, the code in the line referenced above will only populate the from/to columns until chr22, afterwards, they will be populated with NA, as so:

   chrom  k bps     N from  to
 1: chr21  7  49  7974    1  49
 2: chr21  7  71  7974   50  71
 3: chr21  7  74  7974   72  74
 4: chr21  7  88  7974   75  88
 5: chr21  7  91  7974   89  91
 6: chr21  7 204  7974   92 204
 7: chr21  7 233  7974  205 233
 8: chr22  1 254  8208   NA  NA
 9:  chrX 55   6 14385   NA  NA
10:  chrX 55  25 14385   NA  NA
11:  chrX 55  27 14385   NA  NA
12:  chrX 55  33 14385   NA  NA
13:  chrX 55  35 14385   NA  NA
14:  chrX 55  39 14385   NA  NA
15:  chrX 55  47 14385   NA  NA
16:  chrX 55  49 14385   NA  NA
17:  chrX 55  52 14385   NA  NA
18:  chrX 55  83 14385   NA  NA
19:  chrX 55  85 14385   NA  NA
20:  chrX 55  86 14385   NA  NA
21:  chrX 55  87 14385   NA  NA
22:  chrX 55 161 14385   NA  NA
23:  chrX 55 172 14385   NA  NA
24:  chrX 55 174 14385   NA  NA
25:  chrX 55 176 14385   NA  NA
26:  chrX 55 212 14385   NA  NA
27:  chrX 55 213 14385   NA  NA
28:  chrX 55 215 14385   NA  NA
29:  chrX 55 224 14385   NA  NA
30:  chrX 55 230 14385   NA  NA
31:  chrX 55 231 14385   NA  NA
32:  chrX 55 244 14385   NA  NA
33:  chrX 55 247 14385   NA  NA
34:  chrX 55 261 14385   NA  NA
35:  chrX 55 268 14385   NA  NA
36:  chrX 55 365 14385   NA  NA
37:  chrX 55 366 14385   NA  NA
38:  chrX 55 373 14385   NA  NA
39:  chrX 55 400 14385   NA  NA
40:  chrX 55 401 14385   NA  NA
41:  chrX 55 409 14385   NA  NA
42:  chrX 55 410 14385   NA  NA
43:  chrX 55 443 14385   NA  NA
44:  chrX 55 454 14385   NA  NA
45:  chrX 55 455 14385   NA  NA
46:  chrX 55 556 14385   NA  NA
47:  chrX 55 557 14385   NA  NA
48:  chrX 55 641 14385   NA  NA
49:  chrX 55 642 14385   NA  NA
50:  chrX 55 645 14385   NA  NA
51:  chrX 55 685 14385   NA  NA
52:  chrX 55 700 14385   NA  NA
53:  chrX 55 701 14385   NA  NA
54:  chrX 55 702 14385   NA  NA
55:  chrX 55 703 14385   NA  NA
56:  chrX 55 705 14385   NA  NA
57:  chrX 55 713 14385   NA  NA
58:  chrX 55 718 14385   NA  NA
59:  chrX 55 725 14385   NA  NA
60:  chrX 55 747 14385   NA  NA
61:  chrX 55 748 14385   NA  NA
62:  chrX 55 749 14385   NA  NA
63:  chrX 55 780 14385   NA  NA
    chrom  k bps     N from  to

This is because bps[1:(.N - 1)] will return bps[1:0] for chr22, as so:

> seg[chrom=="chr22", .(from = c(1, bps[1:(.N-1)] + 1), to = bps)]

   from  to
1:    1 254
2:  255 254

Proposed fix is to replace bps[1:(.N - 1)] with bps[-length(bps)]:

> seg[chrom=="chr22", .(from = c(1, bps[-length(bps)] + 1), to = bps)]

   from  to
1:    1 254

Steps To Reproduce

I am using R 4.2.2

  1. Load data.table package

  2. Run this to create example object:

seg <- 
structure(list(chrom = c("chr21", "chr21", "chr21", "chr21",
"chr21", "chr21", "chr21", "chr22", "chrX", "chrX", "chrX", "chrX",
"chrX", "chrX", "chrX", "chrX", "chrX", "chrX", "chrX", "chrX",
"chrX", "chrX", "chrX", "chrX", "chrX", "chrX", "chrX", "chrX",
"chrX", "chrX", "chrX", "chrX", "chrX", "chrX", "chrX", "chrX",
"chrX", "chrX", "chrX", "chrX", "chrX", "chrX", "chrX", "chrX",
"chrX", "chrX", "chrX", "chrX", "chrX", "chrX", "chrX", "chrX",
"chrX", "chrX", "chrX", "chrX", "chrX", "chrX", "chrX", "chrX",
"chrX", "chrX", "chrX"), k = c(7L, 7L, 7L, 7L, 7L, 7L, 7L, 1L,
55L, 55L, 55L, 55L, 55L, 55L, 55L, 55L, 55L, 55L, 55L, 55L, 55L,
55L, 55L, 55L, 55L, 55L, 55L, 55L, 55L, 55L, 55L, 55L, 55L, 55L,
55L, 55L, 55L, 55L, 55L, 55L, 55L, 55L, 55L, 55L, 55L, 55L, 55L,
55L, 55L, 55L, 55L, 55L, 55L, 55L, 55L, 55L, 55L, 55L, 55L, 55L,
55L, 55L, 55L), bps = c(49L, 71L, 74L, 88L, 91L, 204L, 233L,
254L, 6L, 25L, 27L, 33L, 35L, 39L, 47L, 49L, 52L, 83L, 85L, 86L,
87L, 161L, 172L, 174L, 176L, 212L, 213L, 215L, 224L, 230L, 231L,
244L, 247L, 261L, 268L, 365L, 366L, 373L, 400L, 401L, 409L, 410L,
443L, 454L, 455L, 556L, 557L, 641L, 642L, 645L, 685L, 700L, 701L,
702L, 703L, 705L, 713L, 718L, 725L, 747L, 748L, 749L, 780L),
    N = c(7974, 7974, 7974, 7974, 7974, 7974, 7974, 8208, 14385,
    14385, 14385, 14385, 14385, 14385, 14385, 14385, 14385, 14385,
    14385, 14385, 14385, 14385, 14385, 14385, 14385, 14385, 14385,
    14385, 14385, 14385, 14385, 14385, 14385, 14385, 14385, 14385,
    14385, 14385, 14385, 14385, 14385, 14385, 14385, 14385, 14385,
    14385, 14385, 14385, 14385, 14385, 14385, 14385, 14385, 14385,
    14385, 14385, 14385, 14385, 14385, 14385, 14385, 14385, 14385
    )), class = c("data.table",
"data.frame"), row.names = c(NA, -63L))

Mosaicatcher-pipeline Version

1.5.1 (Default)

Command used

NA

How did you run the pipeline?

Conda only

What did you use to run the pipeline? (local execution, HPC, cloud)

EMBL HPC

Pipeline configuration file

Not really relevant

Relevant log output

NA, no log output capturing this issue

Anything else?

No response