gerstung-lab / MutationTimeR

An R package to time somatic mutations

CNV input file format when there are sub clonal CNV segments #16

Closed shaghayeghsoudi closed 3 years ago

shaghayeghsoudi commented 3 years ago

Hi developers, I have started running MutationTimeR on a large set of samples. At first I limited my CNV file to clonal segments only, and everything worked well. Later I noticed that subclonal CNVs can also be included, but I run into a problem when I add subclonal segments. I built my input based on your suggestion in this issue: https://github.com/gerstung-lab/MutationTimeR/issues/5

GRanges object with 6 ranges and 6 metadata columns:
      seqnames              ranges strand | major_cn1 minor_cn1
         <Rle>           <IRanges>  <Rle> | <integer> <integer>
  [1]        1    776546-121345296      * |         1         1
  [2]        1 145376198-249201480      * |         2         1
  [3]        2     18506-225370005      * |         1         1
  [4]        2 225372562-225434445      * |      <NA>      <NA>
  [5]        2 225439021-242978945      * |         1         1
  [6]        3      60596-90058295      * |         2         1
      clonal_frequency1 major_cn2 minor_cn2 clonal_frequency2
              <numeric> <integer> <integer>         <numeric>
  [1]           0.95285      <NA>      <NA>                NA
  [2]           0.95285      <NA>      <NA>                NA
  [3]           0.95285      <NA>      <NA>                NA
  [4]           0.95285      <NA>      <NA>                NA
  [5]           0.95285      <NA>      <NA>                NA
  [6]           0.95285      <NA>      <NA>                NA

So each row has two copy-number entries, CN1 and CN2, each with its own clonal frequency. But I get an error! Can you provide an example of what the input should look like when there are subclonal segments as well?

christopherwirth commented 3 years ago

Hi,

I'm not one of the developers of MutationTimeR, but I am also trying to use it. My understanding, based on what Moritz said in #5 (as you mentioned), is:

For subclonal segments, there should be two rows in the bb GRanges object, one per copy-number state, rather than extra columns for the different copy numbers.

For example, rows 16 and 17 in my bb GRanges object represent the two clones of a single subclonal segment:

> bb[16:17]
GRanges object with 2 ranges and 3 metadata columns:
      seqnames            ranges strand |  major_cn  minor_cn clonal_frequency
         <Rle>         <IRanges>  <Rle> | <numeric> <numeric>        <numeric>
  [1]        6   809179-12334634      * |         1         1             0.22
  [2]        6   809179-12334634      * |         2         1             0.21

In any case, doing it this way does seem to allow it to run without causing errors.
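
If your table is already in the wide layout (one row per segment, with the second state in `major_cn2` / `minor_cn2` / `clonal_frequency2`), here is a minimal sketch of reshaping it into the two-rows-per-segment layout described above. The `wide` data frame, its `chr`/`start`/`end` column names, and the `state()` helper are made up for illustration, and the values are just taken from the outputs in this thread. This only reflects my understanding from #5, not a documented MutationTimeR requirement.

```r
# Hypothetical wide table: one row per segment, second copy-number state in *_cn2 columns.
wide <- data.frame(
  chr   = c("1", "6"),
  start = c(776546, 809179),
  end   = c(121345296, 12334634),
  major_cn1 = c(1L, 1L),
  minor_cn1 = c(1L, 1L),
  clonal_frequency1 = c(0.95285, 0.22),
  major_cn2 = c(NA, 2L),
  minor_cn2 = c(NA, 1L),
  clonal_frequency2 = c(NA, 0.21)
)

# Extract copy-number state i (1 or 2) using the unsuffixed column names.
state <- function(df, i) {
  data.frame(
    chr   = df$chr,
    start = df$start,
    end   = df$end,
    major_cn = df[[paste0("major_cn", i)]],
    minor_cn = df[[paste0("minor_cn", i)]],
    clonal_frequency = df[[paste0("clonal_frequency", i)]]
  )
}

# Stack both states, then drop the empty second state of clonal segments.
long <- rbind(state(wide, 1), state(wide, 2))
long <- long[!is.na(long$major_cn), ]
long <- long[order(long$chr, long$start), ]
rownames(long) <- NULL

# Optional: convert to GRanges if GenomicRanges (Bioconductor) is installed.
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  bb <- GenomicRanges::makeGRangesFromDataFrame(long, keep.extra.columns = TRUE)
}
```

Clonal segments end up with a single row and subclonal segments with two rows sharing the same coordinates. Note that the `is.na()` filter also drops any segment whose first state has no copy number at all, so check whether that is what you want for rows like your `[4]`.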

Hope that helps (and if I'm doing it wrong, anybody please correct me!)

shaghayeghsoudi commented 3 years ago

Thanks a lot, christopherwirth. I agree with you: the two states should go in separate rows, not both in the same row. Thank you!