gavinha / TitanCNA

Analysis of subclonal copy number alterations (CNA) and loss of heterozygosity (LOH) in cancer
GNU General Public License v3.0

setGenomeStyle mingles the number of rows of the CN data if it contains alt contigs #76

Closed lbeltrame closed 4 years ago

lbeltrame commented 5 years ago

Example:

> library(TitanCNA)
> library(data.table)

> cnData <- fread(cnfile)
> length(cnData$chr)
[1] 218949
> test <- setGenomeStyle(cnData$chr, genomeStyle = "UCSC")
> length(test)
[1] 218919
> setdiff(cnData$chr, test)
[1] "chr1_KI270706v1_random"  "chr1_KI270711v1_random" 
[3] "chr4_GL000008v2_random"  "chr14_GL000009v2_random"
[5] "chrUn_KI270742v1"        "chr7_KI270803v1_alt"    
[7] "chr22_KI270879v1_alt"    "chr22_KI270928v1_alt"  

This can be worked around by setting best=F when calling mapSeqlevels, though I guess these contigs should be dropped instead.
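For illustration, here is a minimal base-R sketch of the "drop them first" idea. It assumes that only chromosomes 1-22, X and Y (with or without a "chr" prefix) should be kept. The toy contig names and the regular expression are my own assumptions, not code taken from TitanCNA.

```r
# Toy data with hypothetical rows; the contig names mirror those reported above.
chr <- c("chr1", "chr1_KI270706v1_random", "chr2", "chrUn_KI270742v1",
         "chr22_KI270879v1_alt", "chrX")
cn <- data.frame(chr = chr, start = seq_along(chr), stringsAsFactors = FALSE)

# Assumed definition of a canonical contig: optional "chr" prefix,
# followed by 1-22, X or Y and nothing else.
canonical <- grepl("^(chr)?([1-9]|1[0-9]|2[0-2]|X|Y)$", cn$chr)

# Drop alt/random/unplaced contigs before any renaming, so that the
# renamed vector and the table keep the same number of rows.
cn_main <- cn[canonical, , drop = FALSE]
dropped <- unique(cn$chr[!canonical])
```

Filtering the whole table rather than only the name vector keeps the rows and the names in step, which is what the later assignment relies on.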

lbeltrame commented 5 years ago

This is a problem because the genome style is applied immediately after loading the log2ratio files in runTitanCNA.R, which causes an error when the new names are assigned back to the loaded object (there are fewer names than rows).
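The failure mode can be reproduced in miniature with base R alone. This is only a sketch: the `renamed` vector is a hypothetical stand-in for the shortened output of setGenomeStyle, and a plain data.frame is used in place of the object loaded in runTitanCNA.R.

```r
# Three rows, one of which sits on a contig that the renaming step drops.
cn <- data.frame(chr = c("chr1", "chr1_KI270706v1_random", "chr2"),
                 stringsAsFactors = FALSE)

# Hypothetical stand-in for a renamed vector that lost the unmapped contig.
renamed <- c("chr1", "chr2")

# Assigning 2 names to 3 rows fails; capture the message instead of stopping.
result <- tryCatch({
  cn$chr <- renamed
  "assigned"
}, error = function(e) conditionMessage(e))
```

Here `result` holds the error message rather than "assigned", and `cn` is left unchanged.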

gavinha commented 4 years ago

Hi @lbeltrame

Thanks for bringing this up. I think the issue is now resolved with these lines of code in the main R script:

https://github.com/gavinha/TitanCNA/blob/c4f94ee10e74e83869c9585501a3dacfee3464cc/scripts/R_scripts/titanCNA.R#L184-L191