ysbioinfo closed this issue 4 years ago
Hi @snoopy-448
Thanks for bringing this up. @lbeltrame raised the same problem previously in Issue #10. I haven't gotten around to fixing it yet, but I'll try to take a look soon. Glad to see you were able to make a quick fix.
Best, Gavin
Gavin, I have another question about the output of TitanCNA. I want to use the TitanCNA output to run PhyloWGS. The PhyloWGS team wrote a cnv_parser.py to transform segs.txt into the format they need, but the parser seems to have been designed for an older version of TitanCNA. Some column names in segs.txt have since changed, so their parser does not work with the latest version of Titan.
with open(self._titan_filename) as titanf:
    reader = csv.DictReader(titanf, delimiter='\t')
    for record in reader:
        chrom = record['Chromosome'].lower()
        cnv = {}
        cnv['start'] = int(record['Start_Position(bp)'])
        cnv['end'] = int(record['End_Position(bp)'])
        cnv['major_cn'] = int(record['MajorCN'])
        cnv['minor_cn'] = int(record['MinorCN'])
        clonal_freq = record['Clonal_Frequency']
        if clonal_freq == 'NA':
            cnv['cellular_prevalence'] = self._cellularity
        else:
            cnv['cellular_prevalence'] = float(clonal_freq) * self._cellularity
        cn_regions[chrom].append(cnv)
Above is a piece of their parser. Start_Position(bp) and End_Position(bp) have clearly been renamed to Start_Position.bp. and End_Position.bp. in the current output. Has 'Clonal_Frequency' also been renamed to 'Cellular_Prevalence' in Titan? Are they the same? By the way, is Cellular_Prevalence the fraction of tumor cells harboring this CNV, so that I need to compute Cellular_Prevalence * purity to get the fraction of all cells that harbor it?
Thanks!
Yang
Hi @snoopy-448
Yes, you are right. I had changed Clonal_Frequency to Cellular_Prevalence at some point and that might've broken their parser.
Everything about this value is the same other than the new column name.
Sorry for the inconvenience!
-Gavin
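For anyone adapting the parser, here is a minimal sketch of a reader that accepts either the old or the new column names mentioned in this thread. It is not the PhyloWGS implementation: `COLUMN_ALIASES`, `get_field`, and `parse_segs` are hypothetical names, the sample rows are made up for illustration, and the prevalence-times-cellularity step simply mirrors the snippet quoted above.

```python
import csv
import io

# Old and new TitanCNA header names, as reported in this thread.
COLUMN_ALIASES = {
    'start': ('Start_Position(bp)', 'Start_Position.bp.'),
    'end': ('End_Position(bp)', 'End_Position.bp.'),
    'prevalence': ('Clonal_Frequency', 'Cellular_Prevalence'),
}


def get_field(record, key):
    """Return the value stored under whichever alias of `key` is present."""
    for name in COLUMN_ALIASES[key]:
        if name in record:
            return record[name]
    raise KeyError('none of %r found in header' % (COLUMN_ALIASES[key],))


def parse_segs(handle, cellularity):
    """Hypothetical helper: yield (chrom, cnv) pairs from a segs.txt handle."""
    reader = csv.DictReader(handle, delimiter='\t')
    for record in reader:
        prevalence = get_field(record, 'prevalence')
        cnv = {
            'start': int(get_field(record, 'start')),
            'end': int(get_field(record, 'end')),
            'major_cn': int(record['MajorCN']),
            'minor_cn': int(record['MinorCN']),
        }
        if prevalence == 'NA':
            # Same fallback as the quoted parser: use the cellularity alone.
            cnv['cellular_prevalence'] = cellularity
        else:
            cnv['cellular_prevalence'] = float(prevalence) * cellularity
        yield record['Chromosome'].lower(), cnv


# Made-up rows using the newer header names, for illustration only.
sample = (
    'Chromosome\tStart_Position.bp.\tEnd_Position.bp.\t'
    'MajorCN\tMinorCN\tCellular_Prevalence\n'
    '1\t100\t200\t2\t1\t0.5\n'
    'X\t300\t400\t1\t0\tNA\n'
)
rows = list(parse_segs(io.StringIO(sample), cellularity=0.8))
```

Looking the header up through aliases keeps one code path for both Titan versions, and a missing column fails loudly with a `KeyError` instead of silently producing wrong values.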
Thanks so much!
@snoopy-448, in addition to changing Clonal_Frequency to Cellular_Prevalence in the phylowgs parse_cnvs.py parser script, did you also change MajorCN and MinorCN to Corrected_MajorCN and Corrected_MinorCN on lines 69-70 of that script, so that it pulls from those columns in Titan's *segs.txt?
Hi @MUppal
The Corrected_MajorCN and Corrected_MinorCN columns are additional columns added by a post-processing correction that allows the copy number to be higher than the initial maximum (i.e. 8) in the model. This was included in commit 96e1c5bff8cf6f2793af40c3e463c85fe6fb3986 and brought up in #63.
The original MajorCN and MinorCN columns are still there, and it's up to you whether you would like to use the corrected columns instead.
Best, Gavin
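Since both sets of columns are present, a parser can make the choice explicit. The sketch below is only an illustration: `copy_number` is a hypothetical helper, and it assumes that the values are integer strings and that a corrected column may be absent or hold 'NA', neither of which is confirmed in this thread.

```python
def copy_number(record, use_corrected=True):
    """Return (major, minor) copy number from one segs.txt row (a dict).

    Prefers Corrected_MajorCN/Corrected_MinorCN when requested and usable,
    otherwise falls back to the original MajorCN/MinorCN columns.
    """
    missing = (None, '', 'NA')  # assumed markers for "no value"
    if use_corrected:
        major = record.get('Corrected_MajorCN')
        minor = record.get('Corrected_MinorCN')
        if major not in missing and minor not in missing:
            return int(major), int(minor)
    return int(record['MajorCN']), int(record['MinorCN'])


# Made-up row in which the correction raised the major copy number above 8.
row = {'MajorCN': '8', 'MinorCN': '1',
       'Corrected_MajorCN': '12', 'Corrected_MinorCN': '1'}
corrected = copy_number(row)
original = copy_number(row, use_corrected=False)
```

Keeping the fallback means the same code still works on segs.txt files produced before the corrected columns were introduced.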
Hi Gavin, I recently found a small bug in selectSolution.R. I run TitanCNA using snakemake, and I found that some patients were missing from the final optimalClusterSolution.txt while others appeared twice. For example, 07T was missing but 107T appeared twice.

I read the code and found that the bug is in this line:

phi2Samples <- grep(id, phi2Files, value=T)

If my id is 07T, grep matches the files of both 07T and 107T and compares them together. In my case 107T always wins the comparison, so 07T drops out of optimalClusterSolution.txt and 107T appears twice. I changed the line to:

phi2Samples <- grep(paste('/', id, sep = ''), phi2Files, value=T)

so that the id must be at the beginning of the filename. That solves my problem. I'm not good at coding, and you probably have a better way to fix this bug in the next version of TitanCNA. Thanks again for making such a convenient pipeline!
Yang
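The substring pitfall described above can be reproduced outside R. The following is a Python analogy, not the selectSolution.R code: the file paths and function names are hypothetical, and the stricter `match_exact_id` variant assumes that the sample id in a filename is followed by a non-alphanumeric character such as an underscore.

```python
import re

# Hypothetical paths; the real layout depends on the pipeline configuration.
phi2_files = [
    'results/07T_cluster1.params.txt',
    'results/107T_cluster1.params.txt',
]


def match_substring(sample_id, paths):
    """Unanchored search, like grep(id, files): '07T' also hits '107T'."""
    return [p for p in paths if re.search(sample_id, p)]


def match_at_filename_start(sample_id, paths):
    """Require a '/' right before the id, as in the fix proposed above."""
    pattern = re.compile('/' + re.escape(sample_id))
    return [p for p in paths if pattern.search(p)]


def match_exact_id(sample_id, paths):
    """Also reject ids that merely share a prefix, e.g. '07T' vs '07T2'."""
    pattern = re.compile(r'(^|/)' + re.escape(sample_id) + r'(?![A-Za-z0-9])')
    return [p for p in paths if pattern.search(p)]


loose = match_substring('07T', phi2_files)
anchored = match_at_filename_start('07T', phi2_files)
```

Here `loose` contains both files while `anchored` contains only the 07T file. Anchoring only the start still lets an id match a longer id with the same prefix, which is what the lookahead in `match_exact_id` guards against.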