Open ses101-24 opened 1 month ago
Thanks for the ticket, @ses101-24.
We'll review the function you mentioned for any issues. Could you please provide the input files you used for the run so we can examine it in more detail?
Thanks very much.
Sorry, but I am not able to provide the input files, as this is ongoing research. I really appreciate you looking into it.
(Sent by email in reply to Avraam Tapinos's comment of Friday, 1 November 2024, 8:28 p.m., on "[Wedge-lab/dpclust] error with snvs being removed and not assigned a cluster (Issue #30)".)
I have run dpclust on pairs of whole-genome samples, comparing the clones between a primary tumour and a metastatic tumour for each patient. The clustering works for all patients, and I also have clustering plots showing the clonal clusters. However, SNVs that are present in one sample but completely absent from the other are filtered out at the beginning. Sometimes this affects thousands of variants, including variants in some of the driver genes that I am particularly interested in, and these variants are then not assigned a cluster. It seems to happen because no.chrs.bearing.mut is zero in the input file for one of the samples; the SNV is then removed completely for both samples, as the if condition checks both files.
(The corresponding message from the load.data.inner function is "Removed xxx with missing totalCopyNumber".)
I followed the instructions in the documentation, using VCF files and copy number segment data as input, so I have one allDirichletProcessInfo input file for each sample in the pair, with the same loci in each (so both have the same number of lines).
But I would think it is expected that some variants are present in only one sample, and that they should still be passed to the clustering algorithm and assigned to a cluster?
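To make the behaviour described above concrete, here is a minimal sketch on made-up data. It is not the dpclust code: it only assumes that the check is applied row-wise across both samples, so a zero in either column drops the locus for the whole pair. The matrix values and the names `flagged` and `kept` are hypothetical.

```r
# Hypothetical multiplicity matrix: one row per locus, one column per sample.
# Values are made up for illustration; this is not dpclust's own code.
no.chrs.bearing.mut = matrix(c(1, 1,
                               1, 0,   # present in the primary only
                               0, 2,   # present in the metastasis only
                               2, 1),
                             ncol = 2, byrow = TRUE,
                             dimnames = list(paste0("snv", 1:4),
                                             c("primary", "metastasis")))

# Assumed row-wise check: flag a locus if ANY sample has a zero value.
flagged = apply(no.chrs.bearing.mut, 1, function(x) any(x == 0))

# Loci that would survive such a filter.
kept = rownames(no.chrs.bearing.mut)[!flagged]
print(kept)
print(sum(flagged))
```

Under this assumption, the two private variants (snv2 and snv3) are dropped from both samples, even though each is supported in one of them.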
There is also some code that is meant to add these removed SNVs back in afterwards, but I think it contains an error.
In writeStandardFinalOutput, the removed SNVs are added back into the output by calling the function add_removed_snvs:
    # Add the removed mutations back in
    output = cbind(dataset$chromosome[,1], dataset$position[,1]-1, dataset$position[,1], clustering$best.node.assignments, clustering$best.assignment.likelihoods)
    output = add_removed_snvs(dataset, output)
But this function appears to have an error when it sorts the variants by chromosome position afterwards. The line with match actually removes the SNVs that were just added, rather than sorting them.
    # Sort the output in the same order as the dataset
    chrpos_input = paste(dataset$chromosome, dataset$position, sep="")
    chrpos_output = paste(snv_assignment_table[,1], snv_assignment_table[,3], sep="")
    snv_assignment_table = snv_assignment_table[match(chrpos_input, chrpos_output),]
    return(snv_assignment_table)
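The effect of indexing with match() can be checked on toy data. The sketch below uses hypothetical keys and assumes, as described above, that the dataset keys no longer include the removed SNVs while the output table does. R's `match(x, table)` returns, for each element of `x`, its position in `table`, so only rows whose key appears in `x` are selected.

```r
# Toy data (hypothetical): keys in the filtered dataset vs. keys in the
# output table after the removed SNVs have been appended.
chrpos_input  = c("1_100", "1_300", "2_50")
chrpos_output = c("1_100", "1_300", "2_50", "1_200", "2_75")  # last two were re-added

snv_assignment_table = data.frame(key = chrpos_output,
                                  cluster = c(1, 2, 1, NA, NA),
                                  stringsAsFactors = FALSE)

# match() gives one index per element of chrpos_input ...
idx = match(chrpos_input, chrpos_output)

# ... so indexing with it returns only those rows; the re-added ones are dropped.
sorted = snv_assignment_table[idx, ]
print(nrow(sorted))

# One possible alternative under the same assumptions: sort on the keys of the
# output table itself, so no row can be lost. (Toy keys use numeric chromosomes
# only; X/Y would need separate handling.)
chrom = as.integer(sub("_.*", "", snv_assignment_table$key))
pos   = as.integer(sub(".*_", "", snv_assignment_table$key))
sorted_all = snv_assignment_table[order(chrom, pos), ]
print(sorted_all$key)
```

Whether this matches what happens in add_removed_snvs depends on whether dataset still contains the removed loci at that point, which I could not confirm.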
In the comment, it also says that it assigns these variants a cluster, but I couldn't find a place in the code that does that.
Please correct me if I have misunderstood any of the above.
Although it is not part of the above issue, I would be very grateful if you could explain why some CCF values are above 1, and how to decide whether any filtering is needed, either before or after clustering, to remove noise. (I tried running both without any filtering of variants beforehand, and after removing any SNVs that did not reach a VAF of 5% in either sample.)
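For reference, the VAF pre-filter I mean is roughly the following. This is a sketch with made-up read counts, not my actual pipeline; the matrix names are hypothetical, and it assumes every locus has non-zero coverage in both samples.

```r
# Hypothetical read counts per locus (rows) and sample (columns).
mut.count = matrix(c(10, 0,
                      1, 2,
                      0, 8), ncol = 2, byrow = TRUE)
wt.count  = matrix(c(40, 50,
                     59, 58,
                     60, 42), ncol = 2, byrow = TRUE)

# VAF = mutant reads / total reads, per locus and sample.
vaf = mut.count / (mut.count + wt.count)

# Keep a locus if at least one sample reaches the 5% threshold.
keep = apply(vaf, 1, function(x) any(x >= 0.05))
print(keep)
```

With this rule, a variant that is private to one sample is kept as long as it reaches 5% there.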
Thanks.