bioinfo-ut / GeneToCN

Gene copy number prediction from k-mer frequencies
GNU General Public License v3.0

Possible to use exomes? #5

Open vinjlynch opened 2 months ago

vinjlynch commented 2 months ago

Hi,

Is it possible to use FASTQ files containing raw reads from whole-exome target-capture sequencing?

Thanks, Vinny

fannydhelia commented 1 month ago

Thank you for the question! GeneToCN is optimized for and validated on WGS data. It assumes uniform coverage across both the gene region and a single-copy reference region, so that the k-mer frequencies in the two can be compared to estimate an accurate copy number. Given the coverage variability and potential biases introduced by target capture, copy number results from whole-exome target-capture data may not be reliable, even if suitable single-copy reference regions are sequenced, which I suspect will not always be the case. Unfortunately, I have no experience with this kind of data myself, so I am not familiar with its typical coverage profiles. It would be interesting to look into this in the future to see whether copy numbers can be estimated reliably.
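To illustrate why uneven coverage matters, here is a minimal sketch of the general idea of comparing k-mer frequencies between a gene region and a single-copy reference region. It is not GeneToCN's actual implementation: the function name `estimate_copy_number`, the use of medians, the diploid scaling factor, and all the counts are assumptions made up for illustration.

```python
from statistics import median


def estimate_copy_number(gene_kmer_counts, reference_kmer_counts, ploidy=2):
    """Hypothetical estimator: scale the gene-region k-mer depth by the depth
    of a region assumed to be single-copy per haploid genome."""
    ref_depth = median(reference_kmer_counts)
    if ref_depth <= 0:
        raise ValueError("reference region has no usable coverage")
    return ploidy * median(gene_kmer_counts) / ref_depth


# WGS-like toy data: both regions are sampled at the same underlying depth,
# so the ratio reflects the true number of gene copies (here 4 per diploid genome).
wgs_reference = [30, 31, 29, 30, 32]
wgs_gene = [60, 61, 59, 62, 60]
wgs_cn = estimate_copy_number(wgs_gene, wgs_reference)

# Exome-like toy data: same sample, but capture efficiency differs between
# the two regions (gene enriched, reference poorly captured). The ratio now
# mixes copy number with capture bias and overestimates the copy number.
exome_reference = [12, 10, 11, 13, 12]
exome_gene = [90, 95, 88, 92, 91]
exome_cn = estimate_copy_number(exome_gene, exome_reference)

print(f"WGS-like estimate:   {wgs_cn:.1f}")
print(f"Exome-like estimate: {exome_cn:.1f}")
```

In the toy exome case the estimate is inflated purely because the two regions were captured with different efficiency, which is the kind of bias that would have to be characterized and corrected before exome results could be trusted.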