davidebolo1993 / VISOR

VarIant SimulatOR for short, long and linked reads
GNU Lesser General Public License v3.0

calculate ground truth copy number profile from HACk.bed #30

Closed Jiayi-Wang-Joey closed 9 months ago

Jiayi-Wang-Joey commented 1 year ago

Dear Davide,

Thanks for your great work on VISOR! We used VISOR to simulate WGS data in BAM format to test whether our CNV detection method works well. Now we want to evaluate its prediction accuracy. Basically, we want to compare the predicted and the true copy number at each position and see if they match. However, I am not sure how to calculate the ground truth copy number profile from the HACk.bed file where we saved the CNVs. It includes information on CNV types, but no information on copy numbers. How can we calculate the copy number profile of the simulated data? Thank you very much!

Kind regards, Jiayi Wang

davidebolo1993 commented 1 year ago

Hey @Jiayi-Wang-Joey,

Sorry for the delay, I'm travelling these days. I assume the copy number changes here are deletions and tandem duplications. For tandem duplications, the HACk.bed file already states how many times a segment is duplicated, since this is required at the time of variant generation. For deletions, if you simulated heterozygous SVs, you can assume the copy number is 1 (I'm assuming a diploid reference such as human here). Also, I have a script to convert .bed to .vcf for HACk here. You may want to adapt it to your needs.
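A rough sketch of that logic in Python, not part of VISOR itself. It assumes a tab-separated BED with chromosome, start, end, SV type and SV info in the first five columns, the type strings `deletion` and `tandem duplication`, heterozygous events on a diploid genome, and half-open coordinates. The function names and the `dup_info_is_extra_copies` switch are made up for illustration. The switch exists because whether the info value counts added copies or total copies on that haplotype should be checked against the VISOR documentation.

```python
from collections import defaultdict


def copy_number_deltas(bed_text, dup_info_is_extra_copies=True):
    """Collect (start, end, delta) per chromosome for heterozygous events.

    Assumed columns: chrom, start, end, SV type, SV info (tab-separated).
    """
    deltas = defaultdict(list)
    for line in bed_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        chrom, start, end, sv_type, info = line.split("\t")[:5]
        start, end = int(start), int(end)
        if sv_type == "deletion":
            # One haplotype loses the segment.
            deltas[chrom].append((start, end, -1))
        elif sv_type == "tandem duplication":
            n = int(info)
            # Assumption: info is either added copies or total copies.
            extra = n if dup_info_is_extra_copies else n - 1
            deltas[chrom].append((start, end, extra))
        # Other SV types are ignored in this sketch.
    return deltas


def copy_number_profile(deltas, ploidy=2):
    """Sweep over breakpoints and return segments whose copy number != ploidy."""
    profile = {}
    for chrom, events in deltas.items():
        changes = defaultdict(int)
        for start, end, delta in events:
            changes[start] += delta
            changes[end] -= delta
        positions = sorted(changes)
        segments = []
        cn = ploidy
        for left, right in zip(positions, positions[1:]):
            cn += changes[left]
            if cn != ploidy:
                segments.append((left, right, cn))
        profile[chrom] = segments
    return profile


if __name__ == "__main__":
    # Made-up example records, not taken from a real HACk.bed.
    bed = (
        "chr1\t100\t200\tdeletion\tNone\t0\n"
        "chr1\t500\t600\ttandem duplication\t2\t0\n"
    )
    print(copy_number_profile(copy_number_deltas(bed)))
```

Regions not listed in the result stay at the baseline ploidy. In the example, reading the info value `2` as added copies gives the duplicated segment a copy number of 4, while reading it as total copies on that haplotype gives 3.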

Let me know if you need further help,

Davide

Jiayi-Wang-Joey commented 1 year ago

Thanks Davide! As I see it, deletions, insertions, tandem duplications and tandem repeats will all influence the copy number. For tandem duplications, if the 5th column (SV info) says 2, does that mean a gain of 2 copies? And how would you determine the copy number of an insertion? Thanks again!

Jiayi