bioinform / neusomatic

NeuSomatic: Deep convolutional neural networks for accurate somatic mutation detection

How to produce synthetic training data with different tumor purity and normal contamination ratio? #69

Closed: ZekunYin closed this issue 3 years ago

ZekunYin commented 3 years ago

Hi, wonderful work! I think this is one of the most impressive works in the field of somatic mutation calling, and I'm very interested in it. I'd like to reproduce some of the evaluation results reported in your paper, but I'm having trouble producing synthetic training data with different tumor purities and normal contamination ratios. It seems that this link doesn't describe how to produce training data with different tumor purities. I hope you can help me solve this problem.

Best, Zekun

msahraeian commented 3 years ago

@ZekunYin Thanks for your interest in NeuSomatic. This link shows how you can spike mutations into a normal BAM to generate a synthetic tumor BAM with the desired AFs. Once you have the simulated tumor, you can mix the tumor and normal BAMs in different ratios to generate mixed tumors/normals with different purities. For instance, using Picard DownsampleSam, you can downsample the tumor to 70% and the normal to 30%, then merge them with Picard MergeSamFiles to generate a 70% pure tumor.
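The 70%/30% split above gives a 70% pure mix when the tumor and normal BAMs have about the same coverage. Below is a minimal sketch (not part of NeuSomatic) that generalizes the fractions to unequal coverages and builds the Picard command lines. Assumptions: "purity" means the fraction of merged reads coming from the tumor BAM; the mixed coverage is pinned to the lower of the two input coverages; a `picard` launcher is on PATH (otherwise pass e.g. `("java", "-jar", "picard.jar")`); and the function and file names are hypothetical.

```python
import subprocess
from typing import List, Sequence, Tuple


def mixing_fractions(purity: float, tumor_cov: float = 1.0,
                     normal_cov: float = 1.0) -> Tuple[float, float]:
    """Hypothetical helper: (tumor_fraction, normal_fraction) for DownsampleSam.

    Reads kept from each BAM scale with fraction * coverage, so the merged purity
    is f_t*C_t / (f_t*C_t + f_n*C_n). Pinning the mixed coverage to
    min(C_t, C_n) keeps both fractions <= 1; with equal coverages this
    reduces to (purity, 1 - purity).
    """
    if not 0.0 < purity < 1.0:
        raise ValueError("purity must be strictly between 0 and 1")
    target_cov = min(tumor_cov, normal_cov)
    tumor_frac = purity * target_cov / tumor_cov
    normal_frac = (1.0 - purity) * target_cov / normal_cov
    return tumor_frac, normal_frac


def picard_mix_commands(tumor_bam: str, normal_bam: str, prefix: str,
                        purity: float, tumor_cov: float = 1.0,
                        normal_cov: float = 1.0,
                        picard: Sequence[str] = ("picard",)) -> List[List[str]]:
    """Hypothetical helper: build (but do not run) two DownsampleSam calls
    followed by one MergeSamFiles call."""
    tumor_frac, normal_frac = mixing_fractions(purity, tumor_cov, normal_cov)
    tumor_ds = f"{prefix}.tumor_ds.bam"
    normal_ds = f"{prefix}.normal_ds.bam"
    mixed = f"{prefix}.mixed.bam"
    return [
        [*picard, "DownsampleSam", "-I", tumor_bam, "-O", tumor_ds,
         "-P", f"{tumor_frac:.4g}"],
        [*picard, "DownsampleSam", "-I", normal_bam, "-O", normal_ds,
         "-P", f"{normal_frac:.4g}"],
        [*picard, "MergeSamFiles", "-I", tumor_ds, "-I", normal_ds,
         "-O", mixed],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical inputs: a 70% pure tumor from two equal-coverage BAMs.
    cmds = picard_mix_commands("synthetic_tumor.bam", "normal.bam",
                               "purity70", purity=0.7)
    for c in cmds:
        print(" ".join(c))
    # run_all(cmds)  # uncomment to actually invoke Picard
```

For a normal contaminated with tumor reads, the same helper can be called with the contamination fraction as `purity` (e.g. `purity=0.05` for 5% tumor in the normal). If the mixed tumor and the contaminated normal are both drawn from the same input BAMs, they may share reads.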

ZekunYin commented 3 years ago

That's very helpful! Thanks so much!