[Question] generating paired tumor-control samples

davidebolo1993 / VISOR

VarIant SimulatOR for short, long and linked reads

GNU Lesser General Public License v3.0


[Question] generating paired tumor-control samples #38

Closed: waltergallegog closed this issue 4 months ago

waltergallegog commented 4 months ago

Hello. I'm interested in evaluating somatic SV callers, and because there are few benchmarks, I'm planning to use a simulator to generate the test data. Have you considered the use case of generating a tumor-control pair, in which the tumor sample contains SVs in addition to those in the control? If not, do you have any hints or suggestions on how to use VISOR to do so? Thanks and best regards, Walter.

davidebolo1993 commented 4 months ago

Hi @waltergallegog,

VISOR can definitely help with this. Using HACk, you can first simulate a control sample carrying a set of SVs, and then a second (tumor) sample carrying the same SVs plus some additional ones. You will end up with two folders, one with your control haplotypes and one with your tumor haplotypes. You can then simulate reads from those haplotypes with SHORtS (short reads) or LASeR (long reads); the documentation has some examples. SHORtS and LASeR also offer some control over tumor purity, which matters because real tumor samples are often mixed with normal cells.

Hope this helps,

Davide
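A minimal sketch of the "control SVs plus extra tumor SVs" setup described above, written in Python. It only prepares per-haplotype BED files and does not call VISOR. It assumes HACk reads one 6-column BED per haplotype (chrom, start, end, variant type, info, length of random sequence at the breakpoints), as described in the VISOR documentation; please verify the exact column meanings and type names there. The helper names (`SV`, `tumor_haplotype`, `write_bed`, `expected_somatic_vaf`) and the `visor_pair` output folder are hypothetical. The purity helper is simple arithmetic under an assumed diploid locus in both tumor and normal cells, and it is not how SHORtS/LASeR implement purity internally.

```python
from pathlib import Path
from typing import List, NamedTuple


class SV(NamedTuple):
    """One HACk-style BED record (assumed 6-column layout; check the VISOR docs)."""
    chrom: str
    start: int
    end: int
    sv_type: str   # e.g. "deletion", "inversion"
    info: str      # type-specific field; "None" when the type needs none
    breakseq: int  # length of random sequence at the breakpoints

    def bed_line(self) -> str:
        return "\t".join(str(field) for field in self)


def overlaps(a: SV, b: SV) -> bool:
    """True if two SVs share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def tumor_haplotype(germline: List[SV], somatic: List[SV]) -> List[SV]:
    """Tumor haplotype = all germline (control) SVs plus tumor-only SVs."""
    for s in somatic:
        if any(overlaps(s, g) for g in germline):
            raise ValueError(f"somatic SV {s} overlaps a germline SV")
    return sorted(germline + somatic)


def write_bed(svs: List[SV], path: Path) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("".join(sv.bed_line() + "\n" for sv in svs))


def expected_somatic_vaf(purity: float, copies: int = 1, ploidy: int = 2) -> float:
    """Expected fraction of reads carrying a somatic SV allele, assuming tumor
    and normal cells share the same ploidy at the locus."""
    return purity * copies / ploidy


# Illustrative coordinates only.
germline = {
    "h1": [SV("chr1", 100_000, 105_000, "deletion", "None", 0)],
    "h2": [SV("chr1", 300_000, 301_000, "inversion", "None", 0)],
}
somatic = {
    "h1": [SV("chr1", 500_000, 520_000, "deletion", "None", 0)],
    "h2": [],
}

out = Path("visor_pair")
truth = []  # somatic truth set: SVs in the tumor but not in the control
for hap in ("h1", "h2"):
    control = sorted(germline[hap])
    tumor = tumor_haplotype(germline[hap], somatic[hap])
    write_bed(control, out / "control" / f"{hap}.bed")
    write_bed(tumor, out / "tumor" / f"{hap}.bed")
    truth += [sv for sv in tumor if sv not in control]

print(len(truth))                 # 1
print(expected_somatic_vaf(0.6))  # 0.3
```

Keeping the somatic SVs in a separate list means the somatic truth set falls out for free, which is what a somatic caller would later be scored against. Rejecting overlaps between somatic and germline SVs is a conservative choice here, not a documented HACk requirement.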

waltergallegog commented 4 months ago

Thanks for the quick and very helpful answer. I will check the suggested documentation.

davidebolo1993 commented 4 months ago

Also, @waltergallegog, here are a few real tumor-control datasets I've worked with in the past:

COLO829: tumor, normal
H2009: tumor, normal
HCC1954: tumor, normal

waltergallegog commented 4 months ago

Thanks for the datasets. Have you also worked with somatic SV truth sets by any chance? I know of the truth sets in the Espejo Valle-Inclan and Arora benchmarks, and the recent truth set by Paulin et al.
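Once a truth set is available (simulated, or one of those mentioned above), scoring a somatic SV caller comes down to matching calls to truth records. Below is a minimal Python sketch of that idea. It assumes calls and truth have already been parsed into simple `(chrom, start, end, sv_type)` records (real truth sets usually ship as VCF or BED and need parsing first). The `Call`/`score` names and the 500 bp breakpoint tolerance are hypothetical, illustrative choices. The greedy one-to-one matching is a simplification; dedicated SV benchmarking tools use more careful matching rules.

```python
from typing import List, NamedTuple, Tuple


class Call(NamedTuple):
    chrom: str
    start: int
    end: int
    sv_type: str


def matches(call: Call, truth: Call, tol: int) -> bool:
    """Same chromosome and type, both breakpoints within `tol` bp."""
    return (call.chrom == truth.chrom
            and call.sv_type == truth.sv_type
            and abs(call.start - truth.start) <= tol
            and abs(call.end - truth.end) <= tol)


def score(calls: List[Call], truth: List[Call], tol: int = 500) -> Tuple[float, float, float]:
    """Greedy one-to-one matching; returns (precision, recall, F1)."""
    unmatched = list(truth)
    tp = 0
    for c in calls:
        hit = next((t for t in unmatched if matches(c, t, tol)), None)
        if hit is not None:
            unmatched.remove(hit)  # each truth record can be matched once
            tp += 1
    precision = tp / len(calls) if calls else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


truth = [Call("chr1", 500_000, 520_000, "DEL"), Call("chr2", 10_000, 12_000, "INV")]
calls = [Call("chr1", 500_120, 519_950, "DEL"), Call("chr3", 1_000, 2_000, "DEL")]
print(score(calls, truth))  # (0.5, 0.5, 0.5)
```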