brentp / somalier

fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs... "like damn that is one smart wine guy"
MIT License

How to specify tumor/normal pairs #69

Closed V-Catherine closed 3 years ago

V-Catherine commented 3 years ago

Hi, thanks a lot for sharing this nice tool! I have a question about the format of the ped file, and more precisely about how to specify a tumor/normal relationship. I tried using the sample ID of the normal as the family ID of the tumor, but this did not seem to be taken into account, and I ended up seeing only unrelated points. Thanks for your help.

brentp commented 3 years ago

Hi, you can do this using a groups file. It would look like:

tumor1,normal1
tumor2,normal2

Then pass that file to somalier relate with the -g argument.
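
As a rough illustration of the format described above (one group per line, sample IDs separated by commas), here is a minimal Python sketch that builds the text of such a file. The helper name `format_groups` is hypothetical and not part of somalier; the sample IDs are the ones from the example above. The check that rejects commas inside an ID is an assumption about what a comma-separated format can represent, not documented somalier behavior.

```python
def format_groups(groups):
    """Return groups-file text: one comma-separated group of sample IDs per line.

    `format_groups` is a hypothetical helper for illustration only.
    """
    lines = []
    for group in groups:
        for sample_id in group:
            # Assumption: an ID containing a comma (or an empty ID) cannot be
            # represented in a comma-separated line, so reject it early.
            if "," in sample_id or not sample_id.strip():
                raise ValueError(f"unusable sample ID: {sample_id!r}")
        lines.append(",".join(group))
    return "\n".join(lines) + "\n"


# Tumor/normal pairs taken from the example in this thread.
pairs = [("tumor1", "normal1"), ("tumor2", "normal2")]
print(format_groups(pairs), end="")
```

The printed text can be saved to a file, and that file is what the -g argument of somalier relate would point to. The sample IDs presumably need to match the sample names somalier sees in the input files.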

V-Catherine commented 3 years ago

Thanks a lot :-)

spula07 commented 1 year ago

Hi, I'm totally new to this tool and don't understand exactly what that groups file looks like. Can someone explain this to me? For example, I have a few samples in a FAM file:

IID FID PAT MAT SEX PHENOTYPE
sample1 sample1 0 0 2 normal
sample2 sample2 0 0 1 tumor
sample3 sample3 0 0 1 tumor
sample4 sample4 0 0 1 normal
sample5 sample5 0 0 2 tumor

How can I rearrange this to create a groups file?

brentp commented 1 year ago

Hi, how are sample1-5 paired? I can't tell from the naming. If you give an example where that's the case, I can help you make the groups file. If all of your samples are unrelated (you don't have tumor-normal pairs or families), then you don't need to specify a groups or ped file.

spula07 commented 1 year ago

So that's only for related pairs? I thought I could take a list of unrelated samples, annotate them by normal/tumor group, and look for something like a correlation between genotype and phenotype.

brentp commented 1 year ago

Yes, you can do that. But I can't tell from your sample names how the tumor-normals are paired. Let's say that in your example above, samples 1-3 are a set of tumor-normal samples from the same person, and likewise samples 4-5 are from another person. Then your groups file would look like this:

sample1,sample2,sample3
sample4,sample5

spula07 commented 1 year ago

Oh, OK. For you, sample1, sample2, and sample3 are from one person. For me, samples 1-5 are 5 different people.

brentp commented 1 year ago

Then you don't need a groups file. And if the 5 samples are unrelated, you don't need a pedigree file. You actually don't need one either way, but when you do have expected relationships, it lets you check whether the observed relatedness matches them.
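
To make the two cases in this thread concrete, here is a small Python sketch that derives groups from a mapping of sample ID to the person it came from. The function name `groups_from_individuals` and the person labels are hypothetical, chosen only for illustration. The sketch assumes that only groups with two or more samples are worth listing, so when every sample comes from a different person there is nothing to write.

```python
from collections import defaultdict


def groups_from_individuals(sample_to_individual):
    """Collect samples that share an individual; keep only groups of 2+ samples.

    Hypothetical helper for illustration, not part of somalier.
    """
    by_individual = defaultdict(list)
    for sample_id, individual in sample_to_individual.items():
        by_individual[individual].append(sample_id)
    # Assumption: a sample with no partner does not need a line of its own.
    return [samples for samples in by_individual.values() if len(samples) > 1]


# Case 1: samples 1-3 from one person, samples 4-5 from another.
same_person = {
    "sample1": "personA", "sample2": "personA", "sample3": "personA",
    "sample4": "personB", "sample5": "personB",
}
# Case 2: five samples from five different people.
all_different = {f"sample{i}": f"person{i}" for i in range(1, 6)}

for group in groups_from_individuals(same_person):
    print(",".join(group))
# prints:
# sample1,sample2,sample3
# sample4,sample5

print(groups_from_individuals(all_different))  # prints [] (no groups file needed)
```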

spula07 commented 1 year ago

Ok, thank you very much. That's really helpful.